This looks neat!<p>IPFS contributor here.<p>I had a look at your code and saw how you handle downloading from multiple gateways.<p>There's a better way to do this which also leverages direct P2P retrieval in browsers, which is now a thing with IPFS! [0]
If you just want to fetch, check out @helia/verified-fetch[1], which gives you a Fetch-like API that accepts CIDs. It handles all the content routing and P2P retrieval magic and can help reduce reliance on gateways.<p>You can also pass it gateways in case it can connect to providers directly (due to transports)<p>[0] <a href="https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-web/" rel="nofollow">https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-w...</a>
[1] <a href="https://github.com/ipfs/helia-verified-fetch/tree/main/packages/verified-fetch">https://github.com/ipfs/helia-verified-fetch/tree/main/packa...</a>
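For context, the multi-gateway approach being discussed (minus the verification that @helia/verified-fetch adds on top) can be sketched with plain fetch. The gateway hosts and function names below are illustrative, not any library's API:

```javascript
// Sketch: race a CID across several public gateways and take the first
// success. Gateway list and names (gatewayUrls, fetchFromGateways) are
// illustrative assumptions.
const GATEWAYS = ['https://ipfs.io', 'https://dweb.link', 'https://w3s.link'];

function gatewayUrls(cid, gateways = GATEWAYS) {
  // Path-style gateway URLs: https://<gateway>/ipfs/<cid>
  return gateways.map((gw) => `${gw}/ipfs/${cid}`);
}

async function fetchFromGateways(cid, gateways = GATEWAYS) {
  // Promise.any resolves with the first gateway that answers successfully
  // and only rejects if every gateway fails.
  return Promise.any(
    gatewayUrls(cid, gateways).map(async (url) => {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
      return res;
    })
  );
}
```

Unlike verified-fetch, this trusts whatever bytes the winning gateway returns — it doesn't verify them against the CID.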
Thanks! I really wanted to use verified-fetch, but it seemed like it didn't support all types of CIDs? In my case, most of the files use BLAKE2b-256, and the last time I checked verified-fetch didn't like that :( Did something change?<p>Edit: I now realised we previously talked on the Discord channel, thank you for the help there!
With some extra config you can pass custom hashers.<p>Any reason you chose BLAKE2b-256?<p>BLAKE2/3 are amazing hashing functions and have a lot of cool properties, but they really suck on the web in terms of performance. This came up literally yesterday <a href="https://bsky.app/profile/norman.life/post/3lbw2qltokc2i" rel="nofollow">https://bsky.app/profile/norman.life/post/3lbw2qltokc2i</a> and has been reported by multiple devs recently (who also shared excitement about BLAKE)
Small correction:<p>You can also pass [@helia/verified-fetch] gateways as a fallback for when it *cannot* connect to providers directly (due to transports).
Hi, I’m here looking for a different offering. I have a personal library of physical books that our small group wants to be able to lend out to others.<p>I want people to be able to see which books are available, reserve books that are out, and get notifications when they're due back.<p>And preferably something I can self-host as a small site.<p>Anyone know of a small open source physical library management tool like this?
The Pear P2P framework might be worth a look if you want to get off of GitHub and turn it into a truly distributed system. If the index must be on GitHub then what good is it to have the files on IPFS?<p><a href="https://docs.pears.com/" rel="nofollow">https://docs.pears.com/</a>
Thanks, I shall have a look into that. Technically the index too can live on IPFS, but the real challenge was how to know which databases exist, and where. This is what I'm using the GitHub topics for - anyone can create a repo and assign a topic to it, and GitHub's free API lets you search for matching repos. In a way, what I'm looking for is to be able to make a request and get a different result when a new database is up, and the immutable nature of IPFS for example isn't great for that. GitHub stars and accounts also give some basic concept of content moderation. I should see what Pear offers in this space.<p>Generally, I see the distributed nature of TeaTime as an instrument for being resilient, and less so a goal by itself. My thinking is that the GitHub database repositories are "resilient enough", especially if they can contain just a config.json file pointing to database files living elsewhere. But I'm not a lawyer and could be wrong.
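For reference, the discovery step described above can be sketched against GitHub's public search API (the teatime-database topic is the one from the post; the function names are illustrative):

```javascript
// Sketch of database discovery: list public repos tagged with a given
// topic via GitHub's unauthenticated search API, sorted by stars, which
// gives the crowd-moderation behaviour mentioned above.
function topicSearchUrl(topic, perPage = 30) {
  const q = encodeURIComponent(`topic:${topic}`);
  return `https://api.github.com/search/repositories?q=${q}&sort=stars&per_page=${perPage}`;
}

async function findDatabases(topic = 'teatime-database') {
  const res = await fetch(topicSearchUrl(topic), {
    headers: { Accept: 'application/vnd.github+json' },
  });
  if (!res.ok) throw new Error(`GitHub API: HTTP ${res.status}`);
  const { items } = await res.json();
  return items.map((r) => ({ name: r.full_name, stars: r.stargazers_count }));
}
```

Note that the unauthenticated search endpoint is heavily rate-limited, so results would want caching on the client.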
> Technically the index too can live on IPFS<p>Besides, HTTP requests against indices should be fast enough for a decent user experience and IPFS (and/or its gateways) aren’t great at that in my experience. I think using GitHub (or any other static hosting providers) was a good call in that regard.
> (I'm looking into using Helia so that users are also contributing nodes in the network)<p>I had to look that term up <<a href="https://github.com/ipfs/helia#readme">https://github.com/ipfs/helia#readme</a>> but while sniffing around in their <<a href="https://github.com/ipfs/helia/wiki/Projects-using-Helia">https://github.com/ipfs/helia/wiki/Projects-using-Helia</a>> I was reminded of <a href="https://github.com/orbitdb/orbitdb#readme">https://github.com/orbitdb/orbitdb#readme</a> which seems like it may involve much less rolling your own parts
Thanks, I haven't heard of it before. I wonder if it's possible to use OrbitDB to download a file from IPFS though? Based on its CID, I mean, because that was my intention with Helia. I thought that one of the nice things about reading books is that it takes time, and if users could be seeding their files from IndexedDB to the network, they could automatically and effortlessly become "contributing citizens".<p>Another interesting use case would be to see if this can replace (or be an addition to) SQLite as the database in which the queries are run.
Also see: <i>Show HN: Fully-searchable Library Genesis on IPFS</i><p><a href="https://news.ycombinator.com/item?id=28585208">https://news.ycombinator.com/item?id=28585208</a><p><a href="https://libgen-crypto.ipns.dweb.link/" rel="nofollow">https://libgen-crypto.ipns.dweb.link/</a>
"Everything is done in the browser - no users, no cookies, no tracking. LocalStorage and IndexedDB are used for saving your last readings, your position in each file."<p>I love this! Thanks for making this!
> The databases used in TeaTime are GitHub repositories tagged with the teatime-database topic, which are published on GitHub Pages.<p>Couldn't this be a security issue, allowing bad actors to use this tag?
That's a fair point - I guess it could be abused. Databases are sorted by their number of GitHub stars, so I was hoping that with the power of the crowds it will be possible to minimize the bad effect such an actor might have, by simply not voting them up.
There have been several attacks recently where a bad actor takes over a repo when the original maintainer wants to take a step back, then launches a supply chain attack. In recent cases, the attack came from obfuscated binary files in the repo rather than code. Given we are dealing with documents here (books), it would be easy to hide malicious code in a file. PDFs have all sorts of execution vulnerabilities, for example.
Interesting - I'm kinda counting on PDF.js, which is used for PDF rendering, to do it safely, but of course that doesn't always have to be the case. Do you have any thoughts on how to make this safer?
PDFs are not executables.
Other than the JavaScript, you mean.<p><a href="https://cloudmersive.com/article/Understanding-Embedded-JavaScript-in-PDF-Documents" rel="nofollow">https://cloudmersive.com/article/Understanding-Embedded-Java...</a>
You’d be surprised what’s executable with the right attitude.
May I recommend the old 27C3 talk "OMG WTF PDF"?
It does not work for me. For example:<p>> Access to fetch at '<a href="https://bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bkn3c7hlvohq.ipfs.dweb.link/" rel="nofollow">https://bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bk...</a>' (redirected from '<a href="https://w3s.link/ipfs/bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bkn3c7hlvohq" rel="nofollow">https://w3s.link/ipfs/bafykbzacedvzdo4hru3wul5excjthwjuz5ggd...</a>') from origin '<a href="https://bjesus.github.io" rel="nofollow">https://bjesus.github.io</a>' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
The CORS headers seem fine, though I'm a little surprised about the content-type:<p><pre><code> $ curl -I https://bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bkn3c7hlvohq.ipfs.dweb.link/
HTTP/2 200
date: Thu, 28 Nov 2024 16:34:03 GMT
content-type: application/x-rar-compressed
content-length: 192215114
access-control-allow-headers: Content-Type
access-control-allow-headers: Range
access-control-allow-headers: User-Agent
access-control-allow-headers: X-Requested-With
access-control-allow-methods: GET
access-control-allow-methods: HEAD
access-control-allow-methods: OPTIONS
access-control-allow-origin: *
</code></pre>
Can you try with another file? I do think that once I implement Helia's verified-fetch, issues like this should be less common.
Sorry if I missed this, but is there an example instance we can play with?
I don't know what's in the database, so I don't know what to search for.<p>How about a browse feature?
Ha, that's a great point... I can probably work on a meta "table of contents" for each database, shouldn't be hard to do. But generally the database descriptions say what they have - public domain books, Library Genesis non-fiction, etc.
This is awesome. Would it be able to integrate Pocket bookmarks into this somehow? Would love to be able to keep a cache offline of all the things I wanted to read (esp if I can prevent link-rot), and to be able to query them.
I need to check about Pocket - do you have any relevant documentation? Note however that TeaTime does keep an offline cache, both a list of your recent books and the files themselves, but it does it using LocalStorage and IndexedDB because I really wanted to avoid the need for user logins. Maybe I can add an option to download a "state dump" kind of thing, so that you have a file with your history and can always import it. I too feel that LocalStorage can be a little flaky, in the sense that you don't always realize that switching browsers would mean losing all your data.
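A minimal sketch of that "state dump" idea — purely hypothetical, with a plain object standing in for window.localStorage:

```javascript
// Hypothetical export/import of reading history, so state can be moved
// between browsers without user accounts. In a real app the `storage`
// argument would be window.localStorage; any key/value object works here.
function exportState(storage) {
  return JSON.stringify({
    version: 1,            // lets a future importer reject unknown formats
    exportedAt: Date.now(),
    data: { ...storage },
  });
}

function importState(dump) {
  const parsed = JSON.parse(dump);
  if (parsed.version !== 1) {
    throw new Error(`unsupported dump version: ${parsed.version}`);
  }
  return parsed.data;
}
```

The IndexedDB side (the cached files themselves) would need async iteration over object stores, but the same versioned-JSON envelope could wrap it.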
beautiful! i made a much crappier version of the same kind of thing a year or two ago with a manually converted libgen db from annas-archive's index, but the best i could come up with was webtorrenting a sqlite db of hashes and importing it into indexeddb then distributing the static frontend bundle as an urbit app; it was much uglier than this
Thank you! I've seen quite a lot of similar projects as I was working on this, and they all served as much inspiration and also a proof that this can in fact be done.<p>Also, thanks to whoever came up with thebookbay.org instance :)
Is this like an open source distributed libgen?
Libgen is one of the databases that are currently available (and was probably a good fit because they already had their files on IPFS), but I think that this architecture, in which the UI is decoupled from the database, the database doesn't hold any copyrighted materials, and the files are downloaded from IPFS, is quite resilient and could be used for serving all sorts of content.
Do static sites built with sphinx-build or <i>jupyter-book</i> or hugo or other jamstack static site generators work with TeaTime?<p>sphinx-build: <a href="https://www.sphinx-doc.org/en/master/man/sphinx-build.html" rel="nofollow">https://www.sphinx-doc.org/en/master/man/sphinx-build.html</a><p>There may need to be a different Builder or an extension of sphinxcontrib.serializinghtml.JSONHTMLBuilder which serializes a doctree (basically a DOM document object model) to the output representation: <a href="https://www.sphinx-doc.org/en/master/usage/builders/#sphinxcontrib.serializinghtml.JSONHTMLBuilder" rel="nofollow">https://www.sphinx-doc.org/en/master/usage/builders/#sphinxc...</a><p>datasette and datasette-lite can load CSV, JSON, Parquet, and SQLite databases; supports Full-Text Search; and supports search Faceting. datasette-lite is a WASM build of datasette with the pyodide python distribution.<p>datasette-lite > Loading SQLite databases: <a href="https://github.com/simonw/datasette-lite#loading-sqlite-databases">https://github.com/simonw/datasette-lite#loading-sqlite-data...</a><p>jupyter-lite is a WASM build of jupyter which also supports sqlite in notebooks in the browser with `import sqlite3` with the python kernel and also with a sqlite kernel: <a href="https://jupyter.org/try-jupyter/lab/" rel="nofollow">https://jupyter.org/try-jupyter/lab/</a><p>jupyterlite/xeus-sqlite-kernel:
<a href="https://github.com/jupyterlite/xeus-sqlite-kernel">https://github.com/jupyterlite/xeus-sqlite-kernel</a><p>(edit)<p>xeus-sqlite-kernel > "Loading SQLite databases from a remote URL" <a href="https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#issuecomment-1031751580">https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#i...</a><p>%FETCH <url> <filename> <a href="https://github.com/jupyter-xeus/xeus-sqlite/blob/ce5a598bdab29eb6aab78a78d0caee93c292ec74/src/xeus_sqlite_interpreter.cpp#L191">https://github.com/jupyter-xeus/xeus-sqlite/blob/ce5a598bdab...</a><p>xlite.cpp >
void fetch(const std::string url, const std::string filename)
<a href="https://github.com/jupyter-xeus/xeus-sqlite/blob/main/src/xlite.cpp#L38">https://github.com/jupyter-xeus/xeus-sqlite/blob/main/src/xl...</a>
> Do static sites built with sphinx-build or jupyter-book or hugo or other jamstack static site generators work with TeaTime?<p>I guess it depends on what you mean by "work with TeaTime". TeaTime itself is a static site, generated using Nuxt. Nothing it does couldn't be achieved with another stack - at the end it's just HTML, CSS and JS. I haven't tried sphinx-build or jupyter-book, but there isn't a technical reason why Hugo wouldn't be able to build a TeaTime-like website, using the same databases.<p>> datasette-lite > Loading SQLite databases: <a href="https://github.com/simonw/datasette-lite#loading-sqlite-data">https://github.com/simonw/datasette-lite#loading-sqlite-data</a>...<p>I haven't seen datasette before. What are the biggest benefits you think it has over sql.js-httpvfs (which I'm using now)? Is it about the ability to also use other formats, in addition to SQLite? I got the impression that sql.js-httpvfs was a bit of a POC, and later some possibly better solutions came out, but I haven't really gone down that rabbit hole to figure out which one would be best.<p>Edit: looking a little more into datasette-lite, it seems like one of the nice benefits of sql.js-httpvfs is that it doesn't download the whole SQLite database in order to query it. This makes it possible to have a 2GB database but still read it in chunks, skipping around efficiently until you find your data.
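To illustrate that chunked-reading trick: a SQLite file is a sequence of fixed-size pages, so any single page can be fetched with an HTTP Range request instead of downloading the whole database. A sketch (the default page size and the helper names are assumptions, not sql.js-httpvfs's actual API):

```javascript
// SQLite pages are 1-based in the file format; page N occupies bytes
// [(N-1)*pageSize, N*pageSize - 1]. HTTP Range is inclusive on both ends.
function pageRangeHeader(pageNumber, pageSize) {
  const start = (pageNumber - 1) * pageSize;
  const end = start + pageSize - 1;
  return `bytes=${start}-${end}`;
}

async function fetchPage(url, pageNumber, pageSize = 4096) {
  const res = await fetch(url, {
    headers: { Range: pageRangeHeader(pageNumber, pageSize) },
  });
  // 206 Partial Content means the server honoured the Range request;
  // a plain 200 would mean it sent the entire file instead.
  if (res.status !== 206) {
    throw new Error(`server did not honour Range: HTTP ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

This is why the hosting only needs to be a static file server that supports Range requests — the VFS layer translates SQLite's page reads into these fetches.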
So the IPFS part is instead of a CDN? Does it replace asset URLs in templates with e.g. cf IPFS gateway URLs with SRI hashes?<p>datasette-lite > "Could this use the SQLite range header trick?" <a href="https://github.com/simonw/datasette-lite/issues/28">https://github.com/simonw/datasette-lite/issues/28</a><p>From xeus-sqlite-kernel > "Loading SQLite databases from a remote URL" <a href="https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#issuecomment-1031751580">https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#i...</a> re: "Loading partial SQLite databases over HTTP":<p>> <i>sql.js-httpvfs: <a href="https://github.com/phiresky/sql.js-httpvfs">https://github.com/phiresky/sql.js-httpvfs</a></i><p>> <i>sqlite-wasm-http: <a href="https://github.com/mmomtchev/sqlite-wasm-http">https://github.com/mmomtchev/sqlite-wasm-http</a></i><p>>> <i>This project is inspired from @phiresky/sql.js-httpvfs but uses the new official SQLite WASM distribution.</i><p>Datasette creates a JSON API from a SQLite database, has an optional SQL query editor with canned queries, multi DB query support, docs; <a href="https://docs.datasette.io/en/stable/sql_queries.html#cross-database-queries" rel="nofollow">https://docs.datasette.io/en/stable/sql_queries.html#cross-d...</a> :<p>> <i>SQLite has the ability to run queries that join across multiple databases. Up to ten databases can be attached to a single SQLite connection and queried together.</i>
The IPFS part is where the actual files are downloaded from. I wouldn't necessarily say that it's instead of a CDN - it just seems to be a reliable enough way of getting the files (without the pain of setting up a server with all these files myself). Indeed, it fetches them through IPFS Gateways (the Cloudflare IPFS Gateway is unfortunately no longer a thing) using their CID hash.
I absolutely don't understand this.<p>Doesn't IPFS require a server running? Where is the server code?
It does require an IPFS node, i.e. a server to provide the data. The point is more that the data isn't tied to any specific server. By content addressing with CIDs, you can fetch from multiple IPFS nodes and even become a "provider" for a CID once your IPFS node downloads a copy.
I'm currently using public IPFS Gateways to fetch the files from IPFS.
IPFS needs a server running, or am I wrong?
super cool
I'm slightly disappointed that this is not the Tea-Time that formed the comms/sync protocol used in the Croquet VR system (on Squeak Smalltalk) some years back. :)
Any particular reason for choosing IPFS instead of bittorrent or other p2p protocols? It feels like every time I try an IPFS tool it just crashes, whereas I rarely have issues with torrents.
IPFS contributor here.<p>IPFS certainly has its flaws, but it's been getting much better in the last year. If you want to do P2P retrieval in browsers, it's practically the most advanced of all the protocols.<p>We just published an update on this topic <a href="https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-web/" rel="nofollow">https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-w...</a>
> If you want to do P2P retrieval in browsers<p>My understanding is that IPFS always requires an IPFS → HTTP gateway to work in browsers, so I wonder what you mean by "P2P retrieval in browsers".<p><i>Verified Fetch</i> is about verifying content retrieved from gateways on the client side, but the retrieval is still very much client-server (vs P2P) today, and <i>Service Worker Gateway</i> seems to be relying on gateways too (<trustless-gateway.link> in the demo linked).<p>Based on <i>IPFS & HTTP: The Blueprint for Radical Interoperability - Lidel</i> [0], I believe the situation is acknowledged by the IPFS devs too. In my unsolicited opinion, using a custom protocol (libp2p transports?) for a project targeting web browsers was a mistake, so I'm glad to see that HTTP is being considered now.<p>[0] <a href="https://www.youtube.com/watch?v=4GwxrQ9Z3Bk" rel="nofollow">https://www.youtube.com/watch?v=4GwxrQ9Z3Bk</a>
> My understanding is that IPFS always requires an IPFS → HTTP gateway to work in browsers, so I wonder what you mean by "P2P retrieval in browsers".<p>By P2P retrieval, I mean retrieval directly from a peer that has the data, without additional hops.<p>Historically, HTTP gateways were the only way, because you couldn't dial (most) peers directly from a browser unless they had a CA-signed TLS certificate (needed in secure contexts).<p>A couple of things changed that:<p>- WebTransport and WebRTC-direct allow browser-to-server (peer) communication without a CA-signed certificate. WebTransport is very new and still has some problems in browser implementations.<p>- We just launched AutoTLS, which helps public IPFS nodes get a wildcard Let's Encrypt TLS certificate, making them "dialable" from browsers.<p>With these, it becomes a whole lot easier to establish connections from browsers to IPFS nodes. But it takes time for the network to upgrade.<p>Note that libp2p transports are wrappers over network transports to allow interoperability across runtimes, e.g. TCP, QUIC, WebSockets, WebTransport. These are not custom protocols.<p>Now to your point about custom protocols, by which I presume you are referring to data exchange/retrieval protocols.<p>There are two IPFS data transfer protocols:
- Bitswap: the first and default data transfer protocol which requires a streaming network transport (so HTTP isn't ideal).
- HTTP Gateways: Initially, HTTP gateways were servers that would handle retrieval (over Bitswap) for you if they didn't have the data locally (sometimes we refer to these as Recursive Gateways, like trustless-gateway.link and ipfs.io). For all the reasons in Lidel's talk (caching, interop, composability), we extended this notion and made HTTP Gateway a general interface for data retrieval.<p>Today, there are large providers in the network (pinning services) that expose their data with an HTTP Gateway, which allows browsers to retrieve from them directly.<p>We still have more work to do to expose the gateway in Kubo with the new AutoTLS certificate, but for the time being, Bitswap should work well enough even in browsers over a WebSockets connection.
---<p>Verified Fetch aims to ensure you get the data if it's available. If it fails to find a direct provider of the data, it will fall back on a recursive gateway like trustless-gateway.link, which is configured in the defaults. As more IPFS nodes upgrade to newer versions, reliance on such centralised recursive gateways will become unnecessary.<p>TL;DR: We're all in on HTTP. The real constraint to P2P is browser vendors and slow standards bodies.
Yeah - the desire to have TeaTime run as a normal website. BitTorrent doesn't run over HTTP, unless you use WebTorrent, which most BitTorrent clients aren't 100% compatible with. This means you can basically only download from other WebTorrent nodes - and there aren't many.<p>I do think this might change with the introduction of things like the Direct Sockets API, but for now they are too restricted and not widely supported.<p>It's my first time working with IPFS and I agree it hasn't always been 100% reliable, but I do hope that if I manage to get TeaTime users to also be contributing nodes, this might actually improve the reliability of the whole network. Once it's possible to use BitTorrent in the browser, I do think it would be a great addition (<a href="https://github.com/bjesus/teatime/issues/3">https://github.com/bjesus/teatime/issues/3</a>).