I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.<p>So I made a format that will never surpass SQLite, except that it's far lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.<p>The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compared to SQLite's 1.2mb of wasm and glue code, it's 3% the size, and searching and loading are much faster. My program isn't really column based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.<p>I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.<p><a href="https://github.com/tnelsond/peakslab" rel="nofollow">https://github.com/tnelsond/peakslab</a><p>SQLite is very well engineered; PeakSlab is very simple.
> Compare that to SQLite's 1.2mb of wasm and glue code<p>The current trunk is actually 1.7mb in its canonical unminified form (which includes very nearly as much docs as JS code), split almost evenly between the WASM and JS pieces :/. Edit: it is 1.2mb in minified form, though.<p>Disclosure: i'm its maintainer.<p>Edit: current trunk, for the sake of trivia:<p><pre><code> sqlite3.wasm 896745
sqlite3.mjs 816270 # unminified w/ docs
sqlite3.mjs 431388 # unminified w/o docs
sqlite3.mjs 310975 # minified</code></pre>
Many comments here about your creation, PeakSlab, but not yet any dedicated praise. I didn't know about it before, but I have to say it's really cool and innovative! The performance of the dictionary is indeed superb, and I will definitely bookmark this for future reuse. So, in a nutshell: thanks for sharing!
I think actually this competes with the old BerkeleyDB: <a href="https://en.wikipedia.org/wiki/Berkeley_DB" rel="nofollow">https://en.wikipedia.org/wiki/Berkeley_DB</a> - which I now see is no longer BSD-licensed, and in any case has been rendered almost extinct by SQLite. It was used for basic on-disk key-value store work.
It seems more like SSTables, which are widely used by open-source software like LevelDB, HBase, and Cassandra (and Google's BigTable) but AFAIK don't have a standard open-source reader (unless you want to pull the relevant source file out of Cassandra or LevelDB).<p><a href="https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/" rel="nofollow">https://www.igvita.com/2012/02/06/sstable-and-log-structured...</a>
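The core SSTable idea is an immutable file of sorted key/value records plus an index that lets you binary-search without scanning. A toy sketch of that idea (my own names and in-memory layout, not LevelDB's actual format):

```python
import bisect

class ToySSTable:
    """Toy SSTable: immutable, sorted (key, value) records, read-only lookups."""

    def __init__(self, records):
        # Sort once at build time; the table never changes afterwards,
        # so there is no write path, no locking, no compaction here.
        self.records = sorted(records)
        self.keys = [k for k, _ in self.records]

    def get(self, key):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.records[i][1]
        return None

table = ToySSTable([("cat", "feline"), ("dog", "canine"), ("ant", "insect")])
print(table.get("dog"))  # -> canine
```

Real SSTables add block compression and a sparse index so only the index needs to live in memory, but the read path is the same shape.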
Even BerkeleyDB tries to be mutable. What I'm doing doesn't need the mutability so it's much more similar to dictionary formats (though probably simpler) than it is to a database. Though a lot of people do use full databases for immutable dictionary key-value stuff. I just couldn't get any database to work well enough for a pwa dictionary.
I don't think it has had a BSD license this century, Sleepy Cat was selling licenses in the 90s before Oracle bought them.
SQLite is simple in its own way and I like the design principle of their SQL dialect.<p>"Right joins are just left joins in the wrong direction, you don't need that crap"<p>Of course it always gets simpler or more specialised. I think many apps using databases would run with SQLite just as well. And some would probably run just as well with a textfile instead of any db like SQLite.
> "Right joins are just left joins in the wrong direction, you don't need that crap"<p>SQLite has supported all types of joins since version 3.39 in 2022.
I must've messed something up, but I remember some joins (was it full outer join?) being unbelievably slow? Was I doing something wrong?
Well, look at that, now it is downhill from here!
For the love of god, don't do plain textfiles anymore. You end up with software that has 20 (or more) individual files, one for each program section, which works fine until you want the files to be consistent. Boom. Then you add a lock to fix it, and suddenly your whole program can only run sequentially. And then your customers ask why ingest is so slow. I won't name any names here, but this is a real commercial product.
We use a cheap invoicing program. It works fine, except it gets very slow when dealing with large numbers of invoices. Turns out each invoice (or payment record, or customer record, or whatever) is a separate text file with form-urlencoded data. No indices.
A more standard solution would be cdb.[0] Although that doesn't support compressed data.<p>[0] <a href="https://cdb.cr.yp.to/" rel="nofollow">https://cdb.cr.yp.to/</a> , <a href="https://en.wikipedia.org/wiki/Cdb_(software)" rel="nofollow">https://en.wikipedia.org/wiki/Cdb_(software)</a>
Overkill in what way exactly? The LOC of the project shouldn't have any bearing on most people's usage of the project. SQLite is one of the well tested and mature projects in the world. What exactly would motivate someone to use PeakSlab instead? What problem are you solving?
I'm solving a simpler problem: just making cross-platform dictionary progressive web apps with indexes, full-text search, and HTML tags and uppercase letters inserted back into the text on render so they don't interfere with search.<p>SQLite is 1.2mb in combined wasm and JavaScript and not really designed for my use case, so I would have to add all the things I need anyway, like compression and HTML tag insertion. For my use case, which is just PWAs, SQLite takes too long to load, the files are too big, and the search isn't tailored. So I made something else in 38kb instead.
Read the comment. He's using it in WASM form and doesn't want users to have to download 1.2MB of SQLite every time they visit the page.
Client caches are a thing, so this is most relevant for cold-start customers. In that case PeakSlab's download size is an advantage.<p>Fwiw LocalStorage is a SQLite db on most browsers, with a kv api. It'd be interesting to have the actual API available.
Even on warm start PeakSlab is twice as fast. It's not just download size, it's execution speed, zero copy, database decompression, etc.<p>That's why PeakSlab is written in c, because what's faster than casting the whole database to a struct? ;-P
If you're not modifying data, whatever system is using the data doesn't need a database at all, it just needs a data export.
Perhaps a dumb question, but how do you get data into it if you’re not doing writes
I think it's just immutable once you've generated it. No need to update indexes or check consistency on writes, no need for transactions, etc.
I have a system that builds SQLite databases and uploads them to S3. Once they're in S3, they are never changed. The program that builds the databases only does writes, and the program that queries the databases only does reads. It uses a VFS to query the database in-place with HTTP range requests.<p>This is indeed <i>not</i> an optimal setup. A more careful design from first principles would not require seeking around the file as much as SQLite does, we'd do a better job on reading exactly the correct range of bytes for a given query since we know ahead of time what the access patterns are, and we could do reads in parallel. With SQLite we have to be very careful about the schema design to ensure it won't have to seek too many times to answer a query. But SQLite was expedient, and I'm confident I'll always be able to read the files. That's less certain for a custom file format.
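The "query in place" part boils down to fetching only the byte spans a query needs. A minimal sketch of that idea, with a local file standing in for the S3 object and `read_range` standing in for an HTTP GET with a `Range: bytes=...` header (a real VFS would translate SQLite page reads into these calls):

```python
import os
import tempfile

def read_range(path, offset, length):
    # Stand-in for an HTTP range request against an immutable remote blob.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Simulate a remote database file and fetch just one "page" from the middle,
# without ever reading the whole file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"page2data" + b"\x00" * 4087)

chunk = read_range(path, 4096, 9)
print(chunk)  # -> b'page2data'
```

The cost model follows directly: every B-tree seek SQLite does becomes one round trip, which is why schema design matters so much in this setup.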
Generate it one time from a source tsv file or folder of media.
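A build-once pipeline like that is a few lines with the sqlite3 module: one writer process creates the file, and every consumer opens it strictly read-only via a URI. The source rows here are a made-up stand-in for a TSV:

```python
import os
import sqlite3
import tempfile

# Build once from source data (stand-in for parsing a TSV file).
path = os.path.join(tempfile.mkdtemp(), "dict.db")
rows = [("cat", "feline"), ("dog", "canine")]
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE entries (word TEXT PRIMARY KEY, definition TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
conn.commit()
conn.close()

# ...then every consumer opens it read-only; writes are rejected by SQLite.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT definition FROM entries WHERE word='dog'").fetchone()[0])
```

With `mode=ro` any accidental INSERT raises an error, so the "generated once, never mutated" contract is enforced by the library rather than by convention.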
Think historical records of, say, share values for past years. You might have a single db for 1900-2000, for instance. Things like that.<p>Not everything needs to be real-time updated.
It’s an RODB. Ship the preindexed data blob.
It is crashing Safari.
something something XKCD competing standards something something
Creating something new for a different use case isn't pointless. It's like comparing inline skates to ice skates.
Believe me, I tried sticking to SQLite or aard2 or stardict, they just were fundamentally inadequate with no good pwa cross platform tooling.
Doesn’t even apply unless someone says that (1) there are too many “standards”, and (2) so we are making this standard (neither apply here). Someone made something.<p>We should really consider eventually retiring memes because they just end up as thought-terminating cliches.<p>This is of course referring to xkcd #927. How do I know that?
I have always loved SQLite.<p>I have also heard that some firms ban its use.<p>Why?<p>Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.<p>DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.<p>I still love SQLite though.
The question is, do the same firms ban Excel? Excel spreadsheets often end up as shadow databases in unlikely places.
This might catch flak, but generalizing, I'd assume the people banning things are the same people who would use Excel where a database would be better. If so, that's why Excel isn't banned under the same conditions that would get sqlite banned.
The sane thing would be to ban Excel and promote SQLite. Excel is often used for tabulated text (issue tracking) not calculations. Perfect use case for a relational db
Excel has sheets for tables, columns and rows, primary keys (UNIQUE), foreign key references etc if you squint.<p>It doesn't require you use all of that <i>properly</i>, but it's there.
Excel is made for calculations. But if you make it hard to make a DB, people will abuse Excel as a DB.
I mean, it might have been at first, but Microsoft figured out that the majority of users were using it for lists without formulas back in 1993, and they've strategized around that. IMHO, the biggest concession to this was when they added Power Query to core Excel in 2016.
or reimplement excel with sqlite as a backend :-D<p>BTW sqlite can run SQL queries on CSV files with a relatively simple one-liner command...
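If I remember right, recent sqlite3 CLI builds accept dot-commands as arguments, so something like `sqlite3 :memory: '.import --csv data.csv t' 'SELECT SUM(qty) FROM t'` does the trick. The same idea from Python, with an inline string standing in for data.csv:

```python
import csv
import io
import sqlite3

data = "name,qty\napples,3\npears,5\n"  # stand-in for data.csv

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, qty INTEGER)")
rows = csv.reader(io.StringIO(data))
next(rows)  # skip the header row
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Column affinity coerces the '3' and '5' strings to integers.
total = conn.execute("SELECT SUM(qty) FROM t").fetchone()[0]
print(total)  # -> 8
```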
Well heck can't someone make an SQLite extension that is basically just a simplified Excel ?
and excel has gui for forms
PII sniffers are pretty good at dealing with excel files. Excel is seen more as an analyst tool than a dev tool. Any place that bans Excel needs to either let analysts use some other turing complete data tools, like python or R or something, or they'll have trouble attracting analyst talent. They'll have devs and data entry users and that's it.<p>The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a role analysts can do if you just give them a sane API and data tools.
You can enforce classification and privacy labels (or something similar) in Excel and other document files, at least in a closed corporate environment. Azure also supports this. Also, everyone has Office installed (in a corporate environment), anyone can open and work with an Excel file.
IMO, almost any Excel more than a month old should become readonly.
You should consider knock-on effects of this brilliant idea. Now there would be copies of spreadsheets younger than a month that get replicated 47 billion times, exponentially compounding the problem you're trying to solve.<p>This sounds like how we pass so many stupid laws. Nobody thinks about 2nd order effects.
I’ve worked at some organisations that have strict rules (not always strictly followed) about what can go in Excel spreadsheets, and where they have to be stored. The C drive is verboten. Some also have standards about classification and labelling of PII and sensitive data.
They generally cannot. But they do banish Access.
Don't get me started on Access...
Man, Access could've been so good if they just made an app around SQLite. Or since it's Microsoft and they need to do everything their own way, it would've been so good if they made a flat file DB à la SQLite, but with T-SQL (or a subset thereof) instead of JET-SQL.<p>Increase interoperability. Funnel data people from Excel into real DB technologies.<p>And if they did more to blur the lines between spreadsheets and databases, and make it seamless to work out of both Excel and Access, add more spreadsheet features to the data views, etc.
Do companies ban text files? Text files are used to store data.
That's why you store them on unsaved tabs instead.
Do companies ban data centers? It's crazy to send PII to other computers on the line.
Do companies ban brains? Brains are used to store data.
There are interesting uses for sqlite, like this one:
<a href="https://sqlite.org/sqlar.html" rel="nofollow">https://sqlite.org/sqlar.html</a>
Required reading for “anything can become a mission critical database” conversations:<p><a href="https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_account_gave_me_stomach_ulcers/" rel="nofollow">https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_...</a>
This "shadow IT DBA" issue has always been a classic problem with Access databases, too.
This is why I put configs like that into AppData or dotfile directories, or the equivalent for MacOS (I forget which one it is inside of the ~/Library directory).
I recently watched a YT video about this subject: <a href="https://www.youtube.com/watch?v=lSVgeMoXJTs" rel="nofollow">https://www.youtube.com/watch?v=lSVgeMoXJTs</a><p>In summary, companies use the bus-metric to see how viable a project is. Bus, as in, how many people can be hit by a bus before there is no one left to maintain the project.<p>Despite its ubiquity, SQLite is maintained by only 3 people. That bus-metric for SQLite is 3, which is way too low for some companies.<p>Give the link a watch; it was really interesting.
Some firms don't understand how to do data management, and if we draw the venn diagram of those and the ones that ban sqlite, it'd be pretty close to a circle.<p>Yes, databases could have any extension, but no sane dev team would accept code that doesn't use an obvious extension for a sqlite database.<p>Yes, databases can contain PII, but no sane product manager would go "yes, that's a good use of sqlite".<p>Yes, you can trivially copy database files, but no sane product needs to, in the same way that no sane product should require folks to just clone the db to do some work.<p>Pretty much every reason a company has for banning sqlite is a red flag for working there.
> a file that can have any extension<p>So read the magic number, you shouldn't trust file extensions anyway<p>> that file can be copied around to other servers<p>So can spreadsheets<p>I'm not discounting that having centralized data access is desirable but it doesn't sound like that particular reasoning is well thought out
DevOps and DBAs must hate RAM and caches.
That's so dumb
> DevOps and DBA teams<p>Ah so two teams nobody should listen to.
I went from thinking “SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything."<p>SQLite is very good if you can fit into the single-writer, multiple-readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google searching to figure out.<p>Today, most of my apps are simply a go binary + SQLite + systemd service file.<p>I've yet to lose data. Performance is great and plenty for most apps.
The single writer is less of an issue in practice than it's made out to be. Modern nvme drives are incredible and it's trivial to get 5k writes per second in an optimized WAL setup. Way more than most apps could ever dream.<p>And even then, I've used a batch writer pattern to get 180k writes per second on a commodity vps.
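The batch writer pattern can be as simple as queueing rows and committing them in one transaction per flush, which amortizes the per-commit fsync cost across many inserts. A sketch (class name and batch size are illustrative, not a library API):

```python
import sqlite3

class BatchWriter:
    """Collects rows and commits them in one transaction per flush."""

    def __init__(self, conn, sql, batch_size=1000):
        self.conn, self.sql, self.batch_size = conn, sql, batch_size
        self.pending = []

    def write(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            with self.conn:  # one transaction, one commit for the whole batch
                self.conn.executemany(self.sql, self.pending)
            self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
w = BatchWriter(conn, "INSERT INTO events VALUES (?, ?)", batch_size=500)
for i in range(1200):
    w.write((i, "x"))
w.flush()  # drain the final partial batch
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 1200
```

A production version would typically run this behind a queue on a dedicated writer thread, so every other thread stays a pure reader.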
all* of that + sharding -> <a href="https://sqlite.org/lang_attach.html" rel="nofollow">https://sqlite.org/lang_attach.html</a><p>ex: main.db + fts.db. reading and writing to main.db is always available; updating the fts index can be done without blocking the main database — it only needs to read, the reads can be chunked, and delayed. fts.db keeps the index + a cursor table — an id or last change ts<p>could also use a shard to handle tables for metrics, or simply move old data out of main.db<p>* some examples:<p><pre><code> import sqlite3
 conn = sqlite3.connect("data.db")
conn.execute("PRAGMA journal_mode=WAL") # concurrent reads (see above)
conn.execute("PRAGMA synchronous=NORMAL") # fsync at checkpoint, not every commit
conn.execute("PRAGMA cache_size=-62500") # ~61 MB page cache (negative = KB)
conn.execute("PRAGMA temp_store=MEMORY") # temp tables and indexes in RAM
conn.execute("PRAGMA busy_timeout=5000") # wait 5s on lock instead of failing
</code></pre>
edit: ORMs will obliterate your performance — use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.<p><i>my replies are being ratelimited, so let me add this</i><p>the <i>heavy duty server</i> other databases have is doing the <i>load bearing</i> work that folks tend to complain sqlite can't do<p>the <i>real</i> DBMSs are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.<p>by default sqlite gives you a stable database file where, when you see the transaction complete, it means the changes have been committed to storage and cannot be lost if the machine were to crash right after.<p>you can decide to waive some, or all, of those guarantees in exchange for performance, and this doesn't even have to be an all-or-nothing situation.
I usually try to explain it like this: “Single writer” is rarely a real problem, because a writer is not slow. It writes exclusively, but very quickly.<p>"Batch writer pattern" is a good idea to get rid of expensive commits.
For me, the concern about SQLite has never been if the database engine itself is “reliable for real data”, but that storing data on a single node is not “reliable for real data”. Performance aside, what you are positing is no different than dumping everything to a text file on disk. What happens if that VM dies?
Do you use multiple backend nodes? If yes, how do you access sqlite files from different nodes?
I use it for apps which don't need multiple backend nodes.<p>When i actually have something that requires multiple nodes, i just use postgres (with replica) or mongo (with replica).<p>But that's for those apps which are in an autoscaler.<p>For bulk data refresh I use build artifacts and hot-reload memory-mapped files, checking a manifest on object storage and only fetching an update if it's newer.<p>I've used this pattern everywhere and never really needed anything more; occasionally i might use redis if something required shared state across multiple nodes, fast.
2026 recommended storage formats: <a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
Taking a minute to appreciate the level of long term thinking required for storing data, to plan for 300-500 years into the future, to be able to withstand all kinds of innovations, and survive basic obsolescence.<p>What is the longest surviving paper medium?
Seems like they're pretty lax about their recommendations tbh. XLS is "preferred".
> As of this writing (2018-05-29) ...<p>So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.<p>(Thanks for the correction. Brief brain malfunction in the math department there).
Sir, it's 2026. It's 8 years old.
Was going to say, was having deja vu reading this
For public-sector data preservation, it may be one of the best options.<p>- The specification is publicly available<p>- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk<p>From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.
I love SQLite and thanks for sharing it but there should be a "(2018)" at the end in the title:<p>> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.
FYI, they added a lot more formats to the list after that.<p><pre><code> Preferred
1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
a. Formats using well known schemas with public validation tool available
b. Line-oriented, e.g. TSV, CSV, fixed-width
c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
3. Character Encoding, in descending order of preference:
a. UTF-8, UTF-16 (with BOM),
b. US-ASCII or ISO 8859-1
c. Other named encoding
---
Acceptable
For data (in order of preference):
1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
2. Text-based data formats with available schema
For aggregation or transfer:
1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
</code></pre>
<a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.<p>[0]: <a href="https://7-zip.org/7z.html" rel="nofollow">https://7-zip.org/7z.html</a><p>[1]: CVE-2025-0411
I use postgresql for my startup, but every time i need quick local testing i wish it were as simple as sqlite. No config, it just works.
Just yesterday it occurred to me that it had been a while since I last saw an SQLite post at the top of HN.<p>I really like the simplicity and speed of SQLite, I've used in both personal and professional projects. For day-to-day work I still end up in Excel, not because I like it more (I don't), but because its ubiquity makes it the lowest friction way to share & explore datasets with less technical stakeholders and execs.
I'm under no illusion I'll suddenly shatter your world views with this, but in case it's as useful to you as it was to me, you might want to check out Metabase[1].<p>You can self-host and if all you care about is showing data in a digestible format to stakeholders, it's really simple. You can of course go overboard and regret all of your life's decisions with it, but I try and abstain myself.<p>[1]: <a href="https://www.metabase.com/" rel="nofollow">https://www.metabase.com/</a>
I've always been irked by how SQLite relies on text parsing to work. Why do I have to write queries as text rather than expressing them in programmatic logic? I've never used a relational database because of this: I hate SQL and the entire idea of SQL, and I don't want to write it, learn it, or use a system that relies on it, even though relational databases can be more performant than plain structured data. It feels like the wrong approach, on the level of PHP. Is there anything I can do about this? I don't want to keep passing up SQLite just because of SQL, but I can't seem to get past it. I don't want to build strings or have string parsing anywhere in the stack; it just feels wrong.
If you want to avoid string manipulation then you can construct queries with a query builder API like C#'s LINQ. Other languages have similar libraries, e.g., Rust has Diesel.<p>If your objection is to the SQL language itself then you might find Datalog interesting. Datalog is a logic-based language where you query by writing predicates rather than writing SQL statements. Check out Logica <<a href="https://logica.dev" rel="nofollow">https://logica.dev</a>>. It's a language in the Datalog family that compiles to SQL.<p>In both cases, SQL is used only as a low-level IR for interfacing with the database engine.
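For flavor, here's what a minimal builder looks like underneath (a toy of my own, not LINQ or Diesel): method calls accumulate conditions, and values travel as bound parameters rather than being interpolated into the string:

```python
import sqlite3

class Select:
    """Toy query builder: composes parameterized SQL, never interpolates values."""

    def __init__(self, table):
        self.table, self.conds, self.params = table, [], []

    def where(self, column, value):
        self.conds.append(f"{column} = ?")
        self.params.append(value)  # value is bound, never spliced into the SQL
        return self

    def run(self, conn):
        sql = f"SELECT * FROM {self.table}"
        if self.conds:
            sql += " WHERE " + " AND ".join(self.conds)
        return conn.execute(sql, self.params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 30), ("bob", 40)])
print(Select("users").where("age", 40).run(conn))  # -> [('bob', 40)]
```

SQL still exists in this picture, but only as the IR the builder emits; the application code never touches query strings.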
A "prepared statement" is a precompiled SQL command, ready for bindings and execution: <a href="https://sqlite.org/c3ref/stmt.html" rel="nofollow">https://sqlite.org/c3ref/stmt.html</a><p>You can't precompile your SQL at build time, unfortunately, but you _can_ precompile all your SQL at the very start of your program and then never touch the parser again. This might be a good middle ground for you. It is infra that you can centralize, write some unit tests against, and then not worry about forever.<p>It's not common because the SQLite parser is lightning fast and it's so convenient to just write out a new query as you need one, versus having one bucket of all queries. But it's an option!
I bet you really love LLMs
On a recent project I have needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which had the possibility to corrupt files if there were a power interruption or something.<p>I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.<p>And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and then all the hard parts I was reinventing were probably faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to SQLite and it worked fine.<p>I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
The problem with it is you didn't solve your biggest actual problem, you just haven't had a problem bite you in the ass yet so you think your problem is solved.
> I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"<p>Where exactly is everywhere? Win32? All of Linux? BSDs? MacOS? IOS? ...
Everywhere in the sense of "I have a USB stick/SD card, what do I format it to so that every major device I'm using can read it".<p>In practice, every OS has its preferred system and the rest has varying levels of "I guess this works", with FAT32 and exFAT being the only real cross-platform options.<p>To wit:<p>* NTFS is only really properly and fully supported on Windows. Apple mounts it read-only. Linux can certainly mount NTFS and do some basic reads and writes. Unfortunately for whatever reason, the Linux fsck tools for NTFS are absolutely terrible, poorly designed and generally can't fix even the most basic of issues. At the same time, mount refuses to work with a partially corrupted filesystem, so if you're dealing with dirty unmounts (where the worst case usually is some unclosed file handle rather than data loss, but this also happens if you try to mount a suspended Windows parititon, which isn't uncommon since Windows hibernates by default and calls it fast boot), that's a boot to Windows just to fix it.<p>* Apple filesystems basically only work on apple devices. It's technically possible to mount them on Linux, but you end up digging into the guts of a bunch of stuff that Apple usually just masks for you.<p>* ext4 is only properly read/write under Linux and requires external drivers under Windows (which may not work properly either, as corruption issues are common).<p>FAT32 is reliable in that any OS can fsck/chkdsk it and properly mount it without needing special drivers, but is hindered by ancient filesize limitations. exFAT, at least for most cases, is the only filesystem you can plug into most devices and expect more or less the same capabilities as FAT32 (read/write support, can fix filesystem corruption.)<p>Out of the os specific ones, NTFS seems like it has the most potential to be the one filesystem that works everywhere; it's modern, works good-ish on most devices, it's just that the fsck/chkdsk tooling is awful outside of Windows.
Something MacOS and Windows support natively would be a good start, it could grow from there.
Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.
It is great to see SQLite getting this level of institutional recognition. The single file format makes archival storage incredibly straightforward compared to traditional database dumps.
I'm surprised they included proprietary format that's de facto standard in profession or supported by multiple tools (.xls, .xlsx) in preferred section [1]. I wonder if "well-known enough" is as good as "open" from preservation standpoint.<p>[1] <a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
Especially when Office 365 shows that not even <i>Microsoft</i> is capable of making software which can display Office files anymore... if you have a Word file which was created or has ever been modified by the Word application, working with it through Office 365 in a browser is such a pain. I've literally had images which are <i>impossible</i> to delete or move in the web version, and they will absolutely render in the wrong place.
Archivists and librarians have to think in terms of practicality: if many tools exist to read something and it's a mainstream software product, the odds are good that they'll be able to use those files 50 years from now. Not certain, but good, and that matters with limited budget and ability to tell the rest of the world what format to provide things in.<p>This can require nuance: for example, PDF has profiles because the core format is widely supported but you could do things like embed plugin content from now-defunct vendors, and they would only want the former for long-term preservation.
You can unzip the xlsx and read the xml inside. It’s not the worst format by far.
I used SQLite for a few applications several years ago. One time, the database got corrupted and all the data was lost. That was the day I stopped using SQLite.<p>Also, the lack of enforced column data types was always a negative for me.
No matter the medium, backups are a must.
For column types there are STRICT tables now
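STRICT tables (SQLite 3.37+, 2021) reject mistyped values instead of silently coercing or storing them as-is. A quick demonstration, guarded on the library version since older builds don't know the keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
if sqlite3.sqlite_version_info >= (3, 37, 0):
    conn.execute("CREATE TABLE t (n INTEGER) STRICT")
    try:
        # Without STRICT this text would be stored verbatim in the
        # INTEGER column; with STRICT it's a constraint violation.
        conn.execute("INSERT INTO t VALUES ('not a number')")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
else:
    print("SQLite too old for STRICT tables")
```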
I used a hard drive for a few applications several years ago. One time, the drive got corrupted and all the data was lost. That was the day I stopped using hard drives.
> the database got corrupted<p>What caused that?
Which version of SQLite?
I don't know much about the LoC use case, but my initial reaction to the post is to ask why they are not building a data lake with open formats. I'm sure there are reasons for discarding open-table formats. Claude keeps telling me that the issue is that they don't address preservation properly.
SQLite is remarkably versatile. Just a couple weeks ago an extension to do cross-process queues, streams, pub/sub etc in SQLite was released:<p>Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite | 327 points | 94 comments | <a href="https://news.ycombinator.com/item?id=47874647">https://news.ycombinator.com/item?id=47874647</a><p>Live notifications was one of the big missing pieces to implement whole apps on a sqlite backend, and now there's a decent solution.
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!
It certainly will be in the toolkits of data archeologists hundreds of years from now. Must be a weird feeling to create something so potentially long-lasting.
(US)
[dead]
[dead]
[dead]
[dead]
[flagged]
[dead]
[flagged]
I get annoyed at all the other DBs that require their own heavy duty server process when for 90% of my projects there is only one client, my app server. Is there a DB that combines sqlite's embedded simplicity with higher concurrent write throughput?