I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.<p>So I made a format that will never surpass SQLite, except that it's far lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.<p>The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compared to SQLite's 1.2mb of wasm and glue code, it's 3% the size, and searching and loading are much faster. My program isn't really column based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.<p>I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.<p><a href="https://github.com/tnelsond/peakslab" rel="nofollow">https://github.com/tnelsond/peakslab</a><p>SQLite is very well engineered; PeakSlab is very simple.
> Compare that to SQLite's 1.2mb of wasm and glue code<p>The current trunk is actually 1.7mb in its canonical unminified form (which includes very nearly as much docs as JS code), split almost evenly between the WASM and JS pieces :/. Edit: it is 1.2mb in minified form, though.<p>Disclosure: i'm its maintainer.<p>Edit: current trunk, for the sake of trivia:<p><pre><code> sqlite3.wasm 896745
sqlite3.mjs 816270 # unminified w/ docs
sqlite3.mjs 431388 # unminified w/o docs
sqlite3.mjs 310975 # minified</code></pre>
Many comments here about your creation, PeakSlab, but not yet any dedicated praise. I didn't know about it before, but I have to say it's really cool and innovative! The performance of the dictionary is indeed superb, and I will definitely bookmark this for future reuse. So, in a nutshell: thanks for sharing!
I think actually this competes with the old BerkeleyDB: <a href="https://en.wikipedia.org/wiki/Berkeley_DB" rel="nofollow">https://en.wikipedia.org/wiki/Berkeley_DB</a> - which I now see is no longer BSD-licensed, and in any case has been rendered almost extinct by SQLite. It was used for basic on-disk key-value store work.
It seems more like SSTables, which are widely used by open-source software like LevelDB, HBase, and Cassandra (and Google's BigTable) but AFAIK don't have a standard open-source reader (unless you want to pull the relevant source file out of Cassandra or LevelDB).<p><a href="https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/" rel="nofollow">https://www.igvita.com/2012/02/06/sstable-and-log-structured...</a>
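The core SSTable idea is an immutable file of sorted key/value records plus an index that lets you binary-search without scanning. A toy sketch of that idea (my own names and in-memory layout, not LevelDB's actual format):

```python
import bisect

class ToySSTable:
    """Toy SSTable: immutable, sorted (key, value) records, read-only lookups."""

    def __init__(self, records):
        # Sort once at build time; the table never changes afterwards,
        # so there is no write path, no locking, no compaction here.
        self.records = sorted(records)
        self.keys = [k for k, _ in self.records]

    def get(self, key):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.records[i][1]
        return None

table = ToySSTable([("cat", "feline"), ("dog", "canine"), ("ant", "insect")])
print(table.get("dog"))  # -> canine
```

Real SSTables add block compression and a sparse index so only the index needs to live in memory, but the read path is the same shape.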
Even BerkeleyDB tries to be mutable. What I'm doing doesn't need the mutability so it's much more similar to dictionary formats (though probably simpler) than it is to a database. Though a lot of people do use full databases for immutable dictionary key-value stuff. I just couldn't get any database to work well enough for a pwa dictionary.
I don't think it has had a BSD license this century, Sleepy Cat was selling licenses in the 90s before Oracle bought them.
SQLite is simple in its own way and I like the design principle of their SQL dialect.<p>"Right joins are just left joins in the wrong direction, you don't need that crap"<p>Of course it always gets simpler or more specialised. I think many apps using databases would run with SQLite just as well. And some would probably run just as well with a textfile instead of any db like SQLite.
> "Right joins are just left joins in the wrong direction, you don't need that crap"<p>SQLite has supported all types of joins since version 3.39 in 2022.
I must've messed something up, but I remember some joins (was it full outer join?) being unbelievably slow? Was I doing something wrong?
Well, look at that, now it is downhill from here!
For the love of god, don't do plain textfiles anymore. You end up with software that has 20 (or more) individual files, one for each program section, which works fine until you want the files to be consistent. Boom. Then you add a lock to fix it, and suddenly your whole program can only run sequentially. And then your customers ask why ingest is so slow. I won't name any names here, but this is a real commercial product.
We use a cheap invoicing program. It works fine, except it gets very slow when dealing with large numbers of invoices. Turns out each invoice (or payment record, or customer record, or whatever) is a separate text file with form-urlencoded data. No indices.
A more standard solution would be cdb.[0] Although that doesn't support compressed data.<p>[0] <a href="https://cdb.cr.yp.to/" rel="nofollow">https://cdb.cr.yp.to/</a> , <a href="https://en.wikipedia.org/wiki/Cdb_(software)" rel="nofollow">https://en.wikipedia.org/wiki/Cdb_(software)</a>
Overkill in what way exactly? The LOC of the project shouldn't have any bearing on most people's usage of the project. SQLite is one of the well tested and mature projects in the world. What exactly would motivate someone to use PeakSlab instead? What problem are you solving?
I'm solving a simpler problem: just making cross-platform dictionary progressive web apps with indexes, full-text search, and HTML tags and uppercase letters inserted back into the text on render so they don't interfere with search.<p>SQLite is 1.2mb in combined wasm and JavaScript and not really designed for my use case, so I would have to add all the things I need anyway, like compression and HTML tag insertion. For my use case, which is just PWAs, SQLite takes too long to load, the files are too big, and the search isn't tailored. So I made something else in 38kb instead.
Read the comment. He's using it in WASM form and doesn't want users to have to download 1.2MB of SQLite every time they visit the page.
Client caches are a thing, so this is most relevant for cold-start customers. In that case PeakSlab's download size is an advantage.<p>Fwiw LocalStorage is a SQLite db on most browsers, with a kv api. It'd be interesting to have the actual API available.
Even on warm start PeakSlab is twice as fast. It's not just download size, it's execution speed, zero copy, database decompression, etc.<p>That's why PeakSlab is written in c, because what's faster than casting the whole database to a struct? ;-P
If you're not modifying data, whatever system is using the data doesn't need a database at all, it just needs a data export.
Perhaps a dumb question, but how do you get data into it if you’re not doing writes
I think it's just immutable once you've generated it. No need to update indexes or check consistency on writes, no need for transactions, etc.
I have a system that builds SQLite databases and uploads them to S3. Once they're in S3, they are never changed. The program that builds the databases only does writes, and the program that queries the databases only does reads. It uses a VFS to query the database in-place with HTTP range requests.<p>This is indeed <i>not</i> an optimal setup. A more careful design from first principles would not require seeking around the file as much as SQLite does, we'd do a better job on reading exactly the correct range of bytes for a given query since we know ahead of time what the access patterns are, and we could do reads in parallel. With SQLite we have to be very careful about the schema design to ensure it won't have to seek too many times to answer a query. But SQLite was expedient, and I'm confident I'll always be able to read the files. That's less certain for a custom file format.
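The "query in place" part boils down to fetching only the byte spans a query needs. A minimal sketch of that idea, with a local file standing in for the S3 object and `read_range` standing in for an HTTP GET with a `Range: bytes=...` header (a real VFS would translate SQLite page reads into these calls):

```python
import os
import tempfile

def read_range(path, offset, length):
    # Stand-in for an HTTP range request against an immutable remote blob.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Simulate a remote database file and fetch just one "page" from the middle,
# without ever reading the whole file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"page2data" + b"\x00" * 4087)

chunk = read_range(path, 4096, 9)
print(chunk)  # -> b'page2data'
```

The cost model follows directly: every B-tree seek SQLite does becomes one round trip, which is why schema design matters so much in this setup.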
Generate it one time from a source tsv file or folder of media.
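A build-once pipeline like that is a few lines with the sqlite3 module: one writer process creates the file, and every consumer opens it strictly read-only via a URI. The source rows here are a made-up stand-in for a TSV:

```python
import os
import sqlite3
import tempfile

# Build once from source data (stand-in for parsing a TSV file).
path = os.path.join(tempfile.mkdtemp(), "dict.db")
rows = [("cat", "feline"), ("dog", "canine")]
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE entries (word TEXT PRIMARY KEY, definition TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
conn.commit()
conn.close()

# ...then every consumer opens it read-only; writes are rejected by SQLite.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT definition FROM entries WHERE word='dog'").fetchone()[0])
```

With `mode=ro` any accidental INSERT raises an error, so the "generated once, never mutated" contract is enforced by the library rather than by convention.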
Think historical records of, say, share values for past years. You might have a single db for 1900-2000, for instance. Things like that.<p>Not everything needs to be real-time updated.
It’s an RODB. Ship the preindexed data blob.
It is crashing Safari.
something something XKCD competing standards something something
Creating something new for a different use case isn't pointless. It's like comparing inline skates to ice skates.
Believe me, I tried sticking to SQLite or aard2 or stardict, they just were fundamentally inadequate with no good pwa cross platform tooling.
Doesn’t even apply unless someone says that (1) there are too many “standards”, and (2) so we are making this standard (neither apply here). Someone made something.<p>We should really consider eventually retiring memes because they just end up as thought-terminating cliches.<p>This is of course referring to xkcd #927. How do I know that?
I have always loved SQLite.<p>I have also heard that some firms ban its use.<p>Why?<p>Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.<p>DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.<p>I still love SQLite though.
The question is, do the same firms ban Excel? Excel spreadsheets often end up as shadow databases in unlikely places.
This might catch flak, but generalizing, I'd assume the people banning things are the same people who would use Excel where a database would be better. If so, that's why Excel isn't banned under the same conditions that would get sqlite banned.
The sane thing would be to ban Excel and promote SQLite. Excel is often used for tabulated text (issue tracking) not calculations. Perfect use case for a relational db
Excel has sheets for tables, columns and rows, primary keys (UNIQUE), foreign key references etc if you squint.<p>It doesn't require you use all of that <i>properly</i>, but it's there.
Excel is made for calculations. But if you make it hard to make a DB, people will abuse Excel as a DB.
I mean, it might have been at first, but Microsoft figured out that the majority of users were using it for lists without formulas back in 1993, and they've strategized around that. IMHO, the biggest concession to this was when they added Power Query to core Excel in 2016.
or reimplement excel with sqlite as a backend :-D<p>BTW sqlite can run SQL queries on CSV files with a relatively simple one-liner command...
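If I remember right, recent sqlite3 CLI builds accept dot-commands as arguments, so something like `sqlite3 :memory: '.import --csv data.csv t' 'SELECT SUM(qty) FROM t'` does the trick. The same idea from Python, with an inline string standing in for data.csv:

```python
import csv
import io
import sqlite3

data = "name,qty\napples,3\npears,5\n"  # stand-in for data.csv

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, qty INTEGER)")
rows = csv.reader(io.StringIO(data))
next(rows)  # skip the header row
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Column affinity coerces the '3' and '5' strings to integers.
total = conn.execute("SELECT SUM(qty) FROM t").fetchone()[0]
print(total)  # -> 8
```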
Well heck can't someone make an SQLite extension that is basically just a simplified Excel ?
and excel has gui for forms
PII sniffers are pretty good at dealing with excel files. Excel is seen more as an analyst tool than a dev tool. Any place that bans Excel needs to either let analysts use some other turing complete data tools, like python or R or something, or they'll have trouble attracting analyst talent. They'll have devs and data entry users and that's it.<p>The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a role analysts can do if you just give them a sane API and data tools.
You can enforce classification and privacy labels (or something similar) in Excel and other document files, at least in a closed corporate environment. Azure also supports this. Also, everyone has Office installed (in a corporate environment), anyone can open and work with an Excel file.
IMO, almost any Excel more than a month old should become readonly.
You should consider knock-on effects of this brilliant idea. Now there would be copies of spreadsheets younger than a month that get replicated 47 billion times, exponentially compounding the problem you're trying to solve.<p>This sounds like how we pass so many stupid laws. Nobody thinks about 2nd order effects.
I’ve worked at some organisations that have strict rules (not always strictly followed) about what can go in Excel spreadsheets, and where they have to be stored. The C drive is verboten. Some also have standards about classification and labelling of PII and sensitive data.
They generally cannot. But they do banish Access.
Don't get me started on Access...
Man, Access could've been so good if they just made an app around SQLite. Or since it's Microsoft and they need to do everything their own way, it would've been so good if they made a flat file DB à la SQLite, but with T-SQL (or a subset thereof) instead of JET-SQL.<p>Increase interoperability. Funnel data people from Excel into real DB technologies.<p>And if they did more to blur the lines between spreadsheets and databases, and make it seamless to work out of both Excel and Access, add more spreadsheet features to the data views, etc.
Do companies ban text files? Text files are used to store data.
That's why you store them on unsaved tabs instead.
Do companies ban data centers? It's crazy to send PII to other computers on the line.
Do companies ban brains? Brains are used to store data.
There are interesting uses for sqlite, like this one:
<a href="https://sqlite.org/sqlar.html" rel="nofollow">https://sqlite.org/sqlar.html</a>
Required reading for “anything can become a mission critical database” conversations:<p><a href="https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_account_gave_me_stomach_ulcers/" rel="nofollow">https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_...</a>
This "shadow IT DBA" issue has always been a classic problem with Access databases, too.
This is why I put configs like that into AppData or dotfile directories, or the equivalent for MacOS (I forget which one it is inside of the ~/Library directory).
I recently watched a YT video about this subject: <a href="https://www.youtube.com/watch?v=lSVgeMoXJTs" rel="nofollow">https://www.youtube.com/watch?v=lSVgeMoXJTs</a><p>In summary, companies use the bus-metric to see how viable a project is. Bus, as in, how many people can be hit by a bus before there is no one left to maintain the project.<p>Despite its ubiquity, SQLite is maintained by only 3 people. That bus-metric for SQLite is 3, which is way too low for some companies.<p>Give the link a watch; it was really interesting.
Some firms don't understand how to do data management, and if we draw the venn diagram of those and the ones that ban sqlite, it'd be pretty close to a circle.<p>Yes, databases could have any extension, but no sane dev team would accept code that doesn't use an obvious extension for a sqlite database.<p>Yes, databases can contain PII, but no sane product manager would go "yes, that's a good use of sqlite".<p>Yes, you can trivially copy database files, but no sane product needs to, in the same way that no sane product should require folks to just clone the db to do some work.<p>Pretty much every reason a company has for banning sqlite is a red flag for working there.
> a file that can have any extension<p>So read the magic number, you shouldn't trust file extensions anyway<p>> that file can be copied around to other servers<p>So can spreadsheets<p>I'm not discounting that having centralized data access is desirable but it doesn't sound like that particular reasoning is well thought out
DevOps and DBAs must hate RAM and caches.
That's so dumb
> DevOps and DBA teams<p>Ah so two teams nobody should listen to.
I went from thinking “SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything."<p>SQLite is very good if you can fit into the single-writer, multiple-readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google searching to figure out.<p>Today, most of my apps are simply a go binary + SQLite + systemd service file.<p>I've yet to lose data. Performance is great and plenty for most apps.
The single writer is less of an issue in practice than it's made out to be. Modern nvme drives are incredible and it's trivial to get 5k writes per second in an optimized WAL setup. Way more than most apps could ever dream.<p>And even then, I've used a batch writer pattern to get 180k writes per second on a commodity vps.
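The batch writer pattern can be as simple as queueing rows and committing them in one transaction per flush, which amortizes the per-commit fsync cost across many inserts. A sketch (class name and batch size are illustrative, not a library API):

```python
import sqlite3

class BatchWriter:
    """Collects rows and commits them in one transaction per flush."""

    def __init__(self, conn, sql, batch_size=1000):
        self.conn, self.sql, self.batch_size = conn, sql, batch_size
        self.pending = []

    def write(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            with self.conn:  # one transaction, one commit for the whole batch
                self.conn.executemany(self.sql, self.pending)
            self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
w = BatchWriter(conn, "INSERT INTO events VALUES (?, ?)", batch_size=500)
for i in range(1200):
    w.write((i, "x"))
w.flush()  # drain the final partial batch
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 1200
```

A production version would typically run this behind a queue on a dedicated writer thread, so every other thread stays a pure reader.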
all* of that + sharding -> <a href="https://sqlite.org/lang_attach.html" rel="nofollow">https://sqlite.org/lang_attach.html</a><p>ex: main.db + fts.db. reading and writing to main.db is always available; updating the fts index can be done without blocking the main database — it only needs to read, the reads can be chunked, and delayed. fts.db keeps the index + a cursor table — an id or last change ts<p>could also use a shard to handle tables for metrics, or simply move old data out of main.db<p>* some examples:<p><pre><code> import sqlite3
 conn = sqlite3.connect("data.db")
conn.execute("PRAGMA journal_mode=WAL") # concurrent reads (see above)
conn.execute("PRAGMA synchronous=NORMAL") # fsync at checkpoint, not every commit
conn.execute("PRAGMA cache_size=-62500") # ~61 MB page cache (negative = KB)
conn.execute("PRAGMA temp_store=MEMORY") # temp tables and indexes in RAM
conn.execute("PRAGMA busy_timeout=5000") # wait 5s on lock instead of failing
</code></pre>
edit: ORMs will obliterate your performance — use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.<p><i>my replies are being ratelimited, so let me add this</i><p>the <i>heavy duty server</i> other databases have is doing the <i>load bearing</i> work that folks tend to complain sqlite can't do<p>the <i>real</i> DBMSs are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.<p>by default sqlite gives you a stable database file where, when you see the transaction complete, it means the changes have been committed to storage and cannot be lost if the machine were to crash right after.<p>you can decide to waive some, or all, of those guarantees in exchange for performance, and this doesn't even have to be an all-or-nothing situation.
I usually try to explain it like this: “Single writer” is rarely a real problem, because a writer is not slow. It writes exclusively, but very quickly.<p>"Batch writer pattern" is a good idea to get rid of expensive commits.
For me, the concern about SQLite has never been if the database engine itself is “reliable for real data”, but that storing data on a single node is not “reliable for real data”. Performance aside, what you are positing is no different than dumping everything to a text file on disk. What happens if that VM dies?
Do you use multiple backend nodes? If yes, how do you access sqlite files from different nodes?
I use it for apps which don't need multiple backend nodes.<p>When i actually have something that requires multiple nodes, i just use postgres (with replica) or mongo (with replica).<p>But that's for those apps which are in an autoscaler.<p>For bulk data refresh I use build artifacts and hot-reload memory-mapped files, checking a manifest on object storage and only fetching an update if it's newer.<p>I've used this pattern everywhere and never really needed anything more; occasionally i might use redis if something required shared state across multiple nodes, fast.
2026 recommended storage formats: <a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
Taking a minute to appreciate the level of long term thinking required for storing data, to plan for 300-500 years into the future, to be able to withstand all kinds of innovations, and survive basic obsolescence.<p>What is the longest surviving paper medium?
Seems like they're pretty lax about their recommendations tbh. XLS is "preferred".
> As of this writing (2018-05-29) ...<p>So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.<p>(Thanks for the correction. Brief brain malfunction in the math department there).
Sir, it's 2026. It's 8 years old.
Was going to say, was having deja vu reading this
For public-sector data preservation, it may be one of the best options.<p>- The specification is publicly available<p>- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk<p>From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.
I love SQLite and thanks for sharing it but there should be a "(2018)" at the end in the title:<p>> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.
FYI, they added a lot more formats to the list after that.<p><pre><code> Preferred
1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
a. Formats using well known schemas with public validation tool available
b. Line-oriented, e.g. TSV, CSV, fixed-width
c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
3. Character Encoding, in descending order of preference:
a. UTF-8, UTF-16 (with BOM),
b. US-ASCII or ISO 8859-1
c. Other named encoding
---
Acceptable
For data (in order of preference):
1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
2. Text-based data formats with available schema
For aggregation or transfer:
1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
</code></pre>
<a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.<p>[0]: <a href="https://7-zip.org/7z.html" rel="nofollow">https://7-zip.org/7z.html</a><p>[1]: CVE-2025-0411
I use postgresql for my startup, but every time i need quick local testing i wish it were as simple as sqlite. No config, it just works.
Just yesterday it occurred to me that it had been a while since I last saw an SQLite post at the top of HN.<p>I really like the simplicity and speed of SQLite, I've used in both personal and professional projects. For day-to-day work I still end up in Excel, not because I like it more (I don't), but because its ubiquity makes it the lowest friction way to share & explore datasets with less technical stakeholders and execs.
I'm under no illusion I'll suddenly shatter your world views with this, but in case it's as useful to you as it was to me, you might want to check out Metabase[1].<p>You can self-host and if all you care about is showing data in a digestible format to stakeholders, it's really simple. You can of course go overboard and regret all of your life's decisions with it, but I try and abstain myself.<p>[1]: <a href="https://www.metabase.com/" rel="nofollow">https://www.metabase.com/</a>
I've always been irked by how SQLite relies on text parsing to work. Why do I have to write queries as text rather than expressing them in programmatic logic? I've never used a relational database because of this: I hate SQL and the entire idea of SQL, and I don't want to write it, learn it, or use a system that relies on it, even though relational databases can be more performant than plain structured data. It feels like the wrong approach, on the level of PHP. Is there anything I can do about this? I don't want to keep passing up SQLite just because of SQL, but I can't seem to get past it. I don't want to build strings or have string parsing anywhere in the stack; it just feels wrong.
If you want to avoid string manipulation then you can construct queries with a query builder API like C#'s LINQ. Other languages have similar libraries, e.g., Rust has Diesel.<p>If your objection is to the SQL language itself then you might find Datalog interesting. Datalog is a logic-based language where you query by writing predicates rather than writing SQL statements. Check out Logica <<a href="https://logica.dev" rel="nofollow">https://logica.dev</a>>. It's a language in the Datalog family that compiles to SQL.<p>In both cases, SQL is used only as a low-level IR for interfacing with the database engine.
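For flavor, here's what a minimal builder looks like underneath (a toy of my own, not LINQ or Diesel): method calls accumulate conditions, and values travel as bound parameters rather than being interpolated into the string:

```python
import sqlite3

class Select:
    """Toy query builder: composes parameterized SQL, never interpolates values."""

    def __init__(self, table):
        self.table, self.conds, self.params = table, [], []

    def where(self, column, value):
        self.conds.append(f"{column} = ?")
        self.params.append(value)  # value is bound, never spliced into the SQL
        return self

    def run(self, conn):
        sql = f"SELECT * FROM {self.table}"
        if self.conds:
            sql += " WHERE " + " AND ".join(self.conds)
        return conn.execute(sql, self.params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 30), ("bob", 40)])
print(Select("users").where("age", 40).run(conn))  # -> [('bob', 40)]
```

SQL still exists in this picture, but only as the IR the builder emits; the application code never touches query strings.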
A "prepared statement" is a precompiled SQL command, ready for bindings and execution: <a href="https://sqlite.org/c3ref/stmt.html" rel="nofollow">https://sqlite.org/c3ref/stmt.html</a><p>You can't precompile your SQL at build time, unfortunately, but you _can_ precompile all your SQL at the very start of your program and then never touch the parser again. This might be a good middle ground for you. It is infra that you can centralize, write some unit tests against, and then not worry about forever.<p>It's not common because the SQLite parser is lightning fast and it's so convenient to just write out a new query as you need one, versus having one bucket of all queries. But it's an option!
I bet you really love LLMs
On a recent project I have needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which had the possibility to corrupt files if there were a power interruption or something.<p>I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.<p>And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and then all the hard parts I was reinventing were probably faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to SQLite and it worked fine.<p>I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
The problem with it is you didn't solve your biggest actual problem, you just haven't had a problem bite you in the ass yet so you think your problem is solved.
> I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"<p>Where exactly is everywhere? Win32? All of Linux? BSDs? MacOS? IOS? ...
Everywhere in the sense of "I have a USB stick/SD card, what do I format it to so that every major device I'm using can read it".<p>In practice, every OS has its preferred system and the rest has varying levels of "I guess this works", with FAT32 and exFAT being the only real cross-platform options.<p>To wit:<p>* NTFS is only really properly and fully supported on Windows. Apple mounts it read-only. Linux can certainly mount NTFS and do some basic reads and writes. Unfortunately for whatever reason, the Linux fsck tools for NTFS are absolutely terrible, poorly designed and generally can't fix even the most basic of issues. At the same time, mount refuses to work with a partially corrupted filesystem, so if you're dealing with dirty unmounts (where the worst case usually is some unclosed file handle rather than data loss, but this also happens if you try to mount a suspended Windows parititon, which isn't uncommon since Windows hibernates by default and calls it fast boot), that's a boot to Windows just to fix it.<p>* Apple filesystems basically only work on apple devices. It's technically possible to mount them on Linux, but you end up digging into the guts of a bunch of stuff that Apple usually just masks for you.<p>* ext4 is only properly read/write under Linux and requires external drivers under Windows (which may not work properly either, as corruption issues are common).<p>FAT32 is reliable in that any OS can fsck/chkdsk it and properly mount it without needing special drivers, but is hindered by ancient filesize limitations. exFAT, at least for most cases, is the only filesystem you can plug into most devices and expect more or less the same capabilities as FAT32 (read/write support, can fix filesystem corruption.)<p>Out of the os specific ones, NTFS seems like it has the most potential to be the one filesystem that works everywhere; it's modern, works good-ish on most devices, it's just that the fsck/chkdsk tooling is awful outside of Windows.
Something MacOS and Windows support natively would be a good start, it could grow from there.
Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.
It is great to see SQLite getting this level of institutional recognition. The single file format makes archival storage incredibly straightforward compared to traditional database dumps.
I'm surprised they included proprietary format that's de facto standard in profession or supported by multiple tools (.xls, .xlsx) in preferred section [1]. I wonder if "well-known enough" is as good as "open" from preservation standpoint.<p>[1] <a href="https://www.loc.gov/preservation/resources/rfs/data.html" rel="nofollow">https://www.loc.gov/preservation/resources/rfs/data.html</a>
Especially when Office 365 shows that not even <i>Microsoft</i> is capable of making software which can display Office files anymore... if you have a Word file which was created or has ever been modified by the Word application, working with it through Office 365 in a browser is such a pain. I've literally had images which are <i>impossible</i> to delete or move in the web version, and they will absolutely render in the wrong place.
Archivists and librarians have to think in terms of practicality: if many tools exist to read something and it's a mainstream software product, the odds are good that they'll be able to use those files 50 years from now. Not certain, but good, and that matters with limited budget and ability to tell the rest of the world what format to provide things in.<p>This can require nuance: for example, PDF has profiles because the core format is widely supported but you could do things like embed plugin content from now-defunct vendors, and they would only want the former for long-term preservation.
You can unzip the xlsx and read the xml inside. It’s not the worst format by far.
I used SQLite for a few applications several years ago. One time, the database got corrupted and all the data was lost. That was the day I stopped using SQLite.<p>Also, the lack of enforced column data types was always a negative for me.
No matter the medium, backups are a must.
For column types there are STRICT tables now
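STRICT tables (SQLite 3.37+, 2021) reject mistyped values instead of silently coercing or storing them as-is. A quick demonstration, guarded on the library version since older builds don't know the keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
if sqlite3.sqlite_version_info >= (3, 37, 0):
    conn.execute("CREATE TABLE t (n INTEGER) STRICT")
    try:
        # Without STRICT this text would be stored verbatim in the
        # INTEGER column; with STRICT it's a constraint violation.
        conn.execute("INSERT INTO t VALUES ('not a number')")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
else:
    print("SQLite too old for STRICT tables")
```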
I used a hard drive for a few applications several years ago. One time, the drive got corrupted and all the data was lost. That was the day I stopped using hard drives.
> the database got corrupted<p>What caused that?
Which version of SQLite?
I don't know much about the LoC use case, but my initial reaction to the post is to ask why they are not building a data lake with open formats. I'm sure there are reasons for discarding open-table formats. Claude keeps telling me that the issue is that they don't address preservation properly.
SQLite is remarkably versatile. Just a couple weeks ago an extension to do cross-process queues, streams, pub/sub etc in SQLite was released:<p>Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite | 327 points | 94 comments | <a href="https://news.ycombinator.com/item?id=47874647">https://news.ycombinator.com/item?id=47874647</a><p>Live notifications was one of the big missing pieces to implement whole apps on a sqlite backend, and now there's a decent solution.
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!
It certainly will be in the toolkits of data archeologists hundreds of years from now. Must be a weird feeling to create something so potentially long-lasting.
(US)
[dead]
[dead]
[dead]
[dead]
[flagged]
[dead]
[flagged]
I get annoyed at all the other DBs that require their own heavy duty server process when for 90% of my projects there is only one client, my app server. Is there a DB that combines sqlite's embedded simplicity with higher concurrent write throughput?