Lightning Memory-Mapped Database Manager (LMDB) 1.0

(lmdb.tech)

60 points by radiator 5 hours ago

7 comments

hmry 4 hours ago
Do people have good experiences with LMDB, in terms of reliability? I've never used it in production, but I've read through the code and design documents for a database implementation class.
I remember some strange code (such as pushing return values 4k above the stack, with a comment like "this works as long as the caller doesn't use more than 4k of stack space before accessing the return value"), and the author also shared some unconventional opinions about undefined behavior (like "Compilers are deterministic, if I know what platform I'm compiling to then no behavior is undefined. And if compiler authors disagree, they are morons.")
But presumably it's thoroughly tested, so those aren't problems in practice? Would be really interested to hear from people who've actually used it. I've mainly stuck to SQLite instead.
- markasoftware 4 hours ago
 Not amazing. In certain workloads I ran, once the db reached several hundred GB, writes would hang for longer and longer periods of time, eventually hours, while the db grew drastically in the background. https://news.ycombinator.com/item?id=30023623 seems to be the same issue, and it was serious enough that Shopify decided not to use lmdb.
 And yes, I ensured there were no outstanding long-lived readers, verified with mdb_stat -r. My workload used one transaction per read/write anyway (never needed larger atomicity). Once the db got into the bad state, running my program on it would almost immediately run into the issue again, so I really think the db is in a bad state such that most writes would cause it to hang, not related to how I do transactions. This workload would pretty consistently hit the issue once the db got to several hundred GB.
 Issue #10236 on the OpenLDAP bug tracker might be the root cause, who knows. It's been marked CONFIRMED for years without a fix, while other similar issues are created.
 This is extremely annoying. It seems workload-dependent (other workloads I've run create absolutely massive lmdb dbs without this issue), and once it happens your only recourse is to make a new db and copy the contents over (thankfully reads still work fine on these borked dbs).
 Other than that, though, it's great. Never in any case had actual data corruption, and reads and writes are extremely fast (until this issue happens).
 Edit: fun fact, since Shopify may have created Bolt in response to this bug, and then Bolt was the root cause of the 73-hour Roblox downtime in 2021, this bug may indirectly have caused one of the worst outages ever!
 - uroni 1 hour ago
 That it keeps an infinite cache of malloc page allocations is annoying (the issue you referenced). I just removed that (after complaining on the mailing list about it). The performance advantage is probably negligible in many cases (since malloc implementations often already cache), while causing confusing memory usage behavior.
 Idk if it was your issue, but for long-running write transactions it doesn't spill to disk. So you have all the changes being written to disk at the end of the transaction. One would think enabling write mapping fixes this, but it needs to mark all the pages as clean before commit, so same effect there. I fixed this for 0.9 here: https://github.com/uroni/hs5/tree/main/external/lmdb . Will have to investigate if it is improved with 1.0, or if I need to redo the changes.
 Edit: Just noticed that the issue is about the free list in the file. Never had a problem with that, but I also had to replace that MIDL structure with something more scalable for the spilling.
 - markasoftware 43 minutes ago
 FWIW I had this issue even with the MDB_NOSYNC flag, so it shouldn't be force flushing to disk unless I'm out of RAM or whatever.
 - jnwatson 1 hour ago
 I've used LMDB in production for multi-terabyte databases, and we encountered the long write times but found a solution.
 The important idea is that LMDB offloads cache management almost completely to the OS. You have to become intimately familiar with the way that the page cache works and how to configure it.
- nomel 27 minutes ago
 I think lmdb is mostly unusable for many use cases. I switched to libmdbx, which fixes all the issues [1] I (and most sibling comments) ran into with lmdb.
 [1] https://github.com/Mithril-mine/libmdbx#improvements-beyond-lmdb
- ChrisTrenkamp 4 hours ago
 I can't go into specifics, but I use LMDB for the command-line application I maintain for my employer. I also extended it into a web service for internal use. As long as you stick to the safe LMDB options, which are the default options, it's reliable. The documentation clearly outlines what safety guarantees you lose when you enable/disable certain options: http://www.lmdb.tech/doc/group__mdb.html#ga32a193c6bf4d7d5c5d579e71f22e9340
 I had a situation where the web service's writes were slowing down to an unbearable crawl because the number of entries in the database was reaching tens of billions. Thankfully, the users never experienced the slowness. The website stayed nice and fast, even though the background updates were extraordinarily slow. The issue was fixed by sharding the databases.
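The comment does not say how the databases were sharded. As a rough illustration only, one common approach is hash partitioning: a stable hash of the key picks which database holds the entry, so each one stays smaller. In the sketch below, `shard_for` and `ShardedStore` are hypothetical names, and plain Python dicts stand in for the per-shard databases.

```python
import hashlib


def shard_for(key: bytes, num_shards: int) -> int:
    """Map a key to a shard index using a stable (process-independent) hash."""
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


class ShardedStore:
    """Hypothetical wrapper: each dict stands in for one separate database."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: bytes, value) -> None:
        # Route the write to the shard chosen by the key's hash.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: bytes):
        # The same hash routes the read to the same shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

With this layout every write touches only one of the smaller databases. Changing the shard count later would require moving entries, which this sketch does not address.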
- erikschoster 2 hours ago
 I use it as a session store for a computer music system. It has worked well for me as a way to read parameters (mutable by any client) during synthesis: clients will often read dozens of parameters during a block of computation (a relatively short window of time, typically in the low milliseconds) without adding any noticeable overhead to the render time for each block.
 Edit: I also tried using it for larger blobs of data (like audio) but ended up only storing a reference to shared memory for larger blocks. Anything larger than IIRC 4k that can't be stored in a single node kills performance, but for small values it seems pretty great.
- radiator 4 hours ago
 It has been used successfully as the backend for OpenLDAP and Monero, at least.
- thombles 4 hours ago
 Be cautious if you're using large databases on iOS. At least until fairly recently, iOS didn't page dirty mmapped pages back to disk, and after enough churn the app would OOM.
 - zbentley 2 hours ago
 Wow, really?
 Then what’s the point of memory mapping in the first place? Or do they suggest manual flush/sync actions for persistence?
 - tynorf 1 hour ago
 IIRC: it is to leverage the OS page cache rather than having a separate buffer pool in userland. By default lmdb uses normal pwrite/fsync for the write path, but can optionally use a writable mapping and (presumably) msync.
 However, some people think there are problems with this usage: (pdf warning) https://www.cidrdb.org/cidr2022/papers/p13-crotty.pdf
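The two write paths mentioned above can be sketched with Python's standard library on a POSIX system. This is a generic illustration of pwrite/fsync versus a shared writable mapping followed by a flush (msync on POSIX); it is not LMDB's code, and the function names are made up for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def write_via_pwrite(path: str, offset: int, data: bytes) -> None:
    """Write path 1: explicit positioned write, then fsync."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_via_mmap(path: str, offset: int, data: bytes) -> None:
    """Write path 2: store through a shared writable mapping, then flush it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[offset:offset + len(data)] = data
            m.flush()  # asks the OS to write the dirty pages back


def demo() -> bytes:
    # Create a two-page scratch file, write one page with each path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * PAGE * 2)
        path = tmp.name
    try:
        write_via_pwrite(path, 0, b"pwrite")
        write_via_mmap(path, PAGE, b"mmap")
        with open(path, "rb") as f:
            content = f.read()
        return content[:6] + content[PAGE:PAGE + 4]
    finally:
        os.unlink(path)
```

Both paths end up in the same OS page cache; the difference is whether the application issues a system call per write or stores directly into mapped memory.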
- packetlost 4 hours ago
 I believe at least one of the two official Minecraft implementations uses it for their map/save format.
- OrangeDelonge 2 hours ago
 We evaluated it but chose RocksDB instead
- ozgrakkurt 4 hours ago
 It is a small amount of code, so it is easy to integrate into an application.
 In my experience it is really reliable, except for write performance.
 The author writes very spicy stuff and sounds pretty rude.
 I would recommend building a prototype at real data scale and testing whether it meets your requirements. The write performance can be really atrocious, and it doesn't have high performance potential because it is based on mmap.
hilariously 4 hours ago
Maybe rephrase this part - "It is read-only by default as this provides total immunity to corruption. Using read-write mode offers much higher write performance, but adds the possibility for stray application writes thru pointers to silently corrupt the database."
I generally do think read-write mode would offer higher write performance than read-only as well :)
- wmanley 3 hours ago
 The context is in the sentence before your quote:
 > The memory map can be used as a read-only or read-write map.
 So presumably lmdb writes to the database using the `pwrite` syscall by default, but can optionally write via the mmap instead - if you are willing to accept the increased risk of accidental data corruption.
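The protection being described can be seen in a small standard-library sketch, assuming Linux, where a shared read-only mapping and pwrite go through the same page cache. It is not LMDB code: a stray store through the read-only view is rejected, while a deliberate pwrite on the file descriptor succeeds and becomes visible through that same view. In C the stray store would fault instead of raising an exception.

```python
import mmap
import os
import tempfile


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)
        # Read-only shared view of the file, like a read-only database map.
        view = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        try:
            try:
                view[0:1] = b"X"  # simulated stray write through the map
                stray_blocked = False
            except TypeError:
                stray_blocked = True  # Python refuses to modify a read-only map
            # Intentional update goes through the file descriptor instead.
            os.pwrite(fd, b"B", 0)
            first = view[0:1]
        finally:
            view.close()
        return stray_blocked, first
    finally:
        os.close(fd)
        os.unlink(path)
```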
radiator 5 hours ago
New features in LMDB 1.0 include:
- support for incremental backup
- support for page-level checksums and encryption
- support for DB on raw block devices
- support for 2-phase commit
- support for page sizes up to 64KB
plus other minor additions to the API.
initramfs 3 hours ago
https://www.meilisearch.com/docs/resources/internals/storage
jnwatson 1 hour ago
Bummer. I'm the maintainer of the Python bindings.
I have to figure out how to support both versions now...
paveworld 5 hours ago
HTTP?? C'mon, man.
- radiator 5 hours ago
 it is just a link to documentation
 - jayct 3 hours ago
 that could easily be trojan-horsed with links to malware if you are viewing it in a poorly secured setting (like public wifi), because you can't verify the origin. so the best we can say about the author is that we are getting inconsistent signals on how seriously they understand and implement security concerns. so better review that code carefully before use, rather than assuming their expertise from release notes.
 - ccapitalK 53 minutes ago
 Obligatory https://doesmysiteneedhttps.com/
 - Retr0id 4 hours ago
 TLS certs are freeeeee
 - radiator 4 hours ago
 Judging from this very release, where he implemented support for page-level checksums and encryption for LMDB, I assume the author knows a thing or two about encryption. He probably then deemed it unnecessary for this specific website.
 - Retr0id 4 hours ago
 Cryptography engineers are not excluded from being lazy sysadmins.
 - radiator 4 hours ago
 What do you mean "lazy"? I thought you said TLS certs were free. Do you mean they cost something after all? Time, for example?
 Anyway, of course in case you feel the website is a risk, you should refrain from using it. Safety comes first.
quotemstr 3 hours ago
I've never understood the fascination some people have with mmap. Memory-mapped file IO is just a RAM cache combined with a hidden system call (a page fault) to fill the cache. You can do the same thing yourself by using O_DIRECT to fill regular anonymous memory. If you're feeling social, you can fill a mapped and shared memfd.
You can seal memfds too, which means that the "read-only" mode is easy to implement: just map your memfd for write, apply F_SEAL_FUTURE_WRITE, and share the memfd to anyone you want to have read-only access.
By doing your own O_DIRECT IO instead of relying on the kernel's defaults, you get a lot more control. You choose how much readahead to do; you choose your read-cluster size. You choose your cache eviction strategy. You choose when to write back.
BTW: O_DIRECT can also be done asynchronously using aio or io_uring. There's no such thing as an asynchronous page fault. And IO errors? Would you rather deal with EIO or SIGBUS?
Why would you want the kernel to do these things for you? It'll do a worse job: it has less information than you do and has to use blunt heuristics that work sort-of-good-enough for the whole world, not just your program.
And it's not any faster either. O_DIRECT is DMA. A page cache fill is also DMA. It's the same operation, spelled differently.
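The "do it yourself" idea can be illustrated with a toy user-space buffer pool in which the program, not the kernel, chooses the page size and the eviction policy (LRU here). This is only a sketch under simplifying assumptions: it uses plain `os.pread`, which still goes through the kernel page cache, instead of O_DIRECT, because O_DIRECT needs aligned buffers and filesystem support. `PageCache` is a hypothetical name.

```python
import os
import tempfile
from collections import OrderedDict


class PageCache:
    """Toy buffer pool: fixed capacity, LRU eviction, explicit reads on a miss."""

    def __init__(self, fd: int, page_size: int = 4096, capacity: int = 2):
        self.fd = fd
        self.page_size = page_size
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> bytes, oldest first
        self.misses = 0

    def get(self, page_no: int) -> bytes:
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        data = os.pread(self.fd, self.page_size, page_no * self.page_size)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


def demo():
    fd, path = tempfile.mkstemp()
    try:
        for i in range(4):
            os.write(fd, bytes([65 + i]) * 4096)  # pages filled with A, B, C, D
        cache = PageCache(fd, capacity=2)
        cache.get(0)
        cache.get(1)
        cache.get(0)  # hit: page 0 becomes most recently used
        cache.get(2)  # miss: evicts page 1
        cache.get(1)  # miss again: evicts page 0
        return cache.misses, sorted(cache.pages)
    finally:
        os.close(fd)
        os.unlink(path)
```

A read error here surfaces as an ordinary `OSError` from `os.pread`, which is the EIO-versus-SIGBUS contrast the comment draws.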
- wmanley 2 hours ago
 I use mmap with my SQLite database[1] because I have many concurrent SQLite connections (one per concurrent HTTP request) and I don't want each connection to have its own 2MB cache[2]. It's better that all the connections simply share the page cache.
 [1]: https://sqlite.org/pragma.html#pragma_mmap_size
 [2]: https://sqlite.org/pragma.html#pragma_cache_size
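A minimal sketch of that setup with Python's built-in `sqlite3` module is shown below. The specific numbers (a 64 MiB mmap limit and a 256 KiB per-connection cache) are assumptions for illustration and are not taken from the comment; SQLite builds may clamp or disable `mmap_size`, so the helper reports whatever value the pragma returns.

```python
import os
import sqlite3
import tempfile


def connect(path: str, mmap_bytes: int = 64 * 1024 * 1024, cache_kib: int = 256):
    """Open a connection that leans on mmap and keeps its private cache small."""
    conn = sqlite3.connect(path)
    # A negative cache_size is interpreted as a size in KiB.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    # The pragma returns the limit actually in effect (possibly clamped).
    row = conn.execute(f"PRAGMA mmap_size = {int(mmap_bytes)}").fetchone()
    mmap_in_effect = row[0] if row else 0
    return conn, mmap_in_effect


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.db")
        writer, _ = connect(path)
        writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        writer.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
        writer.commit()
        # Several independent connections, as with one per HTTP request.
        readers = [connect(path) for _ in range(3)]
        values = [
            c.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchone()[0]
            for c, _ in readers
        ]
        cache_sizes = [c.execute("PRAGMA cache_size").fetchone()[0] for c, _ in readers]
        for c, _ in readers:
            c.close()
        writer.close()
        return values, cache_sizes
```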
- bagxrvxpepzn 3 hours ago
 > I've never understood the fascination some people have with mmap.
 Uncommonly used system calls give user-space programmers the sensation of learning something.
 > Why would you want the kernel to do these things for you? It'll do a worse job: it has less information than you do and has to use blunt heuristics that work sort-of-good-enough for the whole world, not just your program.
 Yes, you're opting into non-determinism you don't control. When resources get constrained and everything can't be in memory and someone asks you why the database sucks, all you'll be able to do is shrug. Anyone who builds critical systems would never rely on the kernel making decisions like this. Don't use LMDB for anything that matters.
 - jnwatson 1 hour ago
 You're already depending on the OS for many other things. Depending on it for page caching is just one more thing.
 - bagxrvxpepzn 55 minutes ago
 This level of reasoning is insufficient when building reliable systems. The consequences of depending on the OS for page caching are different than the consequences of depending on it for device drivers, file systems, or scheduling.
- teravor 2 hours ago
 With mmap you also don't have to worry about committing too much system memory: if another application needs it, the kernel will start evicting your cache.
 - quotemstr 2 hours ago
 You're right about that.
 Linux needs a way for userspace processes to participate in the kernel's shrinker system for reclaiming memory under pressure. Watching memory PSI is too coarse. MADV_FREE is too complicated and indiscriminate. You could imagine a notification FD, but then you've just reinvented PSI. You could imagine a synchronous signal, but everyone hates signals and won't couple any new functionality to them.
 Shrinker-BPF attached to a memfd perhaps? A BPF shrinker could not only choose which pages to evict in a non-stupid way, but could notify userspace in some sane manner (e.g. setting a bitmask somewhere) that it's done so.
 (Zero-fill as "notification" is insane and doesn't actually work because zero is a perfectly valid value in a lot of contexts.)
- ok123456 3 hours ago
 The OS handles all of that transparently, without requiring any additional code. I think that is the draw.
 - quotemstr 3 hours ago
 And that's adequate for casual programs. LMDB is big and serious enough to warrant the extra complexity (which, to be fair, is significant) of userspace buffer management. LMDB does the work once and all users benefit.