A few years back I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free, so that uninitialized allocations contain nothing interesting.<p>We expected this to hurt performance, but we were unable to measure any impact in practice.<p>Everyone still working in memory-unsafe languages should really just do this IMO. It would have mitigated this Mongo bug.
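For anyone wondering what "overwrite on free" looks like, here is a minimal sketch in C. It is not the actual Workers runtime patch: the names `poison_malloc`, `poison_block`, `poison_free` and the fill byte `POISON_BYTE` are made up for illustration. A real allocator already knows each block's size from its own metadata; this wrapper stores the size in a small header instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fill byte; any recognizable non-zero pattern works. */
#define POISON_BYTE 0xDF

/* Header placed in front of every block so the size is known on free.
 * The union pads it to max_align_t so the user pointer stays aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Allocate `size` usable bytes and remember the size in the header. */
void *poison_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(block_header)) return NULL;
    block_header *h = malloc(sizeof(block_header) + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;
}

/* Overwrite the user-visible bytes of a block with the poison pattern.
 * The stores go through a volatile pointer so that, in practice, the
 * compiler does not treat them as dead stores ahead of free(). */
void poison_block(void *ptr) {
    block_header *h = (block_header *)ptr - 1;
    volatile unsigned char *p = ptr;
    for (size_t i = 0; i < h->size; i++) p[i] = POISON_BYTE;
}

/* Poison the block, then hand it back to the underlying allocator. */
void poison_free(void *ptr) {
    if (ptr == NULL) return;
    poison_block(ptr);
    free((block_header *)ptr - 1);
}
```

Doing the fill inside the allocator (rather than at every call site) is what makes it hard to forget and hard for the optimizer to reason away.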
Recent macOS versions zero out memory on free, which improves the efficacy of memory compression. Apparently it’s a net performance gain in the average case.
<i>A few years back I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free, so that uninitialized allocations contain nothing interesting.</i><p>Note that many malloc implementations will do this for you given an appropriate environment, e.g. setting MALLOC_CONF to junk:free (the opt.junk option) will do this on FreeBSD.
> OpenBSD uses 0xdb to fill newly allocated memory and 0xdf to fill memory upon being freed. This helps developers catch "use-before-initialization" (seeing 0xdb) and "use-after-free" (seeing 0xdf) bugs quickly.<p>Looks like this is the default in OpenBSD.
You know, I never even considered doing that but it makes sense; whatever overhead is incurred by writing that static byte pattern is still almost certainly minuscule compared to the overhead of something like a garbage collector.
FYI, at least in C/C++, the compiler is free to throw away assignments to any memory pointed to by a pointer if said pointer is about to be passed to free(). So, depending on how you did this, the lack of a measurable perf impact could be because your compiler removed the assignment. This even affects a call to memset().<p>See here: <a href="https://godbolt.org/z/rMa8MbYox" rel="nofollow">https://godbolt.org/z/rMa8MbYox</a>
I patched the free() implementation itself, not the code that calls free().<p>I did, of course, test it, and anyway we now run into the "freed memory" pattern regularly when debugging (yes including optimized builds), so it's definitely working.
However, if you recast to volatile, the compiler will keep it:<p><pre><code>#include <stdlib.h>
#include <string.h>

void free(void* ptr);
void not_free(void* ptr);

void test_with_free(char* ptr) {
    ptr[5] = 6;
    /* Load memset through a volatile function pointer so the call is
       not recognized as a plain memset ahead of free(). */
    void *(* volatile memset_v)(void *s, int c, size_t n) = memset;
    memset_v(ptr + 2, 3, 4);
    free(ptr);
}

void test_with_other_func(char* ptr) {
    ptr[5] = 6;
    /* Same stores, but followed by an opaque function instead of free(). */
    void *(* volatile memset_v)(void *s, int c, size_t n) = memset;
    memset_v(ptr + 2, 3, 4);
    not_free(ptr);
}</code></pre>
That code is not guaranteed to work. Declaring memset_v as volatile means that the variable has to be read, but does not imply that the function must be called; the compiler is free to compile the function call as "tmp = memset_v; if (tmp != memset) tmp(...)" relying on its knowledge that in the likely case of equality the call can be optimized away.
Whilst the C standard doesn't guarantee it, both LLVM and GCC _do_. Both document that it will work, so they are not free to optimise it away.<p>[0] <a href="https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics" rel="nofollow">https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics</a><p>[1] <a href="https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=lib/memset_explicit.c;hb=refs/heads/stable-202301" rel="nofollow">https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=b...</a>
Newer versions of C++ (and C, apparently) have functions so that the cast isn't necessary ( <a href="https://en.cppreference.com/w/c/string/byte/memset.html" rel="nofollow">https://en.cppreference.com/w/c/string/byte/memset.html</a> ).
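Where `memset_explicit` is not available, another commonly seen fallback is to write through a volatile-qualified byte pointer. A minimal sketch follows; `wipe_bytes` is a made-up name. Strictly speaking the standard only promises this for objects that are themselves volatile, so treat it as "works on GCC and Clang in practice" rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: set n bytes at buf to the value c.
 * Each store goes through a volatile lvalue, which mainstream
 * compilers do not elide even if the buffer is never read again. */
void wipe_bytes(void *buf, unsigned char c, size_t n) {
    volatile unsigned char *p = buf;
    while (n--) *p++ = c;
}
```

Usage would be something like `wipe_bytes(key, 0, sizeof key);` right before the buffer goes out of scope or is freed.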
The author seems to be unaware that Mongo internally develops in a private repo and commits are published later to the public one with <a href="https://github.com/google/copybara" rel="nofollow">https://github.com/google/copybara</a>. All of the confusion around dates is due to this.
The author of this post is incorrect about the timeline. Our Atlas clusters were upgraded days before the CVE was announced.
How often are mongo instances exposed to the internet? I'm more of an SQL person and for those I know it's pretty uncommon, but does happen.
From my experience, MongoDB's entire raison d'etre is "laziness".<p>* Don't worry about a schema.<p>* Don't worry about persistence or durability.<p>* Don't worry about reads or writes.<p>* Don't worry about connectivity.<p>This is basically the entire philosophy, so it's not surprising at all that users would also not worry about basic security.
To the extent that any of this was ever true, it hasn’t been true for at least a decade. After the WiredTiger acquisition they really got their engineering shit together. You can argue it was several years too late but it did happen.
Not only that, but authentication is much harder than it needs to be to set up (and is off by default).
Although interestingly, for all the mongo deployments I managed, the first time I saw a cluster publicly exposed without SSL was postgres :)
Most of your points are wrong. Maybe only the first is valid-ish.
Ultimate webscale!
I'm sure there are publicly exposed MySQLs too
A highly cited reason for using mongo is that people would rather not figure out a schema. (N=3/3 for “serious” orgs I know using mongo).<p>That sort of inclination to push off doing the right thing now to save yourself a headache down the line probably overlaps with “let’s just make the db publicly exposed” instead of doing the work of setting up an internal network to save yourself a headache down the line.
Are you guys serious with these takes?<p>You very often have both NoSQL and SQL at scale.<p>NoSQL is used for high availability of data at scale - iMessage famously uses it for message threads, EA famously uses it for gaming matchmaking.<p>What you do is have both SQL and NoSQL. The NoSQL is basically caches of resources for high availability. Imagine you are making a social media app... Yes of course you have a SQL database that stores all the data, but you maintain API caches of posts in NoSQL.<p>Why? This gets to some of your other black vs white insults: NoSQL is typically WAY FASTER than SQL. That's why you use it. It's way faster to read a JSON file from a hard drive than it is to query a SQL database, always has been. So why not use NoSQL for EVERYTHING? Well, because you have duplicated data everywhere since it's not relational, it's just giant caches essentially. You also will get slow queries when the documents get huge.<p>Anyway you need both. It's not an either/or thing. I cannot believe this many years later people do not know the purpose of SQL and NoSQL and do not understand that it is not a competition at all. You want both!
Because nobody uses mongo for the reasons you listed. They use redis, dynamo, scylla or any number of enriched KV stores.<p>Mongo has spent its entire existence pretending to be a SQL database by poorly reinventing everything you get for free in postgres or mysql or cockroach.
What they wrote was pretty benign. They just asked how common it is for Mongo to be exposed. You seem to have taken that as a completely different statement
I mean, they said it's rarely used when in fact it's widely used by some of the world's biggest companies at the highest scale the internet knows. The other guy had a harsher comment, sure; maybe I should duplicate my reply to them, but who knows what kinds of rules that breaks on this site lmao. Happy Christmas & New Year, buddy!
The article links to a shodan scan reporting 213K exposed instances <a href="https://www.shodan.io/search?query=Product%3A%22MongoDB%22" rel="nofollow">https://www.shodan.io/search?query=Product%3A%22MongoDB%22</a>
My university has one exposed to the internet, and it's still not patched. Everyone is on holiday and I have no idea who to contact.
No one, if you aren't in the administration's good graces and something shitty happens unrelated to you, you've put a target on your back to be suspect #1.
"Look at me. I'm the DBA now"<p>-JS devs after "Signing In With Facebook" to MongoDB Atlas<p>AKA me<p>Sorry guys, I broke it
It could be because when you leave an SQL server exposed it often turns into much worse things. For example, without additional configuration, PostgreSQL will default into a configuration that can own the entire host machine. There is probably some obscure feature that allows system process management, uploading a shell script or something else that isn't disabled by default.<p>The end result is "everyone" kind of knows that if you put a PostgreSQL instance up publicly facing without a password or with a weak/default password, it will be popped in minutes and you'll find out about it because the attackers are lazy and just running crypto-mine malware, etc.
For a long time, the default install had it binding to all interfaces and with authentication disabled.
Often. Lots of data leaks happened because of this. People spin it up in a cloud VM and forget it has a public IP all the time.
I'm still thinking about the hypothetical optimism around the OWASP Top 10, the hope that major flaws would get solved, when buffer overflow has been there since the beginning... in 2003.
Every time someone posts about NoSQL a thousand "programmers" reveal they have never had to support a lot of traffic lol
Why is anyone using mongo for literally anything
Right? When they came out, it was all about NoSQL, which then turned out to only mean key-value databases, which are plentiful.
because it is "web scale"<p>ref: <a href="https://www.youtube.com/watch?v=b2F-DItXtZs" rel="nofollow">https://www.youtube.com/watch?v=b2F-DItXtZs</a>
Easy replication. I suppose it's faster than Postgres's JSONB, too.<p>I would rather not use it, but I see that there are legitimate cases where MongoDB or DynamoDB is a technically appropriate choice.
This is a nasty ad repositorium datorum argumentation which I cannot tolerate.
> On Dec 24th, MongoDB reported they have no evidence of anybody exploiting the CVE<p>Absence of evidence is not evidence of absence...
Is it true that Ubisoft got hacked and 900GB of data from their database was leaked due to MongoBleed? I am seeing a lot of posts on social media under the #ubisoft tags today. Can someone on HN confirm?
TLDR: Blame logs, not NoSQL.<p>Almost always when you hear about emails or payment info leaking (or when Twitter stored passwords in plaintext lol) it's from logs. And a lot of times logs are in NoSQL because they are only ever needed in that same JSON format and in a very highly available way (all you Heroku users tailing logs all day, yw), and then almost nobody encrypts phone numbers, emails, etc. whenever those end up in logs.<p>There's basically no security around logs actually. They're just like snapshots of the backend data being sent around and nobody ever cares about it.<p>Anyway it has nothing to do with the choice to use NoSQL, it has more to do with how neglected security is around it.<p>Btw, in case you are wondering: in both the Twitter plaintext password case and the Rainbow Six Siege data leak you mention, it was logs that leaked. NoSQL-backed logs, sure, but it's more about the data security around logging IMO.
I read that hack was made possible by Ubisoft’s support staff taking bribes.
Details are still emerging. The update in the last hour was that at least 5 different hacking groups were in Ubisoft's systems, and yeah, some might have got there via bribes rather than MongoDB <a href="https://x.com/vxunderground/status/2005483271065387461" rel="nofollow">https://x.com/vxunderground/status/2005483271065387461</a>
This has many similarities to the Heartbleed vulnerability: it involves trusting lengths from an attacker, leading to unauthorized revelation of data.
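To make the "trusting lengths from an attacker" point concrete, here is a generic sketch of the check that this class of bug is missing. It is not MongoDB's or OpenSSL's code; the wire format (a 4-byte little-endian length followed by the payload) and the name `read_payload` are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 4-byte little-endian length, then payload.
 * Copies the payload into out and stores its length in *out_len.
 * Returns 0 on success, -1 if the claimed length is not backed by the
 * bytes actually received or does not fit in the output buffer. */
int read_payload(const uint8_t *msg, size_t received,
                 uint8_t *out, size_t out_cap, size_t *out_len) {
    if (received < 4) return -1;
    uint32_t claimed = (uint32_t)msg[0]
                     | (uint32_t)msg[1] << 8
                     | (uint32_t)msg[2] << 16
                     | (uint32_t)msg[3] << 24;
    /* The key check: never read more than the sender actually sent. */
    if (claimed > received - 4) return -1;
    if (claimed > out_cap) return -1;
    memcpy(out, msg + 4, claimed);
    *out_len = claimed;
    return 0;
}
```

Without the `claimed > received - 4` check, a sender who claims 200 bytes but sends 3 gets back 197 bytes of whatever happened to sit next to the message in memory.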
Have all Atlas clusters been auto-updated with a fix?
Related:<p><i>MongoBleed</i><p><a href="https://news.ycombinator.com/item?id=46394620">https://news.ycombinator.com/item?id=46394620</a>
> In C/C++, this doesn’t happen. When you allocate memory via `malloc()`, you get whatever was previously there.<p>What would break if the compiler zeroed it first? Do programs rely on malloc() giving them the data that was there before?
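For what it's worth, the contents of a fresh `malloc()` block are indeterminate as far as the C standard is concerned, so a conforming program cannot rely on seeing old data there. If you want zeroed memory today, `calloc()` already guarantees it. A tiny sketch (`alloc_zeroed` is just an illustrative name):

```c
#include <stdlib.h>

/* Hypothetical helper: return n bytes that are guaranteed to be zero,
 * or NULL on failure. calloc() zero-fills; malloc() does not. */
unsigned char *alloc_zeroed(size_t n) {
    return calloc(n, 1);
}
```

The usual argument against zeroing every `malloc()` is cost rather than correctness, since the caller is expected to overwrite the memory anyway.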
"MongoBleed Explained by an LLM"