> But SQL schemas often look like this. Columns are nullable by default, and wide tables are common.<p>Hard disagree. That database table was a waving red flag. I don't know enough/any rust so don't really understand the rest of the article but I have never in my life worked with a database table that had 700 columns. Or even 100.
I saw tables with more than a thousand columns. It was a law firm home-grown FileMaker tool. Didn't inspect it too closely, so don't know what was inside<p>I remember a phrase from one of C. J. Date's books: every record is a logical statement. It really stood out for me and I keep returning to it. Such an understanding implies a rather small number of fields or the logical complexity will go through the roof.
<a href="https://jimmyhmiller.com/ugliest-beautiful-codebase" rel="nofollow">https://jimmyhmiller.com/ugliest-beautiful-codebase</a>
I kinda love this. That sounds like an incredibly entertaining place to work for between 1 and 2 years in your late 20s and not a second longer.
With AI "programmers", this will be the future: bugs galore and the things that do work, work by accident.<p>I think this company was ahead of the curve.
This is awesome. Got completely lost reading this and was struggling to figure out where I got this link from. Amazing story.
This is like the functional ugly tier of buildings from "how buildings learn". Excellent stuff
The blog post is an entertaining read, but I was left with the impression the author might have tried to embellish, particularly in its disbelief angle.<p>Take this passage:<p>> The app relied on a SOAP service, not to do any servicey things. No, the service was a pure function. It was the client that did all the side effects. In that client, I discovered a massive class hierarchy. 120 classes each with various methods, inheritance going 10 levels deep. The only problem? ALL THE METHODS WERE EMPTY. I do not exaggerate here. Not mostly empty. Empty.<p>> That one stumped me for a while. Eventually, I learned this was in service of building a structure he could then use reflection on. That reflection would let him create a pipe-delimited string (whose structure was completely database-driven, but entirely static) that he would send over a socket.<p>Classes with empty methods? Used reflection to create a pipe-delimited string? The string was sent over the wire?<p>Why congratulations, you just rediscovered data transfer objects, specifically API models.
Hi, I'm the author of the article.<p>As to your hard disagree, I guess it depends... While this particular user is on the higher end (in terms of columns), it's not our only user where column counts are huge. We see tables with 100+ columns on a fairly regular basis especially when dealing with larger enterprises.
Can you clarify which knowledge domains those enterprises fall under with examples of what problems they were trying to solve?<p>If it's not obvious, I agree with the hard disagree. Every time I see a table with that many columns, I have a hard time believing there isn't some normalization possible.<p>Schemas that stubbornly stick to high-level concepts and refuse to dig into the subfeatures of the data are often seen from inexperienced devs or dysfunctional/disorganized places too inflexible to care much. This isn't really negotiable. There will be issues with such a schema if it's meant to scale up or be migrated or maintained long term.
Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.<p>Also, normalization solves a problem that’s present in OLTP applications: OLAP/Big Data applications generally have problems that are solved by denormalization.
Yep, this comment sums it up well.<p>We have many large enterprises from wildly different domains use feldera and from what I can tell there is no correlation between the domain and the amount of columns.
As fiddlerwoaroof says, it seems to be more a function of how mature/big the company is and how much time it had to 'accumulate things' in their data model.
And there might be very good reasons to design things the way they did, it's very hard to question it without being a domain expert in their field, I wouldn't dare :).
> I can tell there is no correlation between the domain and the amount of columns.<p>This is unbelievable. In purely architectural terms that would require your database design to be an amorphous big ball of everything, with no discernible design or modelling involved. This is completely unrealistic. Are queries done at random?<p>In practical terms, your assertion is irrelevant. Look at the sparse columns. Figure out those with sparse rows.
Then move half of the columns to a new table and keep the other half in the original table. Congratulations, you just cut down your column count by half, and sped up your queries.<p>Even better: discover how your data is being used. Look at queries and check what fields are used in each case. Odds are, that's your table right there.<p>Let's face it. There is absolutely no technical or architectural reason to reach this point. This problem is really not about structs.
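A minimal sketch of that split, using SQLite in memory and invented table/column names (the real schema would obviously differ):

```python
import sqlite3

# Hypothetical example: move sparse, rarely-queried columns out of a
# wide "orders" table into a 1:1 side table keyed by the original id.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- The original wide table (columns invented for illustration).
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        total REAL NOT NULL,
        gift_note TEXT,      -- sparse
        customs_code TEXT    -- sparse
    );
    INSERT INTO orders VALUES (1, 'alice', 9.99, NULL, NULL);
    INSERT INTO orders VALUES (2, 'bob', 25.0, 'Happy bday', 'HS-1234');

    -- The side table only needs rows that actually have values.
    CREATE TABLE order_extras (
        order_id INTEGER PRIMARY KEY REFERENCES orders(id),
        gift_note TEXT,
        customs_code TEXT
    );
    INSERT INTO order_extras
        SELECT id, gift_note, customs_code FROM orders
        WHERE gift_note IS NOT NULL OR customs_code IS NOT NULL;

    -- The narrow "hot" table replaces the wide one.
    CREATE TABLE orders_core AS SELECT id, customer, total FROM orders;
    DROP TABLE orders;
""")

# Common queries now touch only the narrow table; the rare ones join.
rows = con.execute(
    "SELECT c.customer, e.gift_note FROM orders_core c "
    "JOIN order_extras e ON e.order_id = c.id"
).fetchall()
print(rows)  # [('bob', 'Happy bday')]
```

The point is that the frequently scanned table shrinks while nothing is lost; the sparse data is still one join away.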
Feldera speak from <i>lived experience</i> when they say 100+ column tables are common <i>in their customer base</i>. They speak from lived experience when they say there's no correlation <i>in their customer base</i>.<p>Feldera provides a service. They did not design these schemas. Their customers did, and probably over such long time periods that those schemas cannot be referred to as <i>designed</i> anymore -- they just <i>happened</i>.<p>IIUC Feldera works in OLAP primarily, where I have no trouble believing these schemas are common. At my $JOB they are, because it works well for the type of data we process. Some OLAP DBs might not even support JOINs.<p>Feldera folks are simply reporting on their experience, and people are saying they're... <i>wrong</i>?
Haha, looks like it.<p>I remember the first time I encountered this thing called TPC-H back when I was a student. I thought "wow surely SQL can't get more complicated than that".<p>Turns out I was <i>very wrong</i> about that. So it's all about perspective.<p>We wrote another blog post about this topic a while ago; I find it much more impressive because this is about the actual queries some people are running: <a href="https://www.feldera.com/blog/can-your-incremental-compute-engine-do-this" rel="nofollow">https://www.feldera.com/blog/can-your-incremental-compute-en...</a>
> Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.<p>Strong disagree. I'll explain.<p>Your argument would support the idea of adding a few columns to a table to get a short time to market. That's ok.<p>Your comment does not come close to justifying why you would keep the columns in. Not in the slightest.<p>Tables with many columns create all sorts of problems and inefficiencies. Over-fetching is a problem all on its own. Even the code gets brittle, where each and every single tweak risks being a major regression.<p>Creating a new table is not hard. Add a foreign key, add the columns, do a standard parallel-write migration. Done. How on earth is this not practical?
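The parallel-write step can be sketched like this (SQLite, with invented table and function names): during the transition window every write goes to both the old wide table and the new side table, so readers can be flipped over at any time.

```python
import sqlite3

# Hypothetical parallel-write migration. The old wide table stays
# authoritative; the new side table is kept in sync on every write.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, shoe_size INTEGER);
    CREATE TABLE user_extras (user_id INTEGER PRIMARY KEY, shoe_size INTEGER);
""")

def save_user(user_id, name, shoe_size):
    """Write path during the migration: write both tables atomically."""
    with con:  # one transaction for both statements
        con.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?, ?)",
            (user_id, name, shoe_size),
        )
        con.execute(
            "INSERT OR REPLACE INTO user_extras VALUES (?, ?)",
            (user_id, shoe_size),
        )

save_user(1, "alice", 38)

# Backfill the pre-existing rows once; afterwards readers switch to
# user_extras and the old column can finally be dropped.
with con:
    con.execute(
        "INSERT OR REPLACE INTO user_extras "
        "SELECT id, shoe_size FROM users"
    )
extras = con.execute("SELECT * FROM user_extras").fetchall()
print(extras)  # [(1, 38)]
```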
There are sometimes reasons this is harder in practice, for example let’s say the business or even third parties have access to this db directly and have hundreds of separate apps/services relying on this db (also an anti-pattern of course but not uncommon), that makes changing the db significantly harder.<p>Mistakes made early on and not corrected can snowball and lead to this kind of mess, which is very hard to back out of.
I think you believe the average developer, especially on enterprise software where you see this sort of shit, is far more competent or ambitious than they actually are. Many would be horrified to see the number of monkeys banging out nasty DDL in Hibernate or whatever C# uses that have no idea what "normal forms" or "relational algebra" are and are actively resistant to even attempting to learn.
If lots of columns are a red flag then red flags are quite common in many businesses. I’ve seen tables with tens of thousands of columns. Naturally those are not used by humans writing sql by hand, but there are many tools that have crazy data layouts and generate crazy sql to work with it.
It might not be common in typical software shops. I work in manufacturing and our database has multiple tables with hundreds of columns.
No idea what these guys do exactly but their tagline says "Feldera's award-winning incremental compute engine runs SQL pipelines of any complexity"<p>So it sounds like helping customers with databases full of red flags is their bread and butter
> it sounds like helping customers with databases full of red flags is their bread and butter<p>Yes that captures it well. Feldera is an incremental query engine. Loosely speaking: it computes answers to any of your SQL queries by doing work proportional to the incoming changes for your data (rather than the entire state of your database tables).<p>If you have queries that take hours to compute in a traditional database like Spark/PostgreSQL/Snowflake (because of their complexity, or data size) and you want to always have the most up-to-date answer for your queries, feldera will give you that answer 'instantly' whenever your data changes (after you've back-filled your existing dataset into it).<p>There is some more information about how it works under the hood here: <a href="https://docs.feldera.com/literature/papers" rel="nofollow">https://docs.feldera.com/literature/papers</a>
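A toy illustration of that idea (not Feldera's actual engine, which is far more general): maintain a `SELECT category, SUM(amount)` view by applying deltas, so each change costs O(1) work instead of a rescan of the whole table.

```python
from collections import defaultdict

class IncrementalSum:
    """Keeps a grouped running sum up to date from change events only."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply_delta(self, category, amount):
        # An insert is a positive delta; a delete is a negative one.
        self.totals[category] += amount
        if self.totals[category] == 0:
            del self.totals[category]  # drop empty groups

view = IncrementalSum()
view.apply_delta("books", 10.0)   # INSERT ('books', 10.0)
view.apply_delta("books", 5.0)    # INSERT ('books', 5.0)
view.apply_delta("books", -10.0)  # DELETE ('books', 10.0)
print(dict(view.totals))  # {'books': 5.0}
```

The result is always identical to re-running the aggregate from scratch, but the work done tracks the size of the change, not the size of the data.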
Some businesses are genuinely this complicated. Splitting those facts into additional tables isn't going to help very much unless it actually mirrors the shape of the business. If it doesn't align, you are forcing a lot of downstream joins for no good reason.
> I have never in my life worked with a database table that had 700 columns<p>Main table at work is about 600, though I suspect only 300-400 are actively used these days. A lot come from name and address fields, we have about 10 sets of those in the main table, and around 14 fields per.<p>Back when this was created some 20+ years ago it was faster and easier to have it all in one row rather than to do 20+ joins.<p>We probably would segment it a bit more if we did it from scratch, but only some.
OLTP tables typically are normalized.<p>But OLAP tables (data lake/warehouse stuff), for speed purposes, are intentionally denormalized and yes, you can have 100+ columns of nullable stuff.
It is very common to find tables with 1000+ columns in machine learning training sets at e-commerce companies. The largest I have seen had over 10000 columns.
It's OLAP, it very common for analytical tables to be denormalized. As an example, each UserAction row can include every field from Device and User to maximize the speed at which fraud detection works. You might even want to store multiple Devices in a single row: current, common 1, 2 and 3.
<a href="https://apps.naaccr.org/data-dictionary/data-dictionary/version=26/chapter-view/data-descriptor-table/" rel="nofollow">https://apps.naaccr.org/data-dictionary/data-dictionary/vers...</a><p>771 columns (and I've read the definitions for them all, plus about 50 more that have been retired). In the database, these are split across at least 3 tables (registry, patient, tumor). But when working with the records, it's common to use one joined table. Luckily, even that usually fits in RAM.
Not everyone understands normal form, much less 3rd normal form. I’ve seen people do worse with excel files where they ran out of columns and had to link across spreadsheets.
Salesforce by default comes with some where your tables have 50 columns before you start tweaking anything.<p>100s is not unusual. Thousands happens before you realise.
That statement jumped out at me as well. I've worked as a DBA on tons of databases backing a wide variety of ERPs, web apps, analytics, data warehouses...700 columns?!? No.
I have seen tables (SQL and parquet, too) that have at least high hundreds of optional columns, but this was always understood to be a terrible hack, in those cases.
> Hard disagree. That database table was a waving red flag.<p>Exactly this.<p>This article is not about structs or Rust. This article is about poor design of the whole persistence layer. I mean, hundreds of columns? Almost all of them optional? This is the kind of design that gets candidates to junior engineer positions kicked off a hiring round.<p>Nobody gets fired for using a struct? If it's an organization that tolerates database tables with nearly 1k optional columns then that comes as no surprise.