Rather than answering directly, I've been thinking about this problem from the other end altogether ever since I saw the dbricks rt demo. Apologies for the rambling response; I haven't finished thinking it through yet...<p>We ended up with 'hot' data in OLTP and 'cold/archival' data in OLAP because OLTP storage size has always been limited:<p>(1) Limited by hardware - there's only so much data we can store on disks and NVMe.<p>(2) Limited by wallet - disks and NVMe are EXPENSIVE.<p>The tight coupling of compute and storage didn't help either. It capped each database at whatever fit on a single expensive compute node.<p>So, another question is:<p>What's currently stopping me from keeping the SCD history tables right in my OLTP db? What's forcing me to copy state into my ETL/ELT pipeline and then process it into SCD rows in a dedicated OLAP db?<p>To some extent, the answer is still the same: OLTP cannot scale to the storage size required for keeping historical data. So I've had to move the 'cold' historical data out and keep it in my OLAP freezer.<p>Now, if OLTP itself scales, I'm not gonna bother with the copying step. I'd rather just store the history in OLTP itself.<p>From my perspective (mostly from handling IoT systems), I need OLAP for 2 reasons: (1) storage scalability, and (2) analytical processing speed.<p>I now consider (1) a solved problem.<p>As for (2), I'm still not sure how this architecture ends up matching the query speeds of column-oriented storage. But again, I need to study more.<p>The SCD pipeline still remains in some form: either (1) as the SCD rows we currently keep (ETL pipeline), or (2) as older LSN rows that simply don't get deleted (existing db engine).<p>I've done quite a lot of experimentation with (2), and it is a pretty solid concept to work with.<p>I've spent many years hammering my brain at databases and datastores in general, and I now have a feeling that this is it.
Finally.
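<p>To make option (2) a bit more concrete, here is a minimal Python sketch under loudly stated assumptions: a toy in-memory, multi-version key-value store (the `VersionedStore` class, its methods and the sensor keys are all hypothetical, not any real engine's API) where every write is tagged with an increasing LSN and old versions are never garbage-collected. Real engines usually do reclaim them (PostgreSQL's VACUUM, for example). Because nothing is deleted, an "as of" read and SCD Type 2-style history rows fall straight out of the retained versions, with no separate copy step.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    lsn: int                # hypothetical LSN at which this version was written
    value: Optional[Any]    # None acts as a tombstone (delete marker)


class VersionedStore:
    """Hypothetical toy store: writes append versions, old ones are never removed."""

    def __init__(self) -> None:
        self._lsn = 0
        self._versions: dict[str, list[Version]] = {}

    def put(self, key: str, value: Any) -> int:
        # Append a new version instead of overwriting in place.
        self._lsn += 1
        self._versions.setdefault(key, []).append(Version(self._lsn, value))
        return self._lsn

    def delete(self, key: str) -> int:
        # A delete is just another version: a tombstone. History is kept.
        return self.put(key, None)

    def get(self, key: str, as_of: Optional[int] = None) -> Optional[Any]:
        # Return the newest version with lsn <= as_of (default: latest LSN).
        versions = self._versions.get(key, [])
        limit = self._lsn if as_of is None else as_of
        i = bisect_right([v.lsn for v in versions], limit)
        return versions[i - 1].value if i else None

    def history(self, key: str) -> list[tuple[Any, int, Optional[int]]]:
        # Derive SCD Type 2-style rows: (value, valid_from_lsn, valid_to_lsn).
        # valid_to is the LSN of the next version, or None if still current.
        versions = self._versions.get(key, [])
        rows = []
        for i, cur in enumerate(versions):
            if cur.value is None:
                continue  # tombstones close the previous row but add no row
            nxt = versions[i + 1].lsn if i + 1 < len(versions) else None
            rows.append((cur.value, cur.lsn, nxt))
        return rows


if __name__ == "__main__":
    store = VersionedStore()
    store.put("sensor-42", {"firmware": "1.0"})  # LSN 1
    store.put("sensor-7", {"firmware": "2.0"})   # LSN 2
    store.put("sensor-42", {"firmware": "1.1"})  # LSN 3
    print(store.get("sensor-42"))           # {'firmware': '1.1'}
    print(store.get("sensor-42", as_of=2))  # {'firmware': '1.0'}
    print(store.history("sensor-42"))
    # [({'firmware': '1.0'}, 1, 3), ({'firmware': '1.1'}, 3, None)]
```

The point of the design is that history becomes a query over versions the engine already wrote, rather than a second copy. It says nothing about point (2), analytical scan speed: this sketch is row-oriented and in-memory, so it sidesteps exactly the part that is still open.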