So DuckDB was developed to simplify data analysis by finally allowing queries on biggish data without the need for a cluster... and now we put it on a cluster?<p>I think there are already solutions for that scale of data, and simplicity is the best feature of DuckDB (at least for me).
In my experience, Ray clusters don't scale well and end up costing you more money. You need to run permanent per-user instances, etc.<p>What you need is a multi-tenant shared infrastructure that is elastic.
neat. i'm pretty much a novice when it comes to the guts of this kind of stuff, but how does this work under the hood for blocking operators, where they "cannot output a single row until the last row of their input has been seen"?<p>i think this is where spark shuffling comes in? but how does it work here?<p><a href="https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#blocking-operators" rel="nofollow">https://duckdb.org/docs/stable/guides/performance/how_to_tun...</a>
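To make the question concrete: one common pattern for a blocking operator such as GROUP BY in a distributed engine is two-phase aggregation with a shuffle in between. The toy sketch below is plain Python and only illustrates that generic pattern; it is not taken from this project, and all function names (`partial_aggregate`, `shuffle`, `distributed_sum`) are made up for illustration.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Sum values per key over one worker's rows (or one worker's inbox)."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def shuffle(partials, num_workers):
    """Route every (key, partial_sum) pair to the worker that owns the key.

    Ownership is decided by hashing the key, so all partial sums for the
    same key end up in the same inbox.
    """
    inboxes = [[] for _ in range(num_workers)]
    for partial in partials:
        for key, value in partial.items():
            inboxes[hash(key) % num_workers].append((key, value))
    return inboxes


def distributed_sum(worker_rows):
    """Simulate SELECT key, SUM(value) ... GROUP BY key across workers."""
    # Phase 1: each worker aggregates only the rows it holds locally.
    partials = [partial_aggregate(rows) for rows in worker_rows]
    # Shuffle: exchange partial results so each key lands on one worker.
    inboxes = shuffle(partials, num_workers=len(worker_rows))
    # Phase 2: each worker combines the partial sums for the keys it owns.
    result = {}
    for inbox in inboxes:
        result.update(partial_aggregate(inbox))
    return result


if __name__ == "__main__":
    rows_per_worker = [
        [("a", 1), ("b", 2), ("a", 3)],  # rows held by worker 0
        [("b", 5), ("c", 4)],            # rows held by worker 1
    ]
    print(distributed_sum(rows_per_worker))
```

The operator is still blocking in this sketch: phase 2 cannot emit a final sum for a key until every worker has finished phase 1 and sent its partial result. The shuffle only reduces how much data has to move, since each worker sends one partial sum per key rather than every row.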
feels like a missed opportunity to call it cluster-quack xD