A Short Summary of the Last Decades of Data Management
Key Insight: The history of data management is a recurring cycle of abandoning and rediscovering the same fundamentals (tables, SQL, and ACID), and modern hardware has made distributed systems unnecessary for most workloads.
Hannes Mühleisen traces data management from 5,000-year-old Sumerian clay tablets to modern cloud databases, arguing that tables, SQL, and ACID transactions form an unbreakable 'holy trifecta.' He recounts how, during the NoSQL era, companies abandoned SQL for MapReduce because of hardware constraints, only to bring the fundamentals back within years: MongoDB added schemas, Cassandra added ACID, and Databricks restored integrated storage. He predicts relational systems will absorb key-value stores, document databases, time series, vector databases, and graph databases, since each reduces to a trivial table representation. Finally, he argues 'big data is dead' because modern single-node hardware is so powerful that distributing computation often makes things slower.
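The claim that specialized stores reduce to tables can be illustrated with a minimal sketch. It is not taken from the talk: it uses Python's built-in `sqlite3` module rather than DuckDB, and the table names (`kv`, `edges`, `readings`) and sample rows are hypothetical, chosen only to show a key-value lookup, a two-hop graph traversal, and a time-series aggregate as plain SQL over tables.

```python
import sqlite3

# In-memory relational database; sqlite3 stands in for any SQL engine here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Key-value store: a two-column table keyed on the first column.
cur.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
cur.executemany("INSERT INTO kv VALUES (?, ?)",
                [("user:1", "Ada"), ("user:2", "Grace")])
kv_lookup = cur.execute(
    "SELECT value FROM kv WHERE key = ?", ("user:2",)
).fetchone()[0]

# Graph: an edge table; a two-hop traversal is a self-join.
cur.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
cur.executemany("INSERT INTO edges VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("b", "d")])
two_hops = [row[0] for row in cur.execute(
    "SELECT e2.dst FROM edges e1 JOIN edges e2 ON e1.dst = e2.src "
    "WHERE e1.src = ? ORDER BY e2.dst", ("a",)
)]

# Time series: a timestamp column plus a value; bucketing is a GROUP BY.
cur.execute("CREATE TABLE readings (ts INTEGER, sensor TEXT, value REAL)")
cur.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                [(0, "s1", 1.0), (30, "s1", 3.0), (60, "s1", 5.0)])
per_minute = cur.execute(
    "SELECT ts / 60 AS minute, AVG(value) FROM readings "
    "GROUP BY minute ORDER BY minute"
).fetchall()

print(kv_lookup, two_hops, per_minute)
```

Document and vector workloads follow the same pattern under similar assumptions (a JSON or array-typed column on an ordinary table), though efficient similarity search would additionally need engine support that this sketch does not cover.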
If the idea is to say, let's have a separate database because embeddings are so fundamentally different, I need 200 million in VC money to start a new database company... Well, it'…
Even if we add 32 computers to this problem, and over 1,000 CPU cores versus the 32 we started with, Spark will still be slower than DuckDB on a single machine.
Tables were there before our civilization, and they will be there after our civilization.
DuckDB, Database Internals