APRIL 14, 2026

12 MIN READ

SYLVAIN UTARD

Lakehouse table formats in 2026
There is no single “winning” lakehouse table format in 2026.

What has emerged instead is a more interesting split. There are open table formats like Iceberg, Delta Lake, and Hudi, the original three efforts from Netflix, Databricks, and Uber to solve roughly the same problem by keeping both data and table metadata in object storage through files, logs, and snapshots. There are managed control planes like Amazon S3 Tables and Cloudflare’s R2 Data Catalog, which package those open formats with hosted maintenance. There are metadata-native challengers like DuckLake, which keep the data as Parquet in object storage but move table metadata into a relational database instead. And there are AI-oriented formats like Lance and Vortex, which push the storage layer toward faster random access, indexes, and more AI-friendly physical layouts.

That distinction matters because most teams do not actually need “a format.” They need a system that can read the data they already have, keep costs and maintenance under control, and support both human analysis and always-on AI workloads. At Altertable, that is exactly how we look at the space. We believe the winning platform in 2026 needs to read across the formats customers already use, not force them into a single ideology. That is also why we are watching what the incubating Apache XTable project is doing around cross-format interoperability.

What a lakehouse table format actually does

A file format like Parquet tells you how bytes are laid out. A table format adds the layer that an object store does not provide on its own: which files belong to the current snapshot of a table, how new snapshots are committed atomically, how schemas evolve, and how multiple engines can read consistent point-in-time views of the same data over time. In a traditional database, the storage engine owns this responsibility directly. But in a lakehouse, storage and compute are decoupled, so the table format has to play that role through a shared spec. Iceberg does this through immutable data files and immutable metadata snapshots. Delta does it through a transaction log with snapshot isolation and serializable writes. DuckLake takes a different path and says: instead of storing table metadata as a separate set of files inside the object store, it puts that metadata directly into a relational database.
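The mechanics above can be reduced to a toy sketch: immutable snapshot files plus an atomically swapped pointer to the "current" one. Real formats (Iceberg's manifests, Delta's transaction log) are far richer, and every name below is invented for illustration, but the commit pattern is the essence of what a table format adds on top of raw files.

```python
# Toy sketch of a file-based table format: immutable snapshots plus an
# atomically-swapped "current" pointer. Illustrative only — not Iceberg's
# or Delta's actual layout.
import json
import os
import tempfile


def commit_snapshot(table_dir: str, data_files: list[str]) -> str:
    """Write a new immutable snapshot file, then atomically repoint 'current'."""
    existing = [f for f in os.listdir(table_dir) if f.startswith("snap-")]
    snap_name = f"snap-{len(existing):05d}.json"
    with open(os.path.join(table_dir, snap_name), "w") as f:
        json.dump({"files": data_files}, f)
    # The commit point: an atomic replace of the pointer file. Readers see
    # either the old snapshot or the new one, never a half-written state.
    tmp = os.path.join(table_dir, "current.tmp")
    with open(tmp, "w") as f:
        f.write(snap_name)
    os.replace(tmp, os.path.join(table_dir, "current"))
    return snap_name


def read_current(table_dir: str) -> list[str]:
    """Resolve the current snapshot and return its data files."""
    with open(os.path.join(table_dir, "current")) as f:
        snap_name = f.read()
    with open(os.path.join(table_dir, snap_name)) as f:
        return json.load(f)["files"]


table_dir = tempfile.mkdtemp()
commit_snapshot(table_dir, ["part-000.parquet"])
commit_snapshot(table_dir, ["part-000.parquet", "part-001.parquet"])
print(read_current(table_dir))  # ['part-000.parquet', 'part-001.parquet']
```

Because old snapshot files are never rewritten, time travel falls out for free: reading `snap-00000.json` directly still yields the table as it was before the second commit.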

That is why the real argument in 2026 is not just about format syntax. It is about where complexity goes. In object-store-native systems, someone still has to handle commits, metadata lookups, compaction, cleanup, and access control. The market is increasingly split between teams that want to run that control plane themselves and teams that want someone else to make it boring.

The short version

If you want the compressed view:

  • Iceberg is the default open standard for multi-engine analytics.
  • Delta Lake is still the strongest Spark-first, batteries-included option.
  • S3 Tables is managed Iceberg, not a new table model.
  • Cloudflare R2 Data Catalog + Pipelines is another managed-Iceberg path, with a lighter object-store-native feel.
  • DuckLake is the clearest challenge to file-heavy metadata architectures.
  • Lance is the strongest sign that AI workloads want more than scan-and-prune tables.
  • Vortex is promising, but it is better understood as a next-generation file format than as a complete lakehouse table format.

Apache Iceberg

Iceberg is still the default answer for most analytical lakehouse workloads in 2026.

Its biggest strength is not that it is perfect. It is that it has become the closest thing the ecosystem has to a shared language. Its metadata model was built for multiple engines, and the REST Catalog pattern has become a practical interoperability layer across both self-managed and hosted systems. That is a big part of why Iceberg increasingly feels like the lingua franca of the open lakehouse world.

The appeal is easy to understand. Iceberg gives teams open storage, schema evolution, partition evolution, time travel, and broad engine compatibility. If your priority is multi-engine analytics on object storage, Iceberg is still the most pragmatic starting point.

Its weakness is just as clear: operational overhead. Iceberg does not remove the need to think about metadata growth, compaction, snapshot expiration, small files, and catalog design. It gives you an open table model, but not a free pass on maintenance. In other words, Iceberg is a strong spec, but it is not the whole product.

Delta Lake

Delta Lake remains a very strong option, especially for teams that are already Spark-first.

Its core story is familiar: ACID semantics on object storage, a robust transaction log, and mature support for row-level operations and streaming-oriented workflows. If Iceberg is the ecosystem’s broad open default, Delta is still the most opinionated and feature-rich batteries-included choice in the Spark-shaped universe.

The practical advantage of Delta is that a lot of lakehouse teams already know how to use it, and it remains strong for merges, updates, upserts, and streaming-heavy data engineering patterns. The practical downside is that the best experience has historically been tighter around the Spark and Databricks ecosystem. Interoperability has improved, but Iceberg still has a stronger reputation as the more engine-neutral default.

So Delta is not fading. It is just more clearly a great fit for a specific operating model rather than the universal default.

Amazon S3 Tables

S3 Tables is best understood as managed Iceberg on AWS.

That is the most useful framing because it avoids overcomplicating the story. AWS is not introducing a fundamentally different table model here. It is taking Iceberg and making more of the maintenance operationally invisible: compaction, snapshot management, credentials, and integration with the surrounding AWS stack.

That is valuable. A lot of teams do not want to run table maintenance pipelines. They want open-table semantics, but they want the painful parts to become somebody else’s job. S3 Tables is a clean answer to that.

But the right critique is not “this is AWS-specific.” The better critique is: it does not change what Iceberg is. It productizes it. You still inherit the same underlying tradeoffs around metadata, file layout, and object-store-oriented maintenance. AWS is reducing the operational burden, not replacing the architectural model. That raises a fair question for AWS-heavy teams: if you are already comfortable with AWS-managed analytics, do you want managed open tables on object storage, or would a warehouse like Redshift actually be the simpler fit for your workload?

Cloudflare R2 Data Catalog + Pipelines

Cloudflare’s data platform is one of the more interesting entries in this market because it is not really trying to invent a new format. It is trying to make managed Iceberg feel lighter and closer to ingestion.

The core offer is straightforward: R2 Data Catalog gives you a managed Iceberg catalog in R2, and Pipelines can land transformed data into R2 either as Iceberg-backed Parquet tables or as raw Parquet and JSON files. In other words, it is another version of “Iceberg plus hosted control plane,” but with a very object-store-native distribution model.

That makes it attractive for teams that want a direct path from incoming data to queryable analytical tables without building and operating every part themselves. The appeal is not really about format novelty. It is about reducing friction around ingestion, metadata, and maintenance.

The caveat is maturity. What we see from early adopters and vendor messaging is consistent: Cloudflare’s Iceberg story is promising, but still early enough that you should expect evolving limits and some rough edges. That matters if you are choosing a control plane, because the control plane is the whole point.

DuckLake

DuckLake is the most interesting format in this list if your real frustration with lakehouses is metadata complexity.

Its idea is almost embarrassingly simple: store the data as Parquet, but put all the metadata into a transactional SQL database. DuckLake’s argument is that lakehouse stacks already end up depending on a catalog or database somewhere, so the file-based metadata maze of many current systems is unnecessary complexity. Use SQL transactions, primary keys, relational constraints, and regular tables to manage metadata directly.

That changes the feel of the system a lot. Planning can become a small number of direct SQL reads instead of a chain of object-store metadata fetches and reconstruction work. Cross-table metadata operations become ordinary transactional database operations. And the mental model gets simpler: open Parquet data on one side, transactional metadata in SQL on the other.
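The shift is easiest to see in miniature. The sketch below uses SQLite and an invented two-table schema — it is emphatically not DuckLake's actual catalog schema — but it shows the move: commit a snapshot and its file list in one transaction, then turn query planning into a single indexed SQL read over per-file statistics.

```python
# Toy sketch of "metadata in a transactional SQL database, data as Parquet
# in object storage". The schema is invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE snapshot (id INTEGER PRIMARY KEY);
    CREATE TABLE data_file (
        snapshot_id INTEGER REFERENCES snapshot(id),
        path        TEXT NOT NULL,
        row_count   INTEGER,
        min_ts      TEXT,  -- per-file stats enable SQL-side pruning
        max_ts      TEXT
    );
""")

# One transaction commits a snapshot and its file list together.
with db:
    db.execute("INSERT INTO snapshot (id) VALUES (1)")
    db.executemany(
        "INSERT INTO data_file VALUES (1, ?, ?, ?, ?)",
        [
            ("s3://lake/part-000.parquet", 1000, "2026-01-01", "2026-01-31"),
            ("s3://lake/part-001.parquet", 1000, "2026-02-01", "2026-02-28"),
        ],
    )

# Planning collapses to one SQL read: which files can contain February rows?
files = db.execute(
    "SELECT path FROM data_file"
    " WHERE snapshot_id = 1 AND max_ts >= ? AND min_ts <= ?",
    ("2026-02-01", "2026-02-28"),
).fetchall()
print([p for (p,) in files])  # ['s3://lake/part-001.parquet']
```

In a file-based format, answering that same question means fetching and parsing a chain of metadata objects from storage; here it is one query against tables the database already knows how to index.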

DuckLake is still early, and that matters. The ecosystem is much smaller than Iceberg’s, and today it is most naturally associated with DuckDB-centric workflows. It also makes an explicit trade: it swaps the metadata-file maze for a SQL database dependency. For teams that already run SQL systems comfortably, that can be a huge simplification. For teams whose lakehouse migration was partly about “fewer databases to run,” that trade may feel backward.

For Altertable, though, this is not an abstract debate. We chose DuckLake as the foundation of our default managed and hosted database because AI workloads need fast access to metadata before they ever read the data itself. Models and agents need to plan, retrieve context, and reason about schema and table state quickly, and DuckLake’s relational metadata layer is a much better fit for that than a maze of metadata files. AI is not a feature we bolted onto a generic lakehouse after the fact.

Lance

Lance matters because it points to a different future.

It is not trying to be the most boring, universal analytics table format. It is explicitly designed around random access, multimodal data, built-in indexing, and AI-native retrieval patterns. If Iceberg and Delta are fundamentally scan-and-prune systems, Lance is the clearest signal that AI workloads want more than that.

That makes Lance attractive for feature stores, multimodal datasets, embeddings, vector search, and other workloads where the dominant question is not “how fast can I plan a big analytical scan?” but “how fast can I retrieve the right blobs, vectors, and metadata with minimal I/O?”

The tradeoff is equally clear. Lance is newer, its ecosystem is narrower, and it is not yet the broad default exchange format for general-purpose analytics. That is fine. It is solving a different problem. The important point is that its existence tells us something meaningful: the next era of data infrastructure will not be satisfied with Parquet plus a generic table layer alone.

Vortex

Vortex is worth mentioning, but carefully.

It sits adjacent to the table-format conversation rather than directly inside it. The reason it matters is that it attacks the physical file format layer itself. If some workloads are starting to expose Parquet’s limits, especially around random access and more interactive execution patterns, then formats like Vortex are a sign that the underlying storage representation may also evolve.

That said, it is better to frame Vortex as a promising next-generation columnar format than as a one-for-one competitor to Iceberg, Delta, or DuckLake. It does not solve the full control-plane problem by itself. It belongs in this conversation because the future lakehouse stack may end up rethinking both the table layer and the physical file layer together.

So which one should you choose?

The fastest way to think about the space in 2026 is by workload shape and operational appetite.

If you want the broadest open default for multi-engine analytics, start with Iceberg. If you are already deep in Spark-first workflows and want a mature batteries-included model, Delta Lake remains strong. If your real problem is that you do not want to run compaction, cleanup, and catalog operations yourself, then S3 Tables or Cloudflare’s managed Iceberg path are the more direct answers. If your core complaint is that lakehouse metadata stacks have become absurdly indirect, then DuckLake is the clearest conceptual reset. And if your workload is truly AI-native, with vectors, blobs, and fast indexed retrieval, then Lance is the format in this set that most clearly starts from that reality.

Where Altertable fits

Our view is simple: customers should not have to redesign their entire stack around one format winner.

Formats will coexist because they solve different problems. Iceberg is winning as the open analytical default. Delta remains strong in its world. Managed Iceberg services are absorbing operational pain. DuckLake is rethinking metadata architecture. Lance is pulling the ecosystem toward AI-native access patterns. Vortex points to future changes lower in the stack.

That is why we think the winning product is not “the one true format.” It is the hosted, managed, AI-ready layer that can read across these formats and make them useful.

At Altertable, DuckLake anchors our default managed database for the reasons we spell out in the DuckLake section. We also believe customers should be able to work with the data they already have. That means reading across the open table ecosystems that matter, not forcing teams to pick sides. In practice, that means a platform that can combine hosted performance and managed operations with support for the table formats customers are already adopting, while giving both humans and AI agents a shared place to query, understand, and act on the data.

Our take

The real question in 2026 is not “which lakehouse table format wins?”

A better question is:

Which platform can read your data from any format, reduce the control-plane burden, and turn those tables into something active?

That is the shift from reactive analytics to continuous intelligence.

And that is the future we are building toward at Altertable.

Sylvain Utard, Co-Founder & CEO at Altertable

Seasoned leader in B2B SaaS and B2C. Scaled 100+ teams at Algolia (1st hire) & Sorare. Passionate about data, performance and productivity.
