Can Databricks Out-Iceberg the Competition?

Analyst(s): Brad Shimmin
Publication Date: April 20, 2026

Databricks has launched the public preview of Apache Iceberg v3 support, integrating technical features such as deletion vectors, row lineage, and the VARIANT data type to achieve performance parity with Delta Lake. This strategic move positions Databricks as a format-agnostic platform, leveraging Unity Catalog to manage Iceberg tables natively while maintaining high-speed execution and cross-engine compatibility.

What is Covered in This Article:

  • Databricks has launched the public preview of Apache Iceberg v3 support, integrating technical features like deletion vectors and row lineage to achieve functional parity with Delta Lake.
  • The introduction of merge-on-read capabilities via deletion vectors significantly reduces write amplification, allowing Iceberg tables to handle high-frequency updates with the same efficiency as native Delta tables.
  • Unity Catalog serves as the foundational governance and metadata layer, enabling native Iceberg v3 management and seamless interoperability with external engines like Trino, Snowflake, and AWS Athena.
  • The inclusion of the VARIANT data type addresses the complexities of semi-structured JSON data, providing a performant and flexible alternative to traditional string-based storage through sub-column pruning.
  • This release strategically positions Databricks as a format-agnostic platform, effectively neutralizing the open-versus-proprietary narrative historically used by competitors to differentiate from the Databricks ecosystem.

The News: Databricks has announced the public preview of Apache Iceberg v3 support within its platform, marking a significant milestone in the evolution of the open lakehouse architecture. This release brings advanced technical capabilities—specifically deletion vectors, row lineage, and the VARIANT data type—directly to Iceberg tables managed within the Databricks environment. By leveraging Databricks Runtime 18.0 and higher, organizations can now create or upgrade existing tables to the Iceberg v3 specification.

A core component of this update is the integration with Unity Catalog, which acts as the universal control plane for these tables. This allows for a unified governance model where Iceberg v3 tables can be read and written by various engines while maintaining fine-grained access control and metadata consistency. Furthermore, this update enhances Databricks Universal Format (UniForm), allowing Delta tables to be accessed as Iceberg v3 tables without data duplication, thereby bridging the functional gap between the two most prominent open table formats in the industry.

Analyst Take: The long-standing rivalry between table formats—specifically Delta Lake and Apache Iceberg—has often felt like a technological cold war, with organizations forced to choose sides based on performance or ecosystem compatibility. With the public preview of Apache Iceberg v3, Databricks is essentially calling for a ceasefire, or, more accurately, adopting the best parts of the Iceberg ecosystem. By bringing core performance features that were previously the exclusive domain of Delta Lake into the Iceberg fold, Databricks is making a pragmatic statement: in a modern data estate, the underlying file format should not be the bottleneck for innovation.

The strategic importance of this move cannot be overstated. For years, the narrative from competitors suggested that Databricks was an insular environment, while others championed Iceberg as the truly open alternative. By natively supporting Apache Iceberg v3, Databricks effectively neutralizes this argument. No longer just a participant in the open table format debate, the company is positioning itself as the primary execution and governance layer for every major format. This maneuver prioritizes the lakehouse philosophy over format tribalism, a move that further solidifies Unity Catalog as the gravity well for enterprise data, regardless of the suffix on the metadata file. As outlined in Futurum’s 1H 2026 DIAI Market Sizing & Five-Year Forecast Report, the demand for unified governance is a primary driver for lakehouse adoption, and Databricks is leaning directly into that tailwind.

Closing the Performance Chasm with Deletion Vectors

The most technically significant addition in this release is support for deletion vectors, which facilitates a merge-on-read strategy for Iceberg tables. Historically, Iceberg relied heavily on a copy-on-write mechanism. Every time a single row needed an update or a deletion, the system had to rewrite the entire data file. For large tables with frequent updates—common in Change Data Capture (CDC) scenarios—this created massive write amplification, driving up compute costs and increasing latency.

By implementing deletion vectors, Databricks allows Apache Iceberg v3 to simply mark rows as deleted in a separate bitmap file rather than rewriting the underlying Parquet file immediately. This merge-on-read approach is exactly how Delta Lake achieved its performance lead years ago. Bringing this to Apache Iceberg v3 means that users can expect a significant improvement in write performance for certain workloads. From a technical standpoint, this removes one of the last major performance hurdles preventing Iceberg from being a first-class citizen within the Databricks compute environment. It turns Iceberg from a storage format for cold data into a performant format for active, high-velocity workloads.
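The mechanics can be made concrete with a toy sketch. The following Python is purely illustrative of the merge-on-read idea, not the Iceberg or Databricks implementation; the `DataFile` and `DeletionVector` names are invented for this example, and real systems use compressed bitmaps (such as roaring bitmaps) stored alongside the Parquet files.

```python
class DataFile:
    """An immutable data file: under merge-on-read, rows are never rewritten in place."""
    def __init__(self, rows):
        self.rows = list(rows)

class DeletionVector:
    """A small sidecar structure marking row positions as deleted."""
    def __init__(self):
        self.deleted = set()  # real engines use compressed bitmaps, not a Python set

    def delete(self, position):
        # Deleting a row is a tiny metadata write, not a full file rewrite.
        self.deleted.add(position)

def merge_on_read(data_file, dv):
    """Readers apply the deletion vector as a filter at scan time."""
    return [row for pos, row in enumerate(data_file.rows) if pos not in dv.deleted]

f = DataFile([{"id": 1}, {"id": 2}, {"id": 3}])
dv = DeletionVector()
dv.delete(1)  # copy-on-write would rewrite the whole file for this one row
print(merge_on_read(f, dv))  # [{'id': 1}, {'id': 3}]
```

The contrast with copy-on-write is the point: the update cost here is proportional to the number of changed rows, not the size of the files containing them, which is why write amplification drops so sharply for CDC-style workloads.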

Handling the Semi-Structured Chaos with VARIANT

Modern data is rarely neat. It arrives as a chaotic stream of JSON objects, nested arrays, and ever-changing schemas. Dealing with this in a traditional data warehouse or lakehouse environment has historically required one of two unpleasant choices: flattening the data into a rigid schema, which loses flexibility, or storing it as a massive string, which kills query performance. The introduction of the VARIANT type in Apache Iceberg v3 stands as an elegant solution to this dilemma.

The VARIANT type allows Databricks to store semi-structured data in a way that is both flexible and highly optimized for queries. It uses specialized encoding that allows the engine to prune data at the sub-column level, meaning you only read the specific fields within the JSON that your query actually needs. This can shift how practitioners handle the schema-on-read versus schema-on-write trade-off. By standardizing this within Apache Iceberg v3, Databricks ensures that these complex data structures remain interoperable with other engines that also support the VARIANT specification. It is a win for the developer who wants to ingest data quickly and the analyst who needs to query it with minimal latency.
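A simplified sketch of the storage idea behind VARIANT follows. This is a toy model of "shredding" semi-structured records into per-field columns so a query touches only the fields it needs; the `VariantColumn` class is invented for illustration, and the real encoding also handles nested paths, types, and compression.

```python
class VariantColumn:
    """Toy model of VARIANT shredding: each top-level field becomes its own
    sparse column, so reading one field never parses whole documents."""
    def __init__(self, records):
        self.columns = {}
        for i, rec in enumerate(records):
            for key, value in rec.items():
                self.columns.setdefault(key, {})[i] = value
        self.count = len(records)

    def read_field(self, field):
        # Sub-column pruning: only this field's storage is scanned.
        col = self.columns.get(field, {})
        return [col.get(i) for i in range(self.count)]

events = [
    {"user": "a", "device": {"os": "ios"}},
    {"user": "b"},                      # schema drift is tolerated
    {"user": "c", "clicks": 7},
]
v = VariantColumn(events)
print(v.read_field("user"))    # ['a', 'b', 'c']
print(v.read_field("clicks"))  # [None, None, 7]
```

Compare this with storing each record as a JSON string, where every query must parse every document in full even to read a single field; that parsing cost is exactly what sub-column pruning avoids.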

The Lineage of Truth in CDC Pipelines

Data integrity in a lakehouse is only as good as the lineage that tracks it. In complex ETL and CDC pipelines, knowing exactly where a row came from and how it has changed over time is critical for auditing and incremental processing. The addition of row lineage in Apache Iceberg v3 provides a unique identifier for every row, persisting even as data is compacted or moved between files.

Without row lineage, tracking changes at the granular level across various table versions is a metadata nightmare. With it, Databricks can provide more efficient incremental processing. Instead of scanning entire partitions to find changes, the engine can use these stable row IDs to pinpoint exactly what needs to be updated in downstream tables. This reduces the compute tax associated with keeping large-scale data lakes in sync. It also provides the necessary hooks for advanced data governance, allowing teams to trace the lifecycle of a specific record from ingestion to the final consumer.
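The incremental-processing benefit can be sketched in a few lines. This is a conceptual illustration, not the Iceberg v3 mechanism; the function names (`ingest`, `compact`, `changed_since`) are invented, and the key property being modeled is that a row's identifier survives file rewrites.

```python
import itertools

_row_id = itertools.count(1)

def ingest(rows):
    """Assign each row a stable ID at ingestion; the ID never changes."""
    return [{"_row_id": next(_row_id), **r} for r in rows]

def compact(*files):
    """Compaction rewrites files, but row IDs travel with the rows."""
    return [row for f in files for row in f]

def changed_since(table, seen_ids):
    """Incremental sync: find rows a downstream consumer has not seen,
    without rescanning partitions or diffing full snapshots."""
    return [r for r in table if r["_row_id"] not in seen_ids]

f1 = ingest([{"v": 10}, {"v": 20}])
seen = {r["_row_id"] for r in f1}   # downstream has already consumed f1
f2 = ingest([{"v": 30}])
table = compact(f1, f2)             # files rewritten; IDs preserved
print(changed_since(table, seen))   # [{'_row_id': 3, 'v': 30}]
```

Because the IDs are stable across compaction, the downstream pipeline only ever processes the delta, which is the "compute tax" reduction described above.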

Unity Catalog as the Universal Control Plane

Perhaps the most critical piece of this puzzle is how Databricks leverages Unity Catalog to manage these Apache Iceberg v3 tables. Not just a store for metadata, Unity Catalog acts as the security and interoperability bridge. By using Unity Catalog, an organization can create an Apache Iceberg v3 table that is governed by the same fine-grained access controls as its Delta tables.

This creates a best-of-both-worlds scenario. Practitioners get the performance of Databricks’ execution engine and the governance of Unity Catalog, but their data remains in a format that a Trino cluster or another external engine can read without any complex translation layers. This interoperability layer is what Databricks calls UniForm. By making Apache Iceberg v3 a native target for UniForm, Databricks is making the underlying format essentially irrelevant. If you can read the metadata through Unity Catalog, the physical layout of the bits on S3 or Microsoft ADLS becomes secondary. This is a significant step toward a world where data lock-in is avoided, even if performance optimization remains a competitive differentiator.

The Competitive Counter-Move

For the last two years, many vendors have used Apache Iceberg as a rallying cry against Databricks. They argued that while Delta Lake was open-source, it was still heavily optimized for the Databricks platform and ecosystem. Rivals positioned Iceberg as a truly neutral ground. By embracing Apache Iceberg v3 so fully, Databricks has moved those goalposts.

The competition now has to deal with a version of Databricks that is arguably a more performant Iceberg platform than those who built their entire identity on it. If Databricks can offer superior performance on Iceberg tables via its specialized compute engines, the openness argument becomes secondary to the performance and governance argument. This puts pressure on other major cloud and data warehouse vendors to not only support Iceberg but to match the sophisticated automation (e.g., predictive optimization and liquid clustering) that Databricks brings to the table.

What to Watch:

  • Watch for how well the Apache Iceberg v3 features, particularly the VARIANT type and row lineage, actually translate when accessed by non-Databricks engines. While the specification is open, the implementation details in other community-driven or commercial engines will determine if true interoperability has been achieved.
  • Observe whether existing Delta Lake users begin moving toward Apache Iceberg v3 for specific workloads where external engine access is a priority. If the performance gap is truly closed, we may see a more heterogeneous mix of formats within a single Unity Catalog instance.
  • Keep an eye on how Databricks extends its automated maintenance features—such as predictive optimization and automatic vacuuming—to native Iceberg tables. Currently, many of these automated features are most mature for Delta Lake.
  • Monitor how competitors shift their messaging. Now that Databricks is a top-tier Iceberg supporter, the marketing battle will likely move away from open-versus-closed formats and toward global governance versus siloed catalogs.

See the complete press release on Apache Iceberg v3 support on Databricks’ website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other Insights from Futurum:

Enterprise Data Analytics Survey Finds 59% Investing in Semantic Layers as Critical AI Infrastructure

Oracle Positions AI Database 26ai to Lead $1.2 Trillion Market by Bridging the Agentic Reasoning Gap

Snowflake’s SnowWork Targets the Gap Between Data Insight and Business Action

Author Information

Brad Shimmin

Brad Shimmin is Vice President and Practice Lead, Data Intelligence, Analytics, & Infrastructure at Futurum. He provides strategic direction and market analysis to help organizations maximize their investments in data and analytics. Currently, Brad is focused on helping companies establish an AI-first data strategy.

With over 30 years of experience in enterprise IT and emerging technologies, Brad is a distinguished thought leader specializing in data, analytics, artificial intelligence, and enterprise software development. Consulting with Fortune 100 vendors, Brad specializes in industry thought leadership, worldwide market analysis, client development, and strategic advisory services.

Brad earned his Bachelor of Arts from Utah State University, where he graduated Magna Cum Laude. Brad lives in Longmeadow, MA, with his beautiful wife and far too many LEGO sets.

