DataPrincipal Daily - June 8th, 2026

Iceberg v4 design work, vendor interoperability claims, and engine-level write support all converged on the same metadata layer this week.

Jun 08, 2026

⚡ TL;DR

Iceberg Summit 2026 surfaced four concrete v4 proposals, led by one-file commits for low-latency writes.
Confluent moves Snapshot Queries to GA, joining Tableflow Iceberg history with live Kafka events.
DuckDB v1.5.3 adds MERGE INTO, ALTER TABLE, and v3 deletion vectors to its Iceberg extension.
Snowflake shipped Iceberg v3 GA while Databricks took v3 to public preview, with competing catalog interoperability claims.

🏗️ Platforms & Architecture

Iceberg v4 design centers on one-file commits and column families

Iceberg Summit 2026 (San Francisco, 600+ attendees), recapped by Snowflake on June 4, produced four named v4 directions: adaptive metadata trees enabling one-file commits for low-latency writes, relative paths for table portability across replication and migration, column families that let column groups evolve independently for wide ML feature tables, and extensible column statistics opening the door to ANN search indexes. The one-file-commit work is being pushed by Russell Spitzer, Amogh Jahagirdar, Yi Fang, and Steven Wu on the dev list, with Ryan Blue active on the adaptive metadata tree structure. None of these proposals has been merged into the spec yet. v4 is being shaped around streaming write patterns and AI retrieval, not just batch analytics.
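
To make the relative-paths idea concrete, here is a minimal Python sketch. It is not the Iceberg spec or any proposal's actual layout: `TableMetadata`, `resolve_paths`, and `relocate` are hypothetical names, and the bucket URIs are made up. It only illustrates why storing file references relative to the table root lets a table move without its file entries being rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    # Hypothetical, simplified stand-in for table metadata (not the Iceberg spec).
    location: str      # table root, e.g. an object-store prefix
    data_files: tuple  # file paths stored relative to `location`


def resolve_paths(meta):
    """Turn relative file references into absolute URIs at read time."""
    root = meta.location.rstrip("/")
    return [f"{root}/{rel}" for rel in meta.data_files]


def relocate(meta, new_location):
    """Move the table by changing only the root; file entries stay untouched."""
    return TableMetadata(location=new_location, data_files=meta.data_files)


original = TableMetadata(
    location="s3://bucket-a/warehouse/events",
    data_files=("data/part-0001.parquet", "data/part-0002.parquet"),
)
replica = relocate(original, "s3://bucket-b/dr/events")

for uri in resolve_paths(replica):
    print(uri)
```

With absolute paths, the same move would mean rewriting every file reference in the metadata, which is the replication and migration cost the proposal targets.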

Snowflake ships Iceberg v3 GA with Horizon Catalog bidirectional writes

At Summit (June 2), Snowflake made Iceberg v3 generally available alongside Snowflake-managed storage for Iceberg tables, and opened Horizon Catalog (powered by Apache Polaris) to bidirectional read and write from external engines like Spark and Trino. Governance now extends to open tables through external engine access management and support for the Iceberg REST Scan Plan API for fine-grained protections across engines. Affirm, Indeed, NTT DOCOMO, and Samsung Ads are named adopters.

Databricks counters with Managed, Foreign, and v3 Iceberg public preview in Unity Catalog

Databricks moved Iceberg v3 to public preview on Databricks Runtime 18.0+ with row lineage, deletion vectors (cited up to 10x faster than copy-on-write), and the VARIANT type, ahead of its own Data + AI Summit (June 15-18). Unity Catalog now positions Managed Iceberg with Predictive Optimization and Liquid Clustering, plus Foreign Iceberg for externally managed tables. With Snowflake’s v3 now GA and Databricks’ in public preview, both vendors have claimed the most interoperable Iceberg catalog within two weeks of each other.
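
The gap between deletion vectors and copy-on-write comes down to what a delete has to rewrite. A toy Python sketch, assuming a data file is just a list of rows and using a plain integer as the bitmap (real deletion vectors use a compressed binary encoding, and all function names here are hypothetical):

```python
def copy_on_write_delete(data_file, positions):
    """Copy-on-write: rewrite the whole file without the deleted rows."""
    doomed = set(positions)
    return [row for i, row in enumerate(data_file) if i not in doomed]


def write_deletion_vector(existing_dv, positions):
    """Deletion vector: set one bit per deleted row; the data file is not rewritten."""
    dv = existing_dv
    for pos in positions:
        dv |= 1 << pos
    return dv


def read_with_deletion_vector(data_file, dv):
    """Readers skip any row whose bit is set in the deletion vector."""
    return [row for i, row in enumerate(data_file) if not (dv >> i) & 1]


rows = ["a", "b", "c", "d"]
dv = write_deletion_vector(0, [1, 3])

# Both strategies return the same rows; they differ in how much gets written.
print(read_with_deletion_vector(rows, dv))
print(copy_on_write_delete(rows, [1, 3]))
```

The delete writes a few bits instead of a new copy of the file, and the cost moves to readers, which have to apply the vector on every scan.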

🔧 Tools & Products

Confluent moves Snapshot Queries to GA, unifying batch history and live streams

Confluent Cloud’s Q2 ’26 launch brings Snapshot Queries to GA in June, combining historical Tableflow data (Iceberg or Parquet) with live Kafka events in a single Flink SQL query, cited at 50x to 100x faster than scanning raw streams. The release also ships a free open-source dbt adapter for Flink so engineers can define streaming pipelines as dbt models, plus GA for materialized tables, Tableflow user-defined namespaces, and JSON-schema dead letter queues. Pricing reaches as low as $0.01 per topic-hour. Treating streaming history as a queryable table closes the gap that previously forced a separate batch warehouse.

DuckDB v1.5.3 adds MERGE INTO and v3 deletion vectors to its Iceberg extension

The Iceberg extension in DuckDB v1.5.3 (May 29, authors Tom Ebergen and Thijs Bruineman) now supports MERGE INTO for row-level upserts and deletes, ALTER TABLE as metadata-only schema evolution, and bucket and truncate partition transforms. Iceberg v3 support arrives with VARIANT and TIMESTAMP_NS types, schema-level default values, and binary deletion vectors stored in Puffin files rather than Parquet. A single-node engine writing transactional Iceberg tables narrows the case for spinning up a cluster for routine maintenance jobs.
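
For readers who have not used MERGE INTO, its row-level semantics can be sketched in a few lines of Python. This is a toy model of the matched / not-matched branches, not DuckDB's implementation or syntax; the `merge_into` helper and the `_deleted` flag are hypothetical stand-ins for a `WHEN MATCHED AND ... THEN DELETE` condition.

```python
def merge_into(target, source, key):
    """Toy MERGE: update matched rows, insert unmatched ones, delete on a flag."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        k = row[key]
        if k in merged and row.get("_deleted"):
            del merged[k]              # WHEN MATCHED AND <condition> THEN DELETE
        elif k in merged:
            merged[k].update(row)      # WHEN MATCHED THEN UPDATE
        elif not row.get("_deleted"):
            merged[k] = dict(row)      # WHEN NOT MATCHED THEN INSERT
    return list(merged.values())


target = [{"id": 1, "qty": 1}, {"id": 2, "qty": 2}]
source = [
    {"id": 2, "qty": 5},            # matches id 2: update
    {"id": 3, "qty": 7},            # no match: insert
    {"id": 1, "_deleted": True},    # matches id 1: delete
]

print(merge_into(target, source, key="id"))
```

An engine doing this against Iceberg commits the whole result as a single table snapshot, which is what lets one statement stand in for a separate upsert and delete job.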

ClickHouse Open House ships managed Postgres, ClickStack Cloud, and 26.5

ClickHouse’s Open House week (late May into June) launched managed Postgres in public beta with native CDC into ClickHouse (early benchmarks cited at over 5x the TPS of AWS RDS), serverless ClickStack Cloud observability in private preview, executable Python UDFs callable from SQL at query speed, and ClickPipes GCP-native support to cut cross-cloud egress. Release 26.5 landed in early June, followed by a fast-joins engineering writeup on June 3.

📐 Practices & Governance

BigQuery’s serverless Iceberg REST catalog opens two-way cross-engine writes

Google’s managed Iceberg REST catalog (BigLake renamed to Lakehouse for Apache Iceberg as of April 20) is in preview for bidirectional read and write between BigQuery and Spark, Flink, and Trino using native Iceberg libraries, with uniform governance through credential vending. Managed table maintenance offloads compaction and garbage collection, cited at a ~40% improvement on TPC-DS 10T, while the Advanced Runtime reports 2x faster scans than self-managed based on internal benchmarking. All three hyperscalers now expose a managed REST catalog, which makes the catalog API, not the storage layer, the place where governance portability gets tested.

💎 Gems & Tools

DuckDB Quack protocol: DuckDB’s client-server protocol (introduced May 12) lets a single DuckDB instance serve remote attachments and query orchestration. It extends the embedded engine into a lightweight server topology without a separate database product.

dbt adapter for Apache Flink: A free, open-source plugin to define, test, and document Flink streaming pipelines as dbt models, bringing batch transformation ergonomics to streaming SQL.

Lance lakehouse format in DuckDB: A May 21 DuckDB engineering post test-drives the Lance columnar format (built for ML and vector workloads) through DuckDB, a useful read for teams weighing storage formats beyond Parquet and Iceberg for retrieval pipelines.

Data Principal
