DataPrincipal Daily - June 10th, 2026
Agentic analytics, a Rust rewrite of dbt, and the metadata tier under object-storage Kafka all moved this week, while the table-format vendors stayed quiet.
⚡ TL;DR
ClickHouse Agents, a managed agentic analytics service in ClickHouse Cloud running on Anthropic’s Claude, reached public beta on June 9.
dbt Core 2.0 landed in alpha on June 1, rebuilt in Rust on the Fusion engine and re-licensed under Apache 2.0.
Redpanda published the internals of its Cloud Topics metastore, the LSM-tree metadata tier behind object-storage-backed Kafka.
Cloudera adopted Apache Polaris on June 4 and contributed an Apache Ranger external authorizer to Apache Polaris 1.5.
🏗️ Platforms & Architecture
ClickHouse Agents enters public beta on Claude
ClickHouse shipped the public beta of ClickHouse Agents on June 9, a managed agentic analytics service inside ClickHouse Cloud that runs on Anthropic’s Claude, with Sonnet and Haiku wired in as the default models. The no-code builder is built on LibreChat, the open-source project ClickHouse acquired in November 2025, and adds a managed chat interface, a sandboxed code interpreter for Bash, Python, and JavaScript, skills, memories, artifacts, multi-agent coordination, SSO, and encrypted storage. It connects natively to ClickHouse and to any Model Context Protocol system, and ships a native AWS AgentCore Registry integration. ClickHouse says its internal version is used daily by around 80 percent of employees, processes more than 45 million tokens a day, and now answers roughly 70 percent of internal warehouse queries, with agents firing 10x to 100x the query volume a human analyst would. Putting an agent runtime directly inside the warehouse, rather than selling a separate gateway, is a notable architectural and product decision, given the horror stories about LLM agents losing or deleting data. ClickHouse presumably has safeguards in place, the sandboxed code interpreter among them.
Redpanda opens up its Cloud Topics metastore
On June 9, Redpanda engineer Andrew Wong detailed the metastore, the metadata tier behind Cloud Topics, which is the object-storage write path introduced in Redpanda 26.1. The metastore is an internal key-value store that maps Kafka offsets to positions inside L1 objects in object storage, and it also tracks leader terms, compaction state, and protocol information. It is built as an LSM tree drawing on LevelDB and RocksDB, with the write-ahead log implemented as the cluster’s Raft log, SSTables held in object storage behind a write-through local cache, and the manifest replicated to all replicas through Raft and flushed every ten minutes by default. Publishing the mechanism behind offset lookups against long-term storage, full-cluster restoration, and cross-region read replicas is how Redpanda defends its cost claim for storing Kafka data without local disks.
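To make the offset-to-object mapping concrete, here is a minimal Python sketch of a sorted index that resolves a Kafka offset to a position inside an object. It is a toy, not Redpanda's implementation: the names Extent, OffsetIndex, object_key, and byte_pos are hypothetical, the record layout is an assumption, and an in-memory sorted list stands in for the sorted key space an LSM tree provides.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical record: a contiguous run of offsets stored in one object."""
    base_offset: int  # first Kafka offset covered by this extent
    last_offset: int  # last Kafka offset covered (inclusive)
    object_key: str   # hypothetical object-storage key
    byte_pos: int     # byte position where the extent starts in the object


class OffsetIndex:
    """Toy sorted index; a stand-in for the sorted keys of an LSM tree."""

    def __init__(self):
        self._bases = []    # base offsets, kept sorted
        self._extents = []  # extents, kept in the same order as _bases

    def add(self, extent):
        # Insert while keeping both lists sorted by base_offset.
        i = bisect_right(self._bases, extent.base_offset)
        self._bases.insert(i, extent.base_offset)
        self._extents.insert(i, extent)

    def lookup(self, offset):
        # Find the rightmost extent whose base_offset <= offset.
        i = bisect_right(self._bases, offset) - 1
        if i < 0:
            return None
        extent = self._extents[i]
        # The offset may fall past the end of the last known extent.
        return extent if offset <= extent.last_offset else None


index = OffsetIndex()
index.add(Extent(0, 99, "l1/object-a", 0))
index.add(Extent(100, 249, "l1/object-a", 4096))
index.add(Extent(250, 399, "l1/object-b", 0))

hit = index.lookup(120)
print(hit.object_key, hit.byte_pos)
```

The lookup is a binary search over base offsets, which is the access pattern a sorted key-value store serves cheaply. Leader terms, compaction state, and the Raft-backed write-ahead log described above are left out of the sketch.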
🔧 Tools & Products
dbt Core 2.0 lands in alpha, rebuilt in Rust under Apache 2.0
dbt Labs published dbt-core v2.0.0-alpha.1 on June 1, the first alpha of dbt Core 2.0, rebuilt in Rust on the same foundation as the dbt Fusion engine and using ADBC and Parquet. Code that previously lived in the dbt-fusion repository under the source-available ELv2 license is now open-sourced under Apache 2.0 on the dbt-core main branch, folding two engines in two languages into one. The post from Joel Labes and Grace Goheen points to faster parse times on large projects, a stricter language spec that catches typos, Parquet artifacts as an alternative to JSON, and an install path that sidesteps Python virtualenv conflicts, though it gives no specific numbers.
📐 Practices & Governance
Cloudera adopts Apache Polaris and contributes a Ranger authorizer
Cloudera said on June 4 that it is adopting Apache Polaris, the open-source REST catalog built on the Iceberg REST Catalog specification, as part of its open lakehouse architecture. Alongside the adoption, Cloudera contributed a new Apache Ranger authorizer to Apache Polaris 1.5 as an external authorizer in beta, wiring an established enterprise policy engine into the catalog’s pluggable authorization model. The stated aim is centralized, governed access to analytics and AI engines across hybrid and multicloud deployments without moving data.
💎 Gems & Tools
dlt: An open-source Python library for code-first ELT that infers schema and loads to warehouses, lakes, and vector stores in a few lines. Active and widely adopted, with v1.27.2 out on May 29, it is worth a look for teams replacing bespoke ingestion scripts.
LibreChat: The open-source chat platform that ClickHouse Agents builds on, with multi-model support, a sandboxed code interpreter, and an MCP client. Useful to study if you want the same agent surface without the managed cloud.


