TL;DR

Streambed is a new CDC engine that streams Postgres WAL changes directly to S3 as Iceberg tables. It supports Postgres wire protocol, allowing users to query the data with familiar tools. The project aims to offload analytical workloads from production databases without requiring application changes.

Streambed, a new open-source tool, enables real-time streaming of Postgres write-ahead log (WAL) changes directly into Iceberg tables stored on S3, supporting the Postgres wire protocol for querying. This development allows users to offload analytical queries from production databases without altering their applications, potentially reducing load on operational systems.

Streambed connects to a Postgres database as a logical replication subscriber and decodes WAL messages for inserts, updates, and deletes. It buffers these changes and periodically writes them to S3 as Parquet files, then commits updated Iceberg metadata so that readers see a consistent table snapshot. Updates and deletes are applied through copy-on-write merging: affected data is rewritten with the changes applied rather than modified in place.
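To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is not Streambed's code (the project is written in Go and its internals are not shown here). The `Change` type, the `copy_on_write_merge` function, and the `id` key column are all hypothetical names chosen for illustration, and plain lists of dicts stand in for Parquet files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One decoded change event (hypothetical shape, for illustration only)."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary-key value of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def copy_on_write_merge(existing_rows, changes, key_column="id"):
    """Return a NEW list of rows with the changes applied in order.

    The input list is never mutated, which mirrors copy-on-write:
    the old "file" stays readable until the new one is committed.
    """
    merged = {row[key_column]: row for row in existing_rows}
    for change in changes:
        if change.op == "delete":
            merged.pop(change.key, None)
        elif change.op in ("insert", "update"):
            merged[change.key] = change.row
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return list(merged.values())


# Rows as they exist in the current data file (stand-in for Parquet).
existing = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]

# A buffered batch of changes, in commit order.
batch = [
    Change("update", 2, {"id": 2, "name": "robert"}),
    Change("delete", 1),
    Change("insert", 3, {"id": 3, "name": "cy"}),
]

new_rows = copy_on_write_merge(existing, batch)
```

In a real pipeline the new rows would be written out as a fresh data file, and the table metadata would then be updated to point at it in place of the old file. Until that commit happens, queries keep reading the old file unchanged.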

The tool includes a built-in query server that exposes Iceberg tables over the Postgres wire protocol, enabling users to connect with psql or any Postgres-compatible client for ad hoc queries. This feature allows seamless integration with existing workflows without requiring additional query engines like Spark.
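As a rough illustration of what "speaking the Postgres wire protocol" means, the sketch below builds the startup message that a Postgres client sends when it opens a connection (protocol version 3.0). This follows the public Postgres protocol format, not Streambed's source. The function name and the `analyst` / `lake` values are hypothetical.

```python
import struct

# Protocol 3.0 is encoded as major version in the high 16 bits,
# minor version in the low 16 bits: (3 << 16) | 0 == 196608.
PROTOCOL_VERSION_3_0 = (3 << 16) | 0


def build_startup_message(params: dict) -> bytes:
    """Encode a Postgres v3 StartupMessage.

    Layout: int32 total length (including itself), int32 protocol
    version, then null-terminated key/value strings, then a final
    null byte that ends the parameter list.
    """
    body = struct.pack("!I", PROTOCOL_VERSION_3_0)
    for key, value in params.items():
        body += key.encode("utf-8") + b"\x00"
        body += value.encode("utf-8") + b"\x00"
    body += b"\x00"
    return struct.pack("!I", len(body) + 4) + body


# Hypothetical user and database names, for illustration only.
message = build_startup_message({"user": "analyst", "database": "lake"})
```

Any server that accepts this handshake and answers in the same protocol can be reached by psql or a standard Postgres driver, which is why no special client is needed.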

Installation involves running Postgres and MinIO (or another S3-compatible store) via Docker, building the Go application, and configuring its sync commands. The project requires Go 1.22+ and CGO for certain components, and it includes unit and integration tests.

Why It Matters

Streambed offers a simpler real-time replication path from Postgres to a data lake on S3, so analytical queries can run without competing with transactional traffic. Because it speaks the Postgres wire protocol, analysts and engineers can keep using the Postgres tools they already know. For this use case, the approach could replace a traditional ETL job or Spark-based pipeline, potentially reducing the complexity and cost of data infrastructure.

Background

Traditional CDC solutions often involve complex ETL pipelines, Spark, or third-party tools that can introduce latency or operational overhead. Existing solutions may require significant setup or disrupt production systems. Streambed’s approach, streaming WAL changes directly to S3 and supporting real-time querying via the Postgres protocol, offers a simplified alternative. The project builds on existing concepts of logical replication and Iceberg’s table format but integrates them into a cohesive, developer-friendly package.

“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, supporting real-time analytics with minimal setup.”

— Viggy28, creator of Streambed

“The support for Postgres wire protocol means you can query the data with familiar tools like psql, which is a big plus for adoption.”

— Hacker News community member

What Remains Unclear

It is not yet clear how well Streambed performs at scale or how it handles edge cases like network interruptions or large transaction volumes. The project is still in early stages, with ongoing development and testing, and real-world adoption details are not available.

What’s Next

Next steps include broader testing in production environments, performance benchmarking, and potentially adding features like more granular control over sync intervals or support for additional storage backends. The project’s maintainers may also focus on improving documentation and usability for wider adoption.

Key Questions

Can I query the Iceberg tables using standard Postgres tools?

Yes. Streambed includes a query server that exposes Iceberg tables over the Postgres wire protocol, allowing connection with psql and other Postgres clients.

Does using Streambed require changes to my existing Postgres setup?

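Not to your application or schema. Streambed connects as a logical replication subscriber, so application code and table definitions stay as they are. The database itself must allow logical replication, which in Postgres means wal_level = logical; on servers where that is not already set, enabling it requires a restart.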

What storage backend does Streambed support?

It supports S3-compatible storage, such as MinIO or AWS S3, for storing Parquet files and Iceberg metadata.

Is Streambed suitable for high-volume transactional systems?

While designed for real-time analytics, performance at very high transaction volumes is still being evaluated. Further testing is needed to confirm scalability.

Source: Hacker News
