#apache-iceberg

matanolabs/matano
Rust
1550
3 months ago

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java
1023
2 days ago

Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB and MySQL.

Go
811
1 day ago

Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testing.

Dockerfile
66
2 years ago

Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Jupyter Notebook
47
3 years ago

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

Python
32
2 months ago

An open-source, community-driven REST catalog for Apache Iceberg!

Go
27
10 months ago

This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.

Java
23
5 months ago

Streaming ETL job examples in AWS Glue that integrate Iceberg to create an in-place updatable data lake on Amazon S3

Python
23
7 months ago

A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino

Java
19
3 years ago

Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS

Python
13
3 months ago

This is a collection of Amazon CDK projects that show how to directly ingest streaming data from Amazon Managed Service for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.

Python
12
7 months ago