data-ingestion

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Java
8444
2 days ago

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Python
2940
17 hours ago

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

Java
2737
1 day ago

dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir
2510
15 days ago

Pravega - Streaming as a new software defined storage primitive

Java
1996
2 months ago

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

Go
917
18 hours ago

Copy to/from Parquet in S3, Azure Blob Storage, Google Cloud Storage, http(s) stores, local files, or standard input/output streams from within PostgreSQL

Rust
459
17 hours ago

Orbital automates integration between data sources (APIs, databases, queues, and functions). BFFs, API composition, and ETL pipelines that adapt as your specs change.

TypeScript
317
1 month ago

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

Python
281
21 hours ago

Apache Paimon Rust: the Rust implementation of Apache Paimon.

Rust
114
11 hours ago

The Data Engineering Book - a data engineering book by Thai people, for Thai people

JavaScript
112
1 year ago

Apache Spark examples exclusively in Java

Java
101
2 years ago

The modular, open-source backend for building AI-native software — powered by knowledge, not static data.

TypeScript
85
8 days ago

Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"

Python
37
4 days ago

OpenKit Java Reference Implementation

Java
35
9 months ago

Enables custom tracing of Java applications in Dynatrace

Java
35
8 months ago