
data-ingestion

Java
8807
12 hours ago

ingestr is a CLI tool to seamlessly copy data between any databases with a single command.

Python
3236
1 day ago

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

Java
3011
5 days ago
dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir
2577
3 days ago

Pravega - Streaming as a new software-defined storage primitive

Java
2003
7 months ago

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

Go
1019
9 hours ago

Copy to/from Parquet in S3, Azure Blob Storage, Google Cloud Storage, http(s) stores, local files, or standard input/output streams from within PostgreSQL

Rust
597
3 days ago

The Supabase of the AI era. A modular, open-source backend for building AI-native software, designed for knowledge, not static data.

TypeScript
355
4 months ago

Orbital automates integration between data sources (APIs, databases, queues, and functions): BFFs, API composition, and ETL pipelines that adapt as your specs change.

TypeScript
334
3 months ago

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

Python
281
5 months ago

Apache Paimon Rust: the Rust implementation of Apache Paimon.

Rust
130
5 months ago

The Data Engineering Book - a data engineering book by Thai people, for Thai people

JavaScript
114
2 months ago

Apache Spark examples exclusively in Java

Java
102
2 years ago

Build complete API integrations with YAML and SQL. Rapid development without vendor lock-in and per-row costs.

Python
85
4 months ago

Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"

Python
38
6 months ago

Enables custom tracing of Java applications in Dynatrace

Java
38
5 months ago

OpenKit Java Reference Implementation

Java
35
1 year ago