data-pipeline

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python
17901
43 minutes ago

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Java
11296
2 days ago
snowplow/snowplow

The leader in Next-Generation Customer Data Infrastructure

Scala
6913
1 month ago

A list of useful resources to learn Data Engineering from scratch

3763
10 months ago

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Python
2940
17 hours ago

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

Jupyter Notebook
2707
3 months ago
elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

HTML
2048
2 days ago

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

Java
1655
1 year ago

Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Jupyter Notebook
1383
1 month ago

Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.

Jupyter Notebook
1047
3 days ago

Smarter data pipelines for audio.

Python
849
1 year ago