data-pipeline

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python
19234
7 minutes ago

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Java
11748
2 hours ago
snowplow/snowplow

The leader in Customer Data Infrastructure

Scala
6952
2 months ago

A list of useful resources to learn Data Engineering from scratch

3858
1 year ago

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Python
3207
7 hours ago

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

Jupyter Notebook
2741
7 months ago
elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

HTML
2129
4 hours ago

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

Java
1668
2 years ago

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

Jupyter Notebook
1391
5 months ago

Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.

Jupyter Notebook
1312
14 hours ago

Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL and Oracle.

Go
1009
6 hours ago