#etl-pipeline

Zipstack/unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Python
5062
2 days ago

apache/streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.

Java
4041
8 days ago

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Jupyter Notebook
2104
13 days ago

Implementing best practices for PySpark ETL jobs and applications.

Python
1891
2 years ago

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

Python
861
2 years ago

Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

Python
838
21 hours ago

A Clojure high performance data processing system

Clojure
702
16 days ago

A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection

Rust
605
3 days ago

Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.

Python
586
1 month ago

A simplified, lightweight ETL Framework based on Apache Spark

Scala
585
1 year ago

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

Python
243
2 months ago

This is a template you can use for your next data engineering portfolio project.

176
4 years ago

Jayvee is a domain-specific language and runtime for automated processing of data pipelines

TypeScript
176
1 day ago

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

Go
173
1 day ago

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.

Jupyter Notebook
121
3 years ago