data-pipelines
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
An orchestration platform for the development, production, and observation of data assets.
Apache DolphinScheduler is a modern data orchestration platform for agile creation of high-performance workflows with low code.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
Build data pipelines, the easy way 🛠️
Maestro: Netflix’s Workflow Orchestrator
A system for agentic LLM-powered data processing and ETL
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
The dbt-native data observability solution for data and analytics engineers. Monitor your data pipelines in minutes. Available as a self-hosted or cloud service with premium features.
The best place to learn data engineering. Built and maintained by the data engineering community.
The Feldera Incremental Computation Engine
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
MLeap: Deploy ML Pipelines to Production
Concurrent Python made simple
Easy data preparation with the latest LLM-based operators and pipelines.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.