data-pipelines

Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.

Java
13856
5 days ago

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

HTML
12817
8 days ago
StructuredLabs/preswald

Preswald is a WASM packager for Python-based interactive data apps: bundle complete, complex data workflows, particularly visualizations, into single files that run entirely in the browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, and similar libraries. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.

Python
4301
3 months ago

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Python
2213
5 hours ago
elementary-data/elementary

The dbt-native data observability solution for data and analytics engineers. Monitor your data pipelines in minutes. Available as a self-hosted or cloud service with premium features.

HTML
2158
3 days ago
data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

CSS
1794
4 days ago

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Rust
1560
9 months ago
Scala
1526
10 months ago

Easy data preparation with the latest LLM-based operators and pipelines.

Python
1364
4 days ago

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

Jupyter Notebook
1353
6 days ago