data-orchestration

kestra-io/kestra

Orchestrate everything - from scripts to data, infra, AI, and business - as code, with UI and AI Copilot. Simple. Fast. Scalable.

Java
21784
14 hours ago
Java
7079
5 months ago

An open source, standard data file format for graph data storage and retrieval.

C++
302
7 days ago

A full data warehouse infrastructure with ETL pipelines running inside Docker: Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.

Python
139
5 years ago

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), and cloud provider services

HCL
34
6 months ago

Data-aware orchestration with Dagster, dbt, and Airbyte

Python
30
3 years ago

This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.

Jupyter Notebook
25
7 months ago

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

Python
22
5 days ago

An operator for managing an Alluxio system on a Kubernetes cluster

Go
13
2 years ago

CI/CD repository template to automate deployments of your production flows

HCL
13
1 year ago

Get started with Dagster ASAP

Python
12
6 months ago

A simple pipeline infrastructure with an ETL pipeline contained in a Docker environment, using Apache Airflow for orchestration and Postgres for data warehousing

Python
7
4 years ago

A real-time data ingestion pipeline using Kafka and Spark. It collects minute-level stock data from Yahoo Finance, ingests it into Kafka, processes it with Spark Streaming, and stores the results in Cassandra. The workflow is orchestrated with Airflow deployed on Docker.

Python
3
10 months ago

ChronoGrapher is a work-in-progress project that aims to implement a flexible multi-language scheduler, allowing multiple programming languages to interact with one another, or to be used with a single language

Rust
3
3 days ago

EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics

Python
3
3 months ago

Data orchestration repo with Docker deployment

Python
2
3 years ago

Build an ELT pipeline with Dagster and dbt that schedules the loading of HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard for exploring insights.

Python
2
6 months ago