etl
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Fancy stream processing made operationally mundane
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Flink CDC is a streaming data integration tool
The open source ELT framework powered by Apache Arrow
Privacy- and security-focused Segment alternative, written in Golang and React
Build data pipelines, the easy way 🛠️
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Spreadsheet with AI, Code, Connections
Maestro: Netflix’s Workflow Orchestrator
A curated list with resources about node-based UIs
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
A system for agentic LLM-powered data processing and ETL