etl-pipelines

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Rust
1216 stars
Updated 8 months ago

The simplest way to run Python on lots of computers.

TypeScript
101 stars
Updated 1 day ago

The open-source Useful SDK. A single Python decorator from the Useful library provides full observability of Python functions within an ETL pipeline.

Python
19 stars
Updated 2 years ago

DataSift automatically applies a data preprocessing pipeline to data science projects.

Python
1 star
Updated 1 year ago

Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL.

Python
1 star
Updated 3 years ago

Big Data ETL pipeline for Brazilian e-commerce data. Implements data ingestion, transformation, and storage using Apache Spark, Hadoop, and SQL. Designed for scalable data processing and analytics.

HTML
1 star
Updated 5 months ago

This project demonstrates a complete ETL pipeline for Formula 1 racing data using Azure Databricks, Delta Lake, and Azure Data Factory. It covers data ingestion, transformation with PySpark and Spark SQL, data governance with Unity Catalog, and visualization through Power BI. Designed to showcase real-world data engineering workflows in Azure.

Python
0 stars
Updated 9 months ago

JSON-driven ETL pipeline framework prototype

Python
0 stars
Updated 5 years ago

For scribes of Thoth in the shell — your codebrain’s sacred scroll.

Dockerfile
0 stars
Updated 4 months ago

This repo contains the DAGs that run in my local Airflow environment. I use the local environment to test my DAGs before deploying them to virtual machines via Kubernetes.

Python
0 stars
Updated 3 years ago

A deployed machine learning model that automatically classifies incoming disaster messages into 36 related categories. Developed as part of Udacity's Data Science Nanodegree program.

Python
0 stars
Updated 4 years ago

An extension that registers all pharmacies in Argentina.

Python
0 stars
Updated 3 years ago

Weaving together different threads (services like image/audio converse, ETL services, etc.) to enable the World Wide Flow

JavaScript
0 stars
Updated 2 years ago