etl-pipelines
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Data pipelines from reusable components.
The simplest way to run Python on lots of computers.
The open-source Useful SDK. One Python decorator from the Useful library provides full observability of Python functions within an ETL.
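As an illustration of that decorator pattern, here is a minimal, hypothetical sketch of function-level observability in an ETL step. It is not the actual Useful SDK API; the decorator name and logging choices are assumptions that only show the general idea of wrapping a step to record its start, duration, and failures.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")

def observe(func):
    """Hypothetical observability decorator: logs calls, duration, and errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("start %s", func.__name__)
        try:
            result = func(*args, **kwargs)
            logger.info("done %s in %.3fs", func.__name__, time.perf_counter() - start)
            return result
        except Exception:
            logger.exception("failed %s", func.__name__)
            raise
    return wrapper

@observe
def transform(rows):
    # Example ETL step whose execution is now observable via the logs.
    return [r.strip().lower() for r in rows]
```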
A project structure for doing and sharing data engineering work.
DataSift automatically applies a data pre-processing pipeline to data science projects.
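For context, this is the kind of pre-processing pipeline such a tool might assemble automatically, sketched with scikit-learn. The column names, data, and steps are assumptions for illustration and do not reflect DataSift's actual API or behavior.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Impute then scale numeric columns; impute then one-hot encode categoricals.
numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("encode", OneHotEncoder(handle_unknown="ignore"))])

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),        # hypothetical numeric columns
    ("cat", categorical, ["city", "segment"]),  # hypothetical categorical columns
])

df = pd.DataFrame({"age": [34, np.nan], "income": [52000.0, 61000.0],
                   "city": ["Rio", "Lima"], "segment": ["a", np.nan]})
features = preprocess.fit_transform(df)
```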
Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL.
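A minimal Airflow DAG sketch of that extract-and-load pattern, assuming Airflow 2.4 or later. The DAG id, schedule, and task callables are placeholders rather than code from the repo, and the actual BigQuery and MySQL access is left as comments.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_bigquery(**context):
    # Placeholder: query BigQuery (e.g. via google-cloud-bigquery) and stage
    # the rows somewhere the downstream task can read them.
    ...

def load_into_mysql(**context):
    # Placeholder: insert the staged rows into MySQL (e.g. via a MySQL client).
    ...

with DAG(
    dag_id="bigquery_to_mysql",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_bigquery)
    load = PythonOperator(task_id="load", python_callable=load_into_mysql)
    extract >> load
```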
Sparkify project
e-Portfolio showcasing my personal projects.
Big Data ETL pipeline for Brazilian e-commerce data. Implements data ingestion, transformation, and storage using Apache Spark, Hadoop, and SQL. Designed for scalable data processing and analytics.
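A compact PySpark sketch of the ingest/transform/store flow described above. The paths, column names, and aggregation are assumptions for illustration, not code from the repo.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ecommerce_etl").getOrCreate()

# Ingest: read raw order data (path and schema are illustrative).
orders = spark.read.csv("hdfs:///raw/orders.csv", header=True, inferSchema=True)

# Transform: e.g. daily order counts per customer state (hypothetical columns).
daily = (orders
         .withColumn("order_date", F.to_date("order_purchase_timestamp"))
         .groupBy("order_date", "customer_state")
         .count())

# Store: write partitioned Parquet for downstream SQL analytics.
daily.write.mode("overwrite").partitionBy("order_date").parquet("hdfs:///curated/daily_orders")
```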
End-to-end MLOps project with ETL pipelines: building a network security system.
This project demonstrates a complete ETL pipeline for Formula 1 racing data using Azure Databricks, Delta Lake, and Azure Data Factory. It covers data ingestion, transformation with PySpark and Spark SQL, data governance with Unity Catalog, and visualization through Power BI. Designed to showcase real-world data engineering workflows in Azure.
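A small PySpark and Delta Lake sketch of the ingestion-and-transform step on Databricks. The path, columns, and table name are assumptions; the Azure Data Factory orchestration, Unity Catalog governance, and Power BI layer described above are not shown here.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("f1_etl").getOrCreate()

# Ingest raw race results (path and columns are illustrative).
results = spark.read.json("/mnt/raw/results.json")

# Transform with PySpark, then persist as a Delta table for Spark SQL / BI.
points_by_driver = (results
                    .groupBy("driver_id")
                    .agg(F.sum("points").alias("total_points")))

(points_by_driver.write
 .format("delta")
 .mode("overwrite")
 .saveAsTable("f1_presentation.driver_standings"))  # hypothetical table name
```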
JSON-driven ETL pipeline framework prototype
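One way a JSON-driven prototype like this could work is to interpret a JSON spec as an ordered list of registered steps. The config schema, step names, and helpers below are invented for illustration and are not taken from the project.

```python
import json

# Registry of step implementations; names and config schema are invented here.
def read_csv(path):
    with open(path) as f:
        header = f.readline().strip().split(",")
        return [dict(zip(header, line.strip().split(","))) for line in f]

def uppercase(rows, column):
    return [{**r, column: r[column].upper()} for r in rows]

STEPS = {"read_csv": read_csv, "uppercase": uppercase}

def run_pipeline(config_json):
    """Interpret a JSON spec as an ordered chain of steps, each feeding the next."""
    spec = json.loads(config_json)
    data = None
    for step in spec["steps"]:
        fn = STEPS[step["op"]]
        args = step.get("args", {})
        data = fn(**args) if data is None else fn(data, **args)
    return data

config = '{"steps": [{"op": "read_csv", "args": {"path": "input.csv"}},' \
         ' {"op": "uppercase", "args": {"column": "name"}}]}'
# rows = run_pipeline(config)  # assumes an input.csv with a "name" column exists
```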
Project in progress.
For scribes of Thoth in the shell — your codebrain’s sacred scroll.
This repo contains the DAGs that run on my local Airflow environment. I use the local environment to test my DAGs before deploying them to virtual machines via Kubernetes.
A deployed machine learning model that automatically classifies incoming disaster messages into 36 related categories. Developed as part of Udacity's Data Science Nanodegree program.
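For context, a generic multi-label text classification setup of the kind such a project might use, sketched with scikit-learn. The tiny dataset, label columns, and model choice are illustrative assumptions, not the project's actual model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data: each message gets one binary label per category.
messages = ["we need water and food", "roads are blocked after the storm"]
labels = [[1, 1, 0], [0, 0, 1]]  # hypothetical columns: water, food, infrastructure

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(messages, labels)
print(model.predict(["send drinking water please"]))
```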
An extension that registers all pharmacies in Argentina.
Weaving together different threads (services such as image/audio conversion, ETL services, etc.) to enable the World Wide Flow.