data-engineering-pipeline

Python
59
10 months ago

An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.

Python
25
2 years ago

A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.

Jupyter Notebook
21
5 months ago

Analysis of 311 Service Requests for the City of NYC (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker.

Python
20
2 years ago

ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event

Python
17
6 years ago

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

HCL
16
3 years ago

Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database

Python
13
3 years ago