#data-engineering-pipeline

An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.

Python
25
2 years ago

A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.

Jupyter Notebook
23
9 months ago

Analysis of 311 Service Requests for New York City (2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker.

Python
20
2 years ago

Reusable data engineering toolkit. My personal data infrastructure.

Jupyter Notebook
18
2 months ago

ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event

Python
17
6 years ago

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

HCL
17
5 days ago