data-engineering-pipeline
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
One framework to develop, deploy and operate data workflows with Python and SQL.
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Data Engineering Project with Hadoop HDFS and Kafka
Code examples showing flow deployment to various types of infrastructure
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Let your pipe lines flow thru the Python code in xonsh.
Agentic Data Integrator that helps you build production-ready data pipelines so you can connect to more systems, faster. You run it in your terminal as a workflow wizard.
Deploy a Prefect flow to serverless AWS Lambda function
Distributed Data Processing Pipeline for MCP.
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
Analysis of 311 Service Requests for New York City (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
A fully serverless, event-driven data pipeline that ingests, enriches, validates, and visualizes real-time news data using AWS services. Designed for cost-efficient, scalable deployment using only free-tier AWS services.
Reusable data engineering toolkit. My personal data infrastructure.
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.