great-expectations

🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

Python
905
1 year ago

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Python
244
2 months ago

Sample project to demonstrate data engineering best practices

Python
186
1 year ago

Learn how to create reliable ML systems by testing code, data and models.

Jupyter Notebook
86
3 years ago

Tutorial for implementing data validation in data science pipelines

Jupyter Notebook
33
3 years ago

How to evaluate the Quality of your Data with Great Expectations and Spark.

Jupyter Notebook
30
2 years ago

This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source tools

Python
29
2 years ago

Prefect integrations for interacting with Great Expectations

Python
28
8 months ago

📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.

Python
27
11 days ago

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

Python
23
1 year ago

A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.

Python
21
3 years ago

ELT data pipeline implementation in a data warehousing environment

Jupyter Notebook
21
1 day ago

BirdiDQ combines the open-source Python library Great Expectations with the simplicity of natural language queries to identify and report data quality issues, all at your fingertips.

Jupyter Notebook
19
2 years ago

Code to demonstrate data engineering metadata & logging best practices

Python
16
1 year ago

Run greatexpectations.io on any SQL engine through a REST API. Supported by FastAPI, Pydantic and SQLAlchemy for data quality checks.

Python
12
2 months ago

Integrating Apache Airflow, dbt, Great Expectations and Apache Superset to develop a modern open source data stack.

HTML
11
3 years ago

This library is inspired by Great Expectations. It makes the various expectations found in Great Expectations available when using Python's built-in unittest assertions.

Python
10
3 years ago

Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.

Python
9
3 years ago