#datacleansing

OpenRefine is a free, open source power tool for working with messy data and improving it

Java
11275
2 days ago

A Scalable Data Cleaning Library for PySpark.

Python
27
6 years ago

Table Enforcer is my attempt to apply a sort of "test-driven development" workflow to data cleaning and validation. It is a Python package that facilitates the iterative process of developing and using schema-like representations of pandas DataFrames for recoding and validating instances of these data.

Python
17
7 years ago

This project targets the textual analysis of Egyptian movie plot summaries curated from online sources, covering the four golden decades of Egyptian cinema.

Jupyter Notebook
2
4 years ago

This project is based on an online retail store that wants to analyse the major factors contributing to its revenue so that it can plan strategically for the next year.

1
2 years ago

Analyzed a received survey using the Power BI tool to draw useful insights.

1
3 months ago

Cleaned a movies dataset and presented specific visuals to answer research questions.

1
2 years ago

Data cleansing and validation for a Data Science Master's degree.

Jupyter Notebook
1
7 years ago

Implementation of a Neural Network (NN) model for handwriting recognition using the MNIST dataset.

Jupyter Notebook
1
2 months ago

Advanced guide to data cleaning, with 20+ ways of cleaning data with Python.

1
3 years ago

This project dives deep into the sales, delivery, and customer feedback data of major grocery delivery platforms – Blinkit, Swiggy Instamart, and JioMart. It is designed to showcase my ability to clean, analyze, and visualize data using Microsoft Excel.

1
14 days ago

This is a curated collection of notebooks and small projects containing linear and non-linear regression models.

Jupyter Notebook
1
4 years ago

This project extracts data from Azure Data Lake Storage Gen2, transforms it, and then transfers it to a SQL database.

1
2 years ago

This course by the University of Michigan introduces the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course also introduces data manipulation and cleaning techniques using the Python pandas data science library.

Jupyter Notebook
1
5 years ago

This is an internal project with Intel that builds a framework for monitoring data quality across disparate sources and automating that monitoring using Python.

1
4 years ago