data-cleansing
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Quiz and assignment solutions for the Google Data Analytics Professional Certificate on Coursera. Also includes a few side resources that I found helpful.
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Wrangler Transform: A DMD system for transforming Big Data
XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis
This is a binary classification problem related to Autistic Spectrum Disorder (ASD) screening in adults. Given some attributes of a person, my model predicts whether the person is likely to have ASD, using different supervised learning techniques and a Multi-Layer Perceptron.
Java DSL for (online) deduplication
This repo was created for sharing the required and discussed files during the Online Internship training program on Data Science Using Python in May 2021
Quick and dirty data mining made easier in Sublime Text
Predict if a driver will file an insurance claim next year. (Kaggle Competition)
Data cleansing and clustering with Vector Quantization and Adaptive Resonance Theory
Product Rationalization of Pro Bikes Inc using Power BI
This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js
Data cleaning tool.
Data Structures project in C++11 language, uses custom Vector & String structures with Move Semantics (Rule of Five)
Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame. It handles imputation, duplicate removal, special-character replacement, and more.
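The three steps named in the last description (imputation, removing duplicates, replacing special characters) can be sketched with pandas. This is only an illustration under stated assumptions, not the package's actual API: the helper name `clean_frame`, the choice of median and mode for imputation, and the sample data are all hypothetical.

```python
import re

import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: strip special characters, impute, drop duplicates."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: numeric gaps are filled with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Remove anything that is not a word character or whitespace.
            out[col] = out[col].map(
                lambda v: re.sub(r"[^\w\s]", "", v).strip() if isinstance(v, str) else v
            )
            # Assumption: text gaps are filled with the most frequent value.
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    # Deduplicate last, so rows that only differed by noise collapse together.
    return out.drop_duplicates().reset_index(drop=True)


# Made-up sample data for illustration only.
raw = pd.DataFrame(
    {
        "name": ["Ann!", "Ann", "Bob#", None],
        "age": [30.0, 30.0, None, 41.0],
    }
)
cleaned = clean_frame(raw)
```

Dropping duplicates after the character cleanup is a deliberate ordering choice: `"Ann!"` and `"Ann"` only become identical rows once the stray punctuation is gone. For a real CSV file, `pd.read_csv` would supply the input frame.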