data-cleansing

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark pyspark data-wrangling bigdata data-science data-cleansing data-transformation machine-learning data-profiling data-extraction data-exploration data-analysis data-preparation cudf dask data-cleaning

Python

1516

233

9 months ago

data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

data-wrangling data-forge data data-analysis JavaScript Node.js linq pandas visualization data-visualization data-management data-manipulation data-cleaning data-cleansing CSV JSON

TypeScript

1371

4 months ago

Desbordante / desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-profiling data-science data-wrangling data-preprocessing feature-selection feature-engineering feature-extraction Spreadsheet tabular-data anomaly-detection exploratory-data-analysis knowledge-discovery

C++

414

19 days ago

BDFD-Learning-Ground / Cousera_Google-Data-Analytics-Professional-Certificate

Quizzes & Assignment Solutions for the Google Data Analytics Professional Certificate on Coursera. Also includes a few side resources that I found helpful.

data-cleansing SQL decision-making data-visualization Python data-science data-analysis excel quiz Coursera R Google data data-analytics

246

3 years ago

ajaymache / data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

data-science data-analysis data-visualization data-cleaning data-cleansing data-wrangling data-analytics eda exploratory-data-analysis kaggle-competition

Jupyter Notebook

227

7 years ago

probcomp / PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

probabilistic-programming data-cleaning data-cleansing bayesian-inference

Julia

225

1 year ago

data-integrations / wrangler

Wrangler Transform: A DMD system for transforming Big Data

wrangle data-transformation data-transform data-science transform-data manipulate-data cdap big-data cdap-plugin transform Project preparation data-cleansing data-prep Parsing avro

Java

107

1203

13 hours ago

ojasphansekar / Zillow-Home-Value-Prediction

XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis

Python machine-learning exploratory-data-analysis data-preprocessing data-cleansing

Jupyter Notebook

6 years ago

iweld / data_cleaning

An SQL data cleaning project

SQL data-analytics data-cleansing excel

3 years ago

kbasu2016 / Autism-Detection-in-Adults

This is a binary classification problem related to Autistic Spectrum Disorder (ASD) screening in adults. Given some attributes of a person, my model predicts whether that person is likely to have ASD, using several supervised learning techniques and a multi-layer perceptron.

supervised-learning naive-bayes-classifier decision-tree-classifier random-forest support-vector-machine k-nearest-neighbours data-wrangling data-cleansing

Jupyter Notebook

7 years ago

bakdata / dedupe

Java DSL for (online) deduplication

duplicate-detection data-cleaning data-cleansing Entity resolution

Java

7 months ago

AP-State-Skill-Development-Corporation / Data-Science-Using-Python-Internship-EB1

This repo was created to share the required/discussed files during the online internship training program on Data Science Using Python in May 2021.

data-science machine-learning data-analysis Python data-visualisation data-cleansing

Jupyter Notebook

4 years ago

AlexLamson / DataWrangler

Quick and dirty data mining made easier in Sublime Text

sublime-text-plugin data-cleaning data-cleansing data-wrangling text-manipulation

Python

4 years ago

brunocampos01 / porto-seguro-safe-driver-prediction

Predict if a driver will file an insurance claim next year. (Kaggle Competition)

challenge kaggle data-engineering machine-learning Python data-science random-forest xgboost kaggle-competition data-cleansing dataset

Python

4 years ago

mtimjones / dataprocessing

Data cleansing and clustering with Vector Quantization and Adaptive Resonance Theory

data-science data-cleansing

8 years ago

prachitqwer / Power-BI---Product-Rationalization

Product Rationalization of Pro Bikes Inc using Power BI

dashboard data-analytics data-cleansing data-modeling data-transformation data-visualization finance powerbi SQL sql-server

3 years ago

data-forge / data-forge-fs

This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js

data-wrangling data-forge data data-analysis JavaScript Node.js linq pandas visualization data-visualization data-management data-manipulation data-cleaning data-cleansing CSV JSON

TypeScript

4 years ago

LieseB-1746743 / data-cleaning

Data cleaning tool.

data-profiling data-cleaning data-cleansing

JavaScript

4 years ago

HypertextAssassin0273 / Excel_Data_Organizer_and_Cleaner-DS_Project

Data Structures project in C++11 language, uses custom Vector & String structures with Move Semantics (Rule of Five)

Open Source C++ open-source-project data-structures vector string Object-oriented programming (OOP) data-cleaning data-cleansing data-wrangling

C++

3 years ago

DataPreprocessing / DataCleaning

Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame. It handles imputation, duplicate removal, special-character replacement, and more.

data-preprocessing Python data data-wrangling data-cleaning data-cleansing

Python

4 years ago
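
As a rough illustration of the steps the DataCleaning description above lists (special-character replacement, duplicate removal, imputation), here is a minimal standard-library sketch. It does not use that package's API, which is not shown in this listing; the `clean_rows` helper, the sample data, and the choice of mean imputation are all assumptions made for the example.

```python
import csv
import io
import re
from statistics import mean


def clean_rows(rows, numeric_field):
    """Hypothetical helper: strip special characters, drop duplicate rows,
    then fill missing values in `numeric_field` with the column mean."""
    cleaned, seen = [], set()
    for row in rows:
        # Remove every character that is not a word character, dot, hyphen, or space.
        row = {k: re.sub(r"[^\w.\- ]", "", v).strip() for k, v in row.items()}
        key = tuple(row.items())
        if key in seen:  # exact duplicate of an earlier (already cleaned) row
            continue
        seen.add(key)
        cleaned.append(row)

    # Mean imputation (an assumed strategy): use the average of the present values.
    present = [float(r[numeric_field]) for r in cleaned if r[numeric_field]]
    fill = mean(present) if present else 0.0
    for r in cleaned:
        if not r[numeric_field]:
            r[numeric_field] = str(fill)
    return cleaned


# Made-up sample CSV: one duplicate row, stray symbols, one missing price.
SAMPLE = "name,price\nbike#,100\nbike#,100\nhelmet!,\nlock,50\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = clean_rows(rows, "price")
for r in result:
    print(r)
```

Duplicates are detected after the special characters are stripped, so rows that differ only by stray symbols collapse into one; whether that is desirable depends on the dataset.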