data-preprocessing

Easy-to-use Python library of customized functions for cleaning and analyzing data.

Python
509
4 months ago

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

C++
401
6 days ago

A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

Python
389
1 year ago

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Python
377
3 years ago

Social Media Mining Toolkit (SMMT) main repository

Python
133
2 years ago

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.

C++
132
19 minutes ago

SEGAN PyTorch implementation (https://arxiv.org/abs/1703.09452)

Python
109
6 years ago

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, as an alternative to map-reduce and join-groupby.

Python
90
3 years ago

A time series signal analysis and classification framework

Python
85
2 years ago