data-preprocessing

Easy-to-use Python library of customized functions for cleaning and analyzing data.

Python
520
4 months ago

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

C++
414
19 days ago

A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. It leverages OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

Python
394
1 year ago

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Python
377
3 years ago

Social Media Mining Toolkit (SMMT) main repository

Python
137
3 years ago

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.

C++
136
12 days ago

SEGAN PyTorch implementation (https://arxiv.org/abs/1703.09452)

Python
110
6 years ago

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, as an alternative to map-reduce and join-groupby.

Python
91
4 years ago

Resources for our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"

90
7 months ago