data-preprocessing-pipelines

Open source project for data preparation for GenAI applications

data-preparation finetuning llm llmapps data data-prep data-preprocessing data-preprocessing-pipelines datacuration large-language-models large-scale-data-processing Python ray Apache Spark datarecipes Code quality Entity resolution Malware

HTML

766

213

15 hours ago

preprocessy / preprocessy

Python package for Customizable Data Preprocessing Pipelines

pipelines preprocessing machine-learning python-library data-engineering data-science data-preprocessing-pipelines Hacktoberfest hacktoberfest2022

Jupyter Notebook

1 month ago

shamspias / gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

artificial-intelligence data-preprocessing data-preprocessing-pipelines data-science gpt-3 machine-learning

Python

3 years ago

firefly-cpp / succulent

Collect POST requests

data-collection data-preprocessing-pipelines data-science ESP32 machine-learning raspberry-pi

Python

20 days ago

vuanhngo14 / Decision-Tree-from-Scratch

Understand and implement a decision tree

data-preprocessing data-preprocessing-pipelines data-visualization decision-tree

Jupyter Notebook

2 years ago

kolhesamiksha / Nemo_Curator

This repository contains sample text data-preparation code using Nemo Curator for pre-training or synthetic data generation

curator data-preprocessing-pipelines finetuning-llms generative-ai nemo Nvidia synthetic-dataset-generation

Jupyter Notebook

8 months ago

amadou-6e / pymimic3

Pymimic3 is a scalable experimentation platform for MIMIC-III, featuring ready-to-run models, fully tested utilities for concept drift research, and a parallelized, configurable data pipeline.

concept-drift data-preprocessing data-preprocessing-pipelines machine-learning machine-learning-projects neural-networks parallel-processing

Jupyter Notebook

10 months ago

PrasunDatta / adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an "ML Engineer" at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

data-preprocessing-pipelines image-preprocessing Jupyter Notebook python-script

Jupyter Notebook

3 years ago

SaraLittleSquirrel / Obesity-estimator

Project for Machine Learning Data Mining course

adaboost data-mining data-preprocessing-pipelines decision-tree machine-learning NumPy pandas random-forest scikit-learn support-vector-machines

Jupyter Notebook

2 years ago

DigitalLifeYZQiu / Data-Process-Library

A data processing library for better understanding of industrial data.

data-preprocessing-pipelines

Jupyter Notebook

2 months ago

MustofAhmed41 / Data-Preprocessing-using-Distributed-Database

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared for the machine learning model.

database machine-learning plsql data-preprocessing-pipelines distributed-database

3 years ago