# data-preprocessing-pipelines

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

Python
6
2 years ago

This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.

Jupyter Notebook
1
4 months ago

This work highlights my contribution as an "ML Engineer" at "adorsho praniSheb" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

Jupyter Notebook
0
2 years ago

A data processing library that helps with understanding industrial data.

Jupyter Notebook
0
6 months ago

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.

0
2 years ago