data-preprocessing-pipelines
Open-source project for data preparation for LLM application builders
Python package for Customizable Data Preprocessing Pipelines
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
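The cleaning steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not that repository's code: it assumes the text has already been extracted from the PDF/DOCX files, "tokens" here are whitespace-separated words (not GPT-3's own BPE tokens), and `STOP_WORDS`, `preprocess`, and the `max_tokens` cap are hypothetical names chosen for the example.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens on whitespace."""
    return text.lower().split()


def strip_punctuation(tokens: list[str]) -> list[str]:
    """Remove ASCII punctuation from each token and drop tokens that become empty."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = (tok.translate(table) for tok in tokens)
    return [tok for tok in cleaned if tok]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def format_for_prompt(tokens: list[str], max_tokens: int) -> str:
    """Join the first max_tokens words into a single prompt string (hypothetical format)."""
    return " ".join(tokens[:max_tokens])


def preprocess(text: str, max_tokens: int = 50) -> str:
    """Run the steps in order; punctuation is stripped first so 'the,' is caught as a stop word."""
    tokens = tokenize(text)
    tokens = strip_punctuation(tokens)
    tokens = remove_stop_words(tokens)
    return format_for_prompt(tokens, max_tokens)


if __name__ == "__main__":
    print(preprocess("The cat, sat on the mat!"))  # prints "cat sat mat"
```

Stripping punctuation before stop-word removal is a deliberate ordering choice: otherwise a token such as "the," would slip past the stop-word filter.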
Collect POST requests
Understand and implement a decision tree
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.
This work highlights my contribution as an "ML Engineer" at "adorsho praniSheb" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.
Project for Machine Learning Data Mining course
A data-processing library to help improve understanding of industrial data.
Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.
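The central-server/two-client split described above can be illustrated with a small in-process sketch. This is an assumption-laden stand-in, not the application's code: threads from `concurrent.futures` stand in for the networked client servers, and the per-client step (parsing comma-separated rows into floats and dropping empty or malformed rows) is a hypothetical example, since the description does not say which preprocessing the clients perform. `central_server`, `client_preprocess`, and `NUM_CLIENTS` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CLIENTS = 2  # the description mentions two client servers


def client_preprocess(raw_rows: list[str]) -> list[list[float]]:
    """Stand-in for one client: parse raw comma-separated rows into float lists,
    skipping rows that are empty or contain non-numeric fields (hypothetical step)."""
    cleaned = []
    for row in raw_rows:
        row = row.strip()
        if not row:
            continue  # skip empty rows
        try:
            cleaned.append([float(field) for field in row.split(",")])
        except ValueError:
            continue  # skip malformed rows
    return cleaned


def central_server(raw_rows: list[str], num_clients: int = NUM_CLIENTS) -> list[list[float]]:
    """Stand-in for the central server: split the raw data into one contiguous chunk
    per client, dispatch the chunks concurrently, and merge the results in order."""
    chunk_size = max(1, -(-len(raw_rows) // num_clients))  # ceiling division, at least 1
    chunks = [raw_rows[i:i + chunk_size] for i in range(0, len(raw_rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        results = list(pool.map(client_preprocess, chunks))  # map preserves chunk order
    return [row for part in results for row in part]


if __name__ == "__main__":
    raw = ["1,2", " 3,4 ", "", "x,5", "6,7"]
    print(central_server(raw))  # prints [[1.0, 2.0], [3.0, 4.0], [6.0, 7.0]]
```

Keeping the client step stateless (each row is cleaned independently) means the result does not depend on how the server splits the data; a step such as per-chunk scaling would need shared statistics from the server to stay consistent across clients.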