categorical-features
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
A lightweight decision tree framework for Python supporting regular algorithms (ID3, C4.5, CART, CHAID and regression trees) and some advanced techniques (gradient boosting, random forest and AdaBoost), with categorical feature support.
TensorFlow implementation of Product-based Neural Networks. An extended version is at https://github.com/Atomu2014/product-nets-distributed.
This repository contains a notebook demonstrating a practical implementation of the so-called Entity Embedding for Encoding Categorical Features for Training a Neural Network.
Scikit-Learn compatible transformer that turns categorical variables into dense entity embeddings.
Encode Categorical Features (unmaintained)
A Python framework for deploying recommendation models for form fields.
A small tutorial to demonstrate the power of CatBoost Algorithm
Predicting the ideological direction of Supreme Court decisions: ensemble vs. unified case-based model
glmdisc Python package: discretization, factor level grouping, interaction discovery for logistic regression
A mixed attributes predictive algorithm implemented in Python.
This study builds machine learning models to predict the seriousness of car crashes using 2019 and 2020 crash reports from the publicly accessible database maintained by the Chicago Police Department. A car crash is considered serious if it results in an injury or the car is towed. Models use categorical features that describe conditions at the time of the crash and crash causes to predict the target. The current focus is to classify whether a crash results in an injury. All machine learning models are trained, validated, and tested on randomly split 2019 crash reports. The best model (along with all others) is then tested using the full set of 2020 crash reports.
Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.
Kaggle Categorical Feature Encoding Challenge II, private score 0.78795 (110th place)
Generic encoding of record types