text-data

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Python
2389
4 years ago

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Python
746
3 years ago

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

Python
249
2 years ago

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

Python
128
5 years ago

Tools to uniformly read in text data including semi-structured transcripts

R
75
3 years ago

Tools for reshaping text data

R
52
2 years ago

Question classification for the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).

Python
29
5 years ago

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

Python
29
1 year ago

Visualize large text collections with WebGL

JavaScript
26
1 year ago

Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).

Python
20
4 years ago

Scrape EDGAR filings from https://www.sec.gov/

Julia
14
7 months ago

Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) will eventually be uploaded.

HTML
13
8 years ago

A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.

Jupyter Notebook
9
6 years ago

Code for collecting GOM TV subtitle data

R
6
9 years ago

This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.

4
2 years ago

A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data.

Python
4
9 months ago