
Natural Language Processing

Created by Alan Turing

Wikipedia

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

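MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, script lines, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.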

3825
8 days ago

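ChatGPT has made chatbots popular, and the mainstream trend has shifted toward GPT-style models. This project is keeping pace and will release a GPT-style version in the near future. Using this project and your own corpus, you can train the chatbot you want, for scenarios such as intelligent customer service, online Q&A, and casual chat.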

Python
3573
10 months ago
cbamls/AI_Tutorial

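A curated collection of learning materials on machine learning, NLP, image recognition, deep learning, and other areas of artificial intelligence, together with organized material on the architecture and algorithms of search, recommendation, and advertising systems. Includes a compilation of notes from leading algorithm experts.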

3392
1 year ago

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Python
1574
6 days ago

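Papers in the field of natural language processing (with reading notes), along with reproduced models, data processing, and more (code is provided in both TensorFlow and PyTorch versions).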

Python
1242
1 year ago
pemistahl/lingua-go
Go
1237
2 months ago

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

Python
1227
3 months ago

End-to-end neural table-text understanding models.

Python
1171
9 months ago

A deep dive into embeddings starting from fundamentals

Jupyter Notebook
1012
5 months ago

Rasa UI is a frontend for the Rasa Framework

JavaScript
964
2 years ago

The most accurate natural language detection library for Rust, suitable for short text and mixed-language text

Rust
950
6 days ago

skweak: A software toolkit for weak supervision applied to NLP tasks

Python
922
8 months ago

We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Python
841
9 months ago