Natural language processing

Created by Alan Turing

Wikipedia

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Python
4958
2 months ago

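MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, including text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.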

3943
9 days ago

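ChatGPT has made chatbots popular, and the mainstream trend has shifted toward GPT-style models. This project is keeping pace and will release a GPT-style version in the near future. Using this project and your own corpus, you can train the chatbot you want, for scenarios such as intelligent customer service, online Q&A, and casual chat.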

Python
3584
1 year ago
cbamls/AI_Tutorial

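A curated collection of learning materials on machine learning, NLP, image recognition, deep learning, and other areas of artificial intelligence, together with compiled materials on the architecture and algorithms of search, recommendation, and advertising systems. Includes a roundup of notes from leading algorithm experts.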

3495
1 year ago

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Python
1611
4 months ago

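Papers in the field of natural language processing (with reading notes), reproduced models, data processing, and more (code is provided in both TensorFlow and PyTorch versions).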

Python
1295
2 years ago
pemistahl/lingua-go
Go
1271
6 months ago

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

Python
1241
1 month ago

End-to-end neural table-text understanding models.

Python
1195
1 year ago

🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)

Python
1182
4 days ago

A deep dive into embeddings starting from fundamentals

Jupyter Notebook
1029
9 months ago

The most accurate natural language detection library for Rust, suitable for short text and mixed-language text

Rust
984
12 hours ago

Rasa UI is a frontend for the Rasa Framework

JavaScript
964
3 years ago

skweak: A software toolkit for weak supervision applied to NLP tasks

Python
926
1 year ago

We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Python
857
1 year ago