pretrain

Large Scale Chinese Corpus for NLP

chinese-dataset chinese-corpus pretrain word2vec nlp bert language-model Wiki news question-answering chinese corpus chinese-nlp dataset text-classification

9784

1564

1 month ago

keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

bert convnet convolutional-neural-networks masked-image-modeling pre-trained-model self-supervised-learning sparse-convolution ssl cnn iclr deep-learning object-detection PyTorch instance-segmentation mask-rcnn pretrain pretraining

Python

1354

2 years ago

CLUEbenchmark / CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100G)

chinese chinese-corpus datasets pretrain corpus nlp bert roberta albert

983

3 years ago

yangjianxin1 / Firefly-LLaMA2-Chinese

Firefly Chinese LLaMA-2 large model; supports continued pre-training of large models such as Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, and Bloom

firefly llama llama-2 llama2 llm baichuan baichuan-13b bloom chatglm falcon internlm lora pretrain qlora qwen

Python

413

2 years ago

open-sciencelab / GraphGen

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

ai4science data-generation llm-training pretrain qa qwen sft knowledge-graph llm pretraining question-answering

Python

381

5 days ago

microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

multimodality pretraining caption pretrain video localization segmentation coin alignment

Python

362

1 year ago

xcfcode / What-I-Have-Read

Paper Lists, Notes and Slides, Focus on NLP. For summarization, please refer to https://github.com/xcfcode/Summarization-Papers

nlp summarization acl aaai naacl slides presentation gnn knowledge-distillation pretrain Generative Adversarial Network non-autoregressive generation graph-neural-networks notes presentations data-augmentation meta-learning conversation

165

3 years ago

THUNLP-AIPoet / BERT-CCPoem

BERT-CCPoem is a BERT-based pre-trained model particularly for Chinese classical poetry

poetry bert pretrain

Python

156

4 years ago

thunlp / RE-Context-or-Names

BERT-based models (BERT, MTB, CP) for relation extraction.

relation-extraction PyTorch bert contrastive-learning pretrain

Python

103

3 years ago

huzongxiang / MatDGL

MatDGL is a neural network package that allows researchers to train custom models for crystal modeling tasks. It aims to accelerate the research and application of material science.

machine-learning deep-learning neural-networks graph transformer massagepassing Tensorflow materials pretrain

Python

1 year ago