#multimodel

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.

Python
5371
2 days ago

DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coordinate multiple specialized lower-level agents, enabling automated task decomposition and efficient execution across diverse and complex domains.

JavaScript
1383
7 days ago

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

338
5 months ago

🧘🏻‍♂️ KarmaVLM (相生): a family of highly efficient and powerful visual language models.

Python
88
1 year ago

This is our solution for KDD Cup 2020. We implemented a neat and simple neural ranking model based on Siamese BERT, which ranked first among the solo teams and 12th among all teams on the final leaderboard.

Jupyter Notebook
71
5 years ago

OpenVINO+NCS2/NCS+MultiModel(FaceDetection, EmotionRecognition)+MultiStick+MultiProcess+MultiThread+USB Camera/PiCamera. RaspberryPi 3 compatible. Async.

Python
59
3 years ago

End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.

Jupyter Notebook
25
6 months ago

Accepted by TMM 2022

Python
17
3 years ago

ArangoGraph is the easiest way to run ArangoDB. Available on AWS and Google Cloud.

14
1 year ago

Robust particle filter based on dynamic averaging of multiple noise models

MATLAB
9
6 years ago

This project is a multimodal system that combines multiple models. It accepts audio, images, and text as inputs and generates corresponding audio, image, and text outputs.

Python
8
1 year ago

VyomAI: state-of-the-art NLP, LLM, vision, and multimodel transformer implementations in PyTorch

Python
5
1 month ago

TypeScript
4
1 month ago

The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.

Python
4
1 year ago

Papers on the topic of multimodal learning with graphs

3
1 year ago

Simplify time-consuming coding for the data scientist. Create beautiful charts, pandas transformers, and find the best model with the best parameters for your data.

Jupyter Notebook
2
1 year ago