multimodel
Awesome Pretrained Chinese NLP Models: a high-quality collection of Chinese pretrained models, large models, multimodal models, and large language models.
DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coordinate multiple specialized lower-level agents, enabling automated task decomposition and efficient execution across diverse and complex domains.
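As a rough illustration of that planner/worker pattern, here is a minimal Python sketch; the class and function names are hypothetical and not DeepResearchAgent's actual API:

```python
# Minimal sketch of a hierarchical multi-agent loop: a top-level planner
# decomposes a task and routes each sub-task to a specialized worker agent.
# All names here are illustrative, not DeepResearchAgent's real interface.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    kind: str      # e.g. "search", "summarize"
    prompt: str


class PlannerAgent:
    def plan(self, task: str) -> List[SubTask]:
        # A real planner would call an LLM; here we fake a fixed decomposition.
        return [SubTask("search", f"Find sources about: {task}"),
                SubTask("summarize", f"Summarize findings about: {task}")]


def run(task: str, workers: Dict[str, Callable[[str], str]]) -> str:
    planner = PlannerAgent()
    results = []
    for sub in planner.plan(task):
        worker = workers[sub.kind]          # dispatch to a specialized agent
        results.append(worker(sub.prompt))  # execute the sub-task
    return "\n".join(results)


if __name__ == "__main__":
    workers = {
        "search": lambda p: f"[search results for] {p}",
        "summarize": lambda p: f"[summary of] {p}",
    }
    print(run("the history of multimodal models", workers))
```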
RMDL: Random Multimodel Deep Learning for Classification
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more, and stays updated with the latest advancements.
YOLOv5, YOLOv8, segmentation, face, pose, and keypoint models on DeepStream
🧘🏻♂️KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
This is our solution for KDD Cup 2020. We implemented a very neat and simple neural ranking model based on Siamese BERT, which ranked first among solo teams and 12th among all teams on the final leaderboard.
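A Siamese ("bi-encoder") BERT ranker of that kind encodes query and candidate with a shared BERT and scores them by similarity. A minimal PyTorch sketch, assuming Hugging Face transformers, mean pooling, and cosine scoring (the team's exact checkpoint and scoring head may differ):

```python
# Rough sketch of a Siamese (bi-encoder) BERT ranking model: the same BERT
# encodes query and candidate, and their embeddings are compared by cosine
# similarity. Checkpoint name and mean pooling are assumptions, not the
# KDD Cup team's exact configuration.
import torch
from transformers import AutoModel, AutoTokenizer


class SiameseBertRanker(torch.nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)

    def encode(self, enc):
        out = self.encoder(**enc).last_hidden_state           # (B, T, H)
        mask = enc["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
        return (out * mask).sum(1) / mask.sum(1)               # mean pooling

    def forward(self, query_enc, item_enc):
        q, d = self.encode(query_enc), self.encode(item_enc)
        return torch.nn.functional.cosine_similarity(q, d)     # ranking score


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = SiameseBertRanker()
    q = tok(["red running shoes"], return_tensors="pt", padding=True)
    d = tok(["lightweight red sneakers for runners"], return_tensors="pt", padding=True)
    print(model(q, d))   # higher score = better match
```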
OpenVINO + NCS2/NCS + MultiModel (FaceDetection, EmotionRecognition) + MultiStick + MultiProcess + MultiThread + USB Camera/PiCamera. Raspberry Pi 3 compatible. Async.
End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.
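A stripped-down version of that STT → LLM → TTS flow might look like the sketch below, using the openai-whisper, transformers, and edge-tts packages named in the description; the model names, voice, and file paths are illustrative assumptions, and VAD and latency tuning are omitted:

```python
# Sketch of the Whisper -> LLM -> Edge-TTS pipeline described above.
# Model names, voice, and file paths are illustrative assumptions.
import asyncio

import edge_tts
import whisper
from transformers import pipeline


def speech_to_text(wav_path: str) -> str:
    stt = whisper.load_model("base")                 # assumption: base model
    return stt.transcribe(wav_path)["text"]


def generate_reply(prompt: str) -> str:
    llm = pipeline("text-generation", model="gpt2")  # placeholder LLM
    return llm(prompt, max_new_tokens=64)[0]["generated_text"]


async def text_to_speech(text: str, out_path: str = "reply.mp3") -> None:
    # Voice name is an assumption; edge-tts can list the available voices.
    await edge_tts.Communicate(text, voice="en-US-AriaNeural").save(out_path)


if __name__ == "__main__":
    user_text = speech_to_text("question.wav")       # hypothetical input file
    reply = generate_reply(user_text)
    asyncio.run(text_to_speech(reply))
```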
ArangoGraph is the easiest way to run ArangoDB. Available on AWS and Google Cloud.
Robust particle filter based on dynamic averaging of multiple noise models
A multi-modal system that combines multiple models to accept audio, image, and text inputs and generate corresponding audio, image, and text outputs.
VyomAI: state-of-the-art NLP, LLM, vision, and multimodal transformer implementations in PyTorch.
A powerful AI CLI tool with multiple model support
The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.
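Assuming the two models are served locally through Ollama (a deployment assumption, not necessarily what the app actually uses), the prompt-generation and judging steps might look roughly like this:

```python
# Rough sketch of the two steps: Llama 3.1 proposes a drawing prompt and
# Llama 3.2 Vision judges the user's drawing against it. Assumes both models
# run locally via Ollama; model tags and prompt wording are assumptions.
import ollama


def random_prompt() -> str:
    resp = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user",
                   "content": "Give me one short, random Pictionary prompt."}],
    )
    return resp["message"]["content"].strip()


def judge_drawing(prompt: str, image_path: str) -> str:
    resp = ollama.chat(
        model="llama3.2-vision",
        messages=[{"role": "user",
                   "content": f"Does this drawing depict '{prompt}'? "
                              "Answer yes or no and explain briefly.",
                   "images": [image_path]}],
    )
    return resp["message"]["content"]


if __name__ == "__main__":
    prompt = random_prompt()
    print("Draw:", prompt)
    print(judge_drawing(prompt, "drawing.png"))   # hypothetical saved sketch
```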
Papers on the topic of multimodal learning with graphs
Simplify time-consuming coding for the data scientist: create beautiful charts, build pandas transformers, and find the best model with the best parameters for your data.