
multimodel


LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.

6314

706

13 days ago

lonePatient / awesome-pretrained-chinese-nlp-models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models

chinese nlp pretrained-models bert roberta xlnet nezha ernie gpt gpt-2 dataset llm large-language-models multimodel

Python

5404

507

15 days ago

SkyworkAI / DeepResearchAgent

DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coordinate multiple specialized lower-level agents, enabling automated task decomposition and efficient execution across diverse and complex domains.

gaia general-purpose multiagent-systems multimodel

JavaScript

2729

374

6 days ago

kk7nc / RMDL

RMDL: Random Multimodel Deep Learning for Classification

classification ensemble-learning data-mining deep-learning text-classification image-classification text-mining Tensorflow Keras deep-neural-networks multimodel convolutional-neural-networks recurrent-neural-networks information-retrieval cnn rnn dnn machine-learning

Python

433

121

2 years ago

Atomic-man007 / Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

ChatGPT dataset gpt llm mllm multimodel nlp pretrained-models

342

7 months ago

bharath5673 / Deepstream

yolov5, yolov8, segmentations, face, pose, keypoints on deepstream

deepsort jetson Nvidia object-detection Python tracker tracking yolo yolov5 onnx multimodel deepstream analytics yolov6 yolov8 face-detection pose-estimation segmentation

Jupyter Notebook

102

2 years ago

thomas-yanxin / KarmaVLM

🧘🏻‍♂️KarmaVLM (相生): A family of high-efficiency, powerful visual language models.

llama2 llava qwen2 vlm vision-language-model visual-language-learning multimodel

Python

1 year ago

chengxuanying / KDD-Multimodalities-Recall

This is our solution for KDD Cup 2020. We implemented a very neat and simple neural ranking model based on siamese BERT which ranked first among the solo teams and ranked 12th among all teams on the final leaderboard.

multimodel e-commerce recommendation-system bert

Jupyter Notebook

5 years ago

PINTO0309 / OpenVINO-EmotionRecognition

OpenVINO+NCS2/NCS+MultiModel(FaceDetection, EmotionRecognition)+MultiStick+MultiProcess+MultiThread+USB Camera/PiCamera. RaspberryPi 3 compatible. Async.

openvino raspberry-pi emotion-recognition picamera Python multimodel

Python

3 years ago

Ankur2606 / Low-latency-AI-Voice-Assistant

End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.

artificial-intelligence jarvis multimodel voice-assistant Hacktoberfest

Jupyter Notebook

7 months ago

Overcautious / ADENet

Accepted by TMM 2022

multimodel speech-enhancement

Python

3 years ago

arangodb / cloud

ArangoGraph is the easiest way to run ArangoDB. Available on AWS and Google Cloud.

cloud arangodb multimodel

2 years ago

badripatro / MDN-VQG

vqa visual-question-answering question-answering domain-adaptation triplet-loss classification-model deep-learning machine-learning multimodel

6 years ago

robinlau1981 / dmapf

Robust particle filter based on dynamic averaging of multiple noise models

particle-filter uncertainty bayesian-methods multimodel

MATLAB

6 years ago

Kind-Unes / MultiModal-Model

This project is a multi-modal model that works with multiple models combined and accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.

gemini multimodel Python transformers

Python

2 years ago

Ajax0564 / VyomAI

VyomAI: state-of-the-art NLP, LLM, Vision, and MultiModel transformer implementations in PyTorch

transformer PyTorch gpt llm multimodel encoder encoder-decoder large-language-models object-detection

Python

14 days ago

nexiloop / seron-cli

A powerful AI CLI tool with multiple model support

artificial-intelligence command-line-interface Code devtool llm multimodel Open Source tech terminal

TypeScript

19 days ago

AdritPal08 / Multimodal-Pictionary-using-LLaMA-3.1-and-LLaMA-3.2-Vision

The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.

multimodel Python Streamlit

Python

1 year ago