Repository navigation

#

qwen2-vl

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, InternVL3, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Python
9370
11 小时前

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.

Python
5128
5 个月前

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python
2628
1 天前

An open-source implementaion for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Python
1065
7 天前

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Python
687
13 天前

A Pure Rust based LLM (Any LLM based MLLM such as Spark-TTS) Inference Engine, powering by Candle framework.

Rust
153
25 天前

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

Python
150
3 个月前

[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Python
99
22 天前

✨🦋 illufly - 【幻蝶】基于记忆蒸馏、资料检索的自我进化智能体

Python
69
2 个月前

视频理解:千问视频多模态模型 & Dify

Python
64
1 年前

cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Python
51
2 个月前

A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha,meta Llama 3.2 Vision Instruct and Qwen2 VL Instruct models.

Python
39
5 个月前

基于多模态大模型的智能搜索助手,通过AI技术实现小红书平台的智能化信息检索和知识整合|An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.

Python
23
9 个月前

Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework

TypeScript
21
2 天前

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.

Jupyter Notebook
19
10 个月前

Qwen2-VL在文旅领域的LLaMA-Factory微调案例 The case for fine-tuning Qwen2-VL in the field of historical literature and museums

12
1 年前

Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Python
10
4 个月前

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

Jupyter Notebook
10
16 天前

An open-source server implementation for inference Qwen2-VL series model using fastapi.

Python
9
9 个月前