Repository navigation

#

qwen2-vl

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.

Python
5128
6 个月前

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python
2631
5 天前

An open-source implementaion for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Python
1223
3 天前

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Python
700
1 个月前

A Pure Rust based LLM (Any LLM based MLLM such as Spark-TTS) Inference Engine, powering by Candle framework.

Rust
163
11 天前

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

Python
155
5 个月前

[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Python
114
2 个月前

✨🦋 illufly - 【幻蝶】基于记忆蒸馏、资料检索的自我进化智能体

Python
72
4 个月前

cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Python
64
16 天前

视频理解:千问视频多模态模型 & Dify

Python
64
1 年前

A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha,meta Llama 3.2 Vision Instruct and Qwen2 VL Instruct models.

Python
39
10 天前

Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework

TypeScript
25
6 天前

基于多模态大模型的智能搜索助手,通过AI技术实现小红书平台的智能化信息检索和知识整合|An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.

Python
22
1 年前

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.

Jupyter Notebook
21
1 年前

Qwen2-VL在文旅领域的LLaMA-Factory微调案例 The case for fine-tuning Qwen2-VL in the field of historical literature and museums

13
1 年前

Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Python
11
5 个月前

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

Jupyter Notebook
10
2 个月前

An open-source server implementation for inference Qwen2-VL series model using fastapi.

Python
9
10 个月前

Colaboratory上でQwenLM/Qwen2-VLをお試しするサンプル

Jupyter Notebook
6
1 年前