#vlms

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python
8418
2 hours ago

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.

855
7 days ago

Official repository for VisionZip (CVPR 2025)

Python
337
1 month ago

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Python
297
9 months ago

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Python
207
1 year ago

Official Repository of OmniCaptioner

Python
157
4 months ago

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

Python
104
10 months ago

This repository collects research papers on large foundation models for scenario generation and analysis in autonomous driving. It will be continuously updated to track the latest work.

74
22 days ago

[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Python
65
12 days ago

[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Python
57
1 year ago

[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

Python
46
3 months ago

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

Python
44
6 months ago

[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Python
31
6 days ago

[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

18
1 month ago

[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

Jupyter Notebook
10
6 months ago

A collection of VLM papers, blogs, and projects, with a focus on VLMs in autonomous driving and related reasoning techniques.

10
9 months ago

Convert documents and images to high-quality Markdown using Vision LLMs. Built for RAG ingestion pipelines.

Python
9
4 days ago