
#visual-language-models

A state-of-the-art open visual language model | Multimodal pre-trained model

Python
6669
1 year ago

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

Python
375
3 months ago

The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning

Python
316
4 months ago

Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Python
47
6 months ago

Build a simple, basic multimodal large model from scratch. 🤖

Python
46
1 year ago

Python
36
8 days ago

WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

Python
35
4 months ago

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

Python
33
1 year ago

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Python
31
7 months ago

Code implementation for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

Python
29
1 year ago

Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models

Python
25
2 years ago

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

Python
21
2 months ago

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

Jupyter Notebook
20
5 months ago

Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", IEEE ISBI 2024 (Oral).

Jupyter Notebook
15
1 year ago

[ICCVW 2025] Implementation for DAM-QA: Describe Anything Model for Visual Question Answering on Text-rich Images

Python
13
21 days ago

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Python
11
1 year ago

Python
10
2 years ago