# language-vision
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Python
1120
4 months ago
[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
Python
127
1 year ago
[ICRA 2024] Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Python
33
3 months ago
[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"
Python
3
2 months ago
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
Jupyter Notebook
1
2 years ago
Hands-on experiments with some multimodal models
Jupyter Notebook
0
1 year ago