language-vision

unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Python
1179
1 month ago

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training

Python
134
1 year ago

[ICRA 2024] Language-Conditioned Affordance-Pose Detection in 3D Point Clouds

Python
47
9 months ago

[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"

Python
3
5 months ago

Visual Grounding for Autonomous Agents: linking language and vision for robotics or autonomous navigation

Python
2
2 months ago

MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation

Jupyter Notebook
1
2 years ago

Jupyter Notebook
0
2 years ago