language-vision

unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Python
1120
4 months ago

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training

Python
127
1 year ago

[ICRA 2024] Language-Conditioned Affordance-Pose Detection in 3D Point Clouds

Python
33
3 months ago

[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"

Python
3
2 months ago

MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation

Jupyter Notebook
1
2 years ago

Jupyter Notebook
0
1 year ago