
#multimodal-models

A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

389
9 days ago

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

190
11 days ago

Implementation of the paper "Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning", arXiv, 2025

Python
7
4 months ago

NanoOWL Detection System enables real-time open-vocabulary object detection in ROS 2 using a TensorRT-optimized OWL-ViT model. Describe objects in natural language and detect them instantly on panoramic images. Optimized for NVIDIA GPUs with .engine acceleration.

C++
1
5 months ago