Repository navigation

#

image-text-retrieval

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型

Python
7646
3 天前

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook
5194
8 个月前

Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine

Kotlin
421
1 年前

🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model. / 在 Android 上用自然语言搜索本地图片 (基于 OpenAI 的 CLIP 模型)

Kotlin
398
2 个月前

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

Python
215
1 年前

Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

Python
211
1 年前

Research Code for Multimodal-Cognition Team in Ant Group

Python
141
9 个月前

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Python
99
2 年前

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Python
91
2 年前

使用OpenCV+onnxruntime部署中文clip做以文搜图,给出一句话来描述想要的图片,就能从图库中搜出来符合要求的图片。包含C++和Python两个版本的程序

C++
72
1 年前

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Python
56
2 年前

Image captioning using python and BLIP

Python
47
2 年前

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

Python
35
8 个月前

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

Python
33
2 年前

[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

Python
33
1 年前

[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

Python
16
6 个月前

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

Python
12
5 个月前

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.

Jupyter Notebook
11
4 年前