blip
Famous Vision Language Models and Their Architectures
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Chain together LLMs for reasoning & orchestrate multiple large models to accomplish complex tasks
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
This repository provides an interactive image colorization tool that leverages Stable Diffusion (SDXL) and BLIP for user-controlled color generation. Using a model retrained with the ControlNet approach, users can upload images and specify colors for different objects, guiding the colorization process through a user-friendly Gradio interface.
A data discovery and manipulation toolset for unstructured data
Image captioning using Python and BLIP (see the captioning sketch after this list)
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
FiveM script that allows civilians to dial 911, sending their location, name, and reason for calling, and adding a blip to the map
Collection of OSS models that are containerized into a serving container
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and/or *your* own words!
SAM + CLIP + Diffusion for editing objects in images using plain text
oCaption: Leveraging OpenAI's GPT-4 Vision for Advanced Image Captioning
Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras (see the VQA sketch after this list).
Automatically find the scenes you want using a text query
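The BLIP-based captioning that several of these projects build on (e.g. the Python + BLIP captioning entry above) comes down to a few lines with Hugging Face Transformers. This is a minimal sketch, assuming the Salesforce/blip-image-captioning-base checkpoint and a sample COCO image URL; it is not taken from any repository listed here.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint; other BLIP captioning checkpoints on the Hub work the same way.
ckpt = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(ckpt)
model = BlipForConditionalGeneration.from_pretrained(ckpt)

# Example image URL (a COCO validation image); substitute your own.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image, generate caption tokens, and decode them to text.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```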
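The visual Q&A mentioned in the Securade.ai Sentinel entry follows the same pattern with a BLIP VQA head. A minimal sketch, assuming the Salesforce/blip-vqa-base checkpoint and a hypothetical local frame path; it does not reflect that project's actual code.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Assumed checkpoint for BLIP visual question answering.
ckpt = "Salesforce/blip-vqa-base"
processor = BlipProcessor.from_pretrained(ckpt)
model = BlipForQuestionAnswering.from_pretrained(ckpt)

# Hypothetical frame grabbed from a CCTV stream.
image = Image.open("cctv_frame.jpg").convert("RGB")
question = "How many people are in the frame?"

# Encode the image-question pair and generate a short free-form answer.
inputs = processor(images=image, text=question, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(out[0], skip_special_tokens=True))
```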