#florence-2

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

Python
2629
9 hours ago

Tag manager and captioner for image datasets

Python
1089
3 months ago

AI-Powered Watermark Remover using Florence-2 and LaMA Models: A Python application leveraging state-of-the-art deep learning models to effectively remove watermarks from images with a user-friendly PyQt6 interface.

Python
671
19 days ago

Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.

Python
127
1 year ago

VLM-driven tool that processes surveillance videos, extracts frames, and generates annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage.

Python
121
2 months ago

Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Jupyter Notebook
85
1 year ago

Watermark remover tool that leverages the capabilities of Microsoft Florence and Lama Cleaner models.

Python
82
7 months ago

Florence-2

Jupyter Notebook
69
6 months ago

Use Florence 2 to auto-label data for use in training fine-tuned object detection models.

Python
67
1 year ago

Vision-language model fine-tuning notebooks and use cases (MedGemma, PaliGemma, Florence, ...)

Jupyter Notebook
47
1 month ago

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

Python
39
5 months ago

Run SOTA Vision-Language Model Florence-2 on your data!

Jupyter Notebook
13
5 months ago

Simple video summarization using Text-to-Segment Anything (Florence2 + SAM2). This project provides a video processing tool that uses Florence2 and SAM2 to detect and segment specific objects or activities in a video based on textual descriptions.

Python
10
6 months ago

Simple Gradio application integrated with Hugging Face multimodal models to support a visual question answering chatbot and other features.

Python
6
1 year ago

This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.

Python
6
23 days ago

ONNX deployment for the Florence 2 visual multimodal model.

Python
6
6 months ago

TextSnap: Demo for Florence 2 model used in OCR tasks to extract and visualize text from images.

Python
5
4 months ago

ecko-cli is a simple CLI tool that processes the images in a directory, generates captions, and saves them as text files. It can also create a JSONL file from the images in the directory you specify. Images are captioned using the Microsoft Florence-2-large model and ONNX.

Python
5
1 month ago

An MCP server for processing images using Florence-2

Python
4
1 month ago