florence2

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

llava large-language-model MLX vision-transformer apple-silicon idefics local-ai paligemma vision-framework vision-language-model florence2 molmo pixtral

Python

1673

180

1 day ago

jamjamjon / usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

CUDA tensorrt yolov8 OCR yolo sam grounding-dino onnxruntime florence2 clip yolov10 onnx

Rust

213

13 hours ago

SangbumChoi / florence2-triton

Unofficial repository for building Florence-2 in Microsoft Azure

multi-modal vision-language-model florence2

Jupyter Notebook

2 years ago

noorkhokhar99 / Florence-2-for-object-detection-in-Python

Florence-2 for object detection in Python

computer machine-vision florence2 Python vision

Jupyter Notebook

1 year ago

muhammad-ahmed-ghani / video-inpainting

This repository provides a powerful AI-driven solution for removing objects from videos using text prompts. By integrating SAM2, Florence2, and ProPainter, the model enables precise and seamless object removal. Simply describe the objects to remove (e.g., "man, car, cap, basket"), and the AI will handle the rest with high accuracy.

florence2

Python

8 months ago

mithunparab / virtual-staging

This project applies a modular generative AI pipeline to perform virtual staging on empty room images. It synthesizes realistic, high-quality interior furnishings while rigorously preserving the original room’s geometry, structure, and spatial consistency.

controlnet florence2 sam stable-diffusion

Python

3 months ago

theshubhamp / sample-florence2-object-detection

Sample: Object Detection over a Video Stream using Microsoft's Florence-2 Model

florence-2 florence2 object-detection OpenCV

Python

2 months ago

bayujawir / SmolVLM

SmolVLM 🐙: Ready-to-run SmolVLM2 Docker image with web UI and HTTP API for image-to-text and text-to-text tasks; offline-capable, low GPU needs (>=4GB VRAM).

chatbot clip CUDA finetune florence2 idefics image-classification multi-modal Python vllm webcam-capture

Python

1 month ago