Computer vision libraries and models for image understanding, generation, OCR, and object detection.
Computer Vision
Repositories
A web UI for Stable Diffusion built with Gradio. Features include txt2img, img2img, inpainting, upscaling, LoRA support, custom scripts, and a large set of extensions for AI image generation.
OpenCV is an open-source computer vision and machine learning software library. It provides tools optimized for real-time image processing, object detection, video analysis, and AI model execution across multiple platforms and programming languages.
Real-time face swapping and video deepfake tool that needs only a single source image. Supports webcam streaming, video processing, and multiple GPU acceleration options including CUDA, CoreML, and DirectML.
Tesseract OCR engine with neural network (LSTM) support for 100+ languages. Includes a command-line tool and an API library for extracting text from images.
OCR and document AI engine that converts images and PDFs into structured data. Supports 100+ languages, complex document parsing, information extraction, and deployment across multiple platforms.
Stable Diffusion is a latent text-to-image diffusion model that generates photo-realistic images from text prompts. Built on a latent diffusion architecture with a CLIP text encoder, it supports image synthesis, image-to-image translation, and inpainting.
YOLOv5 is a computer vision model for real-time object detection, segmentation, and classification. Built on PyTorch, it is designed for speed, accuracy, and ease of use in both research and production deployment.
A simple Python library for face recognition with 99.38% accuracy on the Labeled Faces in the Wild (LFW) benchmark. Provides an easy API for face detection, facial feature analysis, and identity recognition, plus command-line tools.
FaceSwap is an open-source deepfake tool that uses deep learning to detect and swap faces in images and videos. It provides a complete workflow covering face extraction, model training, and conversion, with support for multiple models and GPU acceleration.
Ultralytics YOLO is a computer vision framework providing state-of-the-art models for object detection, segmentation, classification, tracking, and pose estimation. It is built to be fast, accurate, and easy to use, with extensive deployment options.
Meta AI's Segment Anything Model (SAM) is a foundation model for promptable image segmentation. Trained on 11M images with 1.1B masks, it generates high-quality object masks from simple prompts such as points or boxes and shows strong zero-shot performance across diverse segmentation tasks.