cross-attention
Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!
🚀 Cross attention map tools for huggingface/diffusers
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind
1-shot image segmentation using Stable Diffusion
Project for the paper "Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition" (IJCAI 2025)
[IV 2025, Oral] Official code of "6Img-to-3D: Few-Image Large-Scale Outdoor Novel View Synthesis"
Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text.
Implementation of the paper "Enhanced Photovoltaic Power Forecasting: An iTransformer and LSTM-Based Model Integrating Temporal and Covariate Interactions"
[NeurIPS 2023] Official implementation of the paper "CAST: Cross-Attention in Space and Time for Video Action Recognition"
The official repository of "Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models".
[ITSC-2023] HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection
A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019)
TensorFlow implementation of 'Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning'
[ICIP 2025] Official implementation of RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement
SOVL System (Self-Organizing Virtual Lifeform): A complex, purpose-agnostic autonomous agent with continuous, asynchronous learning capabilities via a dynamic scaffolded LLM and a frozen base LLM
Transcription factor binding site prediction for novel DNA sequence data aiding in mutation identification and drug discovery
TGRS: Code for "Unsupervised Hybrid Network of Transformer and CNN for Blind Hyperspectral and RGB Image Fusion"