cross-modal-pretraining
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Python
2,989 stars
1 year ago
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training, plus a series of other methods for human-object interaction (HOI) detection and scene graph generation.
Python
74 stars
1 year ago