V-JEPA: Video Joint Embedding Predictive Architecture
Impact Summary
Problem Solved
Learning representations by generating or reconstructing video pixels is compute-intensive. Predicting in a learned joint-embedding (latent) space instead avoids pixel-level reconstruction.
Improves On
I-JEPA / Masked Autoencoders
Abstract Snapshot
We introduce V-JEPA, a method for self-supervised learning of visual representations from video by predicting the latent representation of missing regions in a video from the unmasked context.
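The objective in the abstract can be illustrated with a toy NumPy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation. The clip sizes, tubelet shape (2 frames x 8 x 8 pixels), and embedding width are hypothetical. Linear maps stand in for the ViT context encoder, target encoder, and predictor. The predictor is simplified to "mean-pooled context plus a position embedding". The L1 latent loss and the exponential-moving-average (EMA) target encoder follow the I-JEPA-style recipe and are assumptions here. Gradient updates are omitted. What the sketch does show is the core idea: the target encoder sees the full clip, the context encoder sees only the unmasked tubelets, and the loss compares predicted and target embeddings at the masked positions, so no pixels are reconstructed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not the paper's): T frames of H x W pixels, C channels.
T, H, W, C = 8, 32, 32, 3
TUBE_T, PATCH = 2, 8      # assumed tubelet: 2 frames x 8 x 8 pixels
D = 16                    # assumed embedding width
N = (T // TUBE_T) * (H // PATCH) * (W // PATCH)   # number of tubelets
P = TUBE_T * PATCH * PATCH * C                    # values per tubelet

def tubeletize(video):
    """Split a (T, H, W, C) clip into N flattened spatiotemporal tubelets."""
    t, h, w = T // TUBE_T, H // PATCH, W // PATCH
    x = video.reshape(t, TUBE_T, h, PATCH, w, PATCH, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)   # (t, h, w, TUBE_T, PATCH, PATCH, C)
    return x.reshape(N, P), (t, h, w)

def tube_mask(grid, block=2):
    """Mask the same spatial block at every time step, i.e. a tube through the clip."""
    t, h, w = grid
    mask = np.zeros(grid, dtype=bool)
    i = rng.integers(0, h - block + 1)
    j = rng.integers(0, w - block + 1)
    mask[:, i:i + block, j:j + block] = True
    return mask.reshape(-1)                # same (t, h, w) order as the tokens

# Linear stand-ins for the networks (assumption; the real model uses ViTs).
W_ctx = rng.normal(0, P ** -0.5, (P, D))   # context encoder (trained)
W_tgt = W_ctx.copy()                       # target encoder (EMA copy, no gradients)
W_pred = rng.normal(0, D ** -0.5, (D, D))  # predictor (trained)
pos = rng.normal(0, 1.0, (N, D))           # hypothetical position embeddings

def vjepa_loss(video):
    tokens, grid = tubeletize(video)
    masked = tube_mask(grid)
    # Targets: encode the FULL clip, keep masked positions (stop-gradient in training).
    targets = (tokens @ W_tgt)[masked]
    # Context: encode only the visible tubelets.
    ctx = tokens[~masked] @ W_ctx + pos[~masked]
    # Simplified predictor: pooled context plus each masked position's embedding.
    preds = (ctx.mean(axis=0) + pos[masked]) @ W_pred
    # L1 regression in latent space; pixels are never reconstructed.
    return np.abs(preds - targets).mean(), masked

def ema_update(tgt, ctx, momentum=0.998):
    """Move the target-encoder weights slowly toward the context encoder."""
    return momentum * tgt + (1.0 - momentum) * ctx

video = rng.normal(size=(T, H, W, C))
loss, masked = vjepa_loss(video)
print(f"masked {masked.sum()} of {masked.size} tubelets, L1 latent loss {loss:.3f}")
# In training, an optimizer step on W_ctx and W_pred (omitted) would precede this.
W_tgt = ema_update(W_tgt, W_ctx)
```

Masking the same spatial block across all time steps is one plausible way to stop the model from copying a missing region from an adjacent frame; the block size and single-block layout here are illustrative choices, not the paper's settings.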
Computer Vision, Self-Supervised Learning, JEPA
