
V-JEPA: Video Joint Embedding Predictive Architecture

Impact Summary

Problem Solved

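Learning video representations by generating or reconstructing pixels is compute-intensive, which limits representation learning. Predicting in a joint embedding (latent) space avoids pixel-level reconstruction altogether.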

Improves On

I-JEPA / Masked Autoencoders

Abstract Snapshot

We introduce V-JEPA, a method for self-supervised learning of visual representations from video by predicting the latent representation of missing regions in a video from the unmasked context.
Computer Vision, Self-Supervised Learning, JEPA
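The objective in the abstract (predict the latent representation of masked video regions from the unmasked context) can be sketched in NumPy as below. This is a toy under stated assumptions, not the authors' implementation. The helper names (`patchify`, `tube_mask`, `encode`, `predict`, `latent_prediction_loss`, `ema_update`) are hypothetical, and several choices are illustrative: linear-plus-tanh "encoders" stand in for Vision Transformers, the predictor is a mean-pooling layer, the mask is a single spatial block repeated over time, the loss is L1, and the EMA momentum is 0.998. The point it shows is structural: the context encoder sees only visible tubelets, the target encoder (an exponential moving average of the context encoder) sees the whole clip, and the loss compares features at the masked positions rather than pixels.

```python
import numpy as np

def patchify(video, t=2, p=4):
    """Split a (T, H, W, C) clip into flattened t x p x p tubelets, returning (N, t*p*p*C)."""
    T, H, W, C = video.shape
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nT, nH, nW, t, p, p, C)
    return x.reshape(-1, t * p * p * C)

def tube_mask(n_t, n_h, n_w, block, rng):
    """Hypothetical masking: hide one block x block spatial region at every time step (True = masked)."""
    m = np.zeros((n_t, n_h, n_w), dtype=bool)
    i = rng.integers(0, n_h - block + 1)
    j = rng.integers(0, n_w - block + 1)
    m[:, i:i + block, j:j + block] = True
    return m.reshape(-1)

def encode(weights, tokens, pos):
    """Toy encoder (assumption): per-token linear map plus positional embedding, then tanh."""
    return np.tanh(tokens @ weights + pos)

def predict(proj, ctx_feats, target_pos):
    """Toy predictor (assumption): pooled context plus each target's position embedding."""
    pooled = ctx_feats.mean(axis=0, keepdims=True)  # (1, D) summary of the visible context
    return (pooled + target_pos) @ proj             # (M, D), one prediction per masked token

def latent_prediction_loss(w_ctx, w_tgt, proj, tokens, pos, mask):
    ctx = encode(w_ctx, tokens[~mask], pos[~mask])  # context encoder sees visible tokens only
    tgt = encode(w_tgt, tokens, pos)[mask]          # target encoder sees the full clip; treated as a constant
    pred = predict(proj, ctx, pos[mask])
    return np.abs(pred - tgt).mean()                # L1 regression in latent space, not pixel space

def ema_update(w_tgt, w_ctx, momentum=0.998):
    """Target weights track the context encoder as an exponential moving average."""
    return momentum * w_tgt + (1.0 - momentum) * w_ctx

rng = np.random.default_rng(0)
D = 32
video = rng.standard_normal((8, 16, 16, 3))        # T=8 frames of 16x16 RGB
tokens = patchify(video)                           # (64, 96): 4 x 4 x 4 tubelets of size 2x4x4x3
pos = 0.1 * rng.standard_normal((tokens.shape[0], D))
w_ctx = 0.05 * rng.standard_normal((tokens.shape[1], D))
w_tgt = w_ctx.copy()
proj = np.eye(D)
mask = tube_mask(4, 4, 4, block=2, rng=rng)        # 2x2 spatial block over 4 time steps = 16 tokens

loss = latent_prediction_loss(w_ctx, w_tgt, proj, tokens, pos, mask)
w_tgt = ema_update(w_tgt, w_ctx)
print(f"masked tokens: {mask.sum()}, latent L1 loss: {loss:.4f}")
```

In a real training loop only the context encoder and predictor would receive gradients; the target weights would change solely through an EMA step like `ema_update`, a design commonly used in JEPA-style methods to help avoid representational collapse.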
