Skip to content

AI, Machine Learning and Computer Vision Meetup

Dec 12, 2024 at 10 AM Pacific

Register for the Zoom

By submitting you (1) agree to Voxel51’s Terms of Service and Privacy Statement and (2) agree to receive occasional emails.

How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos

Nikita Karaev
Meta AI and Oxford

CoTracker3 is a state-of-the-art point tracking model that introduces significant improvements in tracking objects through video sequences. Its key innovations include:

  • Uses semi-supervised training with real videos, reducing reliance on synthetic data1
  • Generates pseudo-labels using existing tracking models as teachers1
  • Features a simplified architecture compared to previous trackers

About the Speaker

Nikita Karaev is currently doing a PhD at Meta AI and Oxford, where he’s working on dynamic reconstruction and motion estimation (CoTracker) with Andrea Vedaldi and Christian Rupprecht. Before that, he did his master’s at École Polytechnique (Paris), and undergrad in cold Siberia (Novosibirsk). He was also an early employee at two startups that got acquired by Snapchat and Farfetch.

Hands-On with Meta AI's CoTracker3: Parsing and Visualizing Point Tracking Output

Harpreet Sahota
Voxel51

In this presentation, Harpreet Sahota explores CoTracker3, a state-of-the-art point tracking model that effectively leverages real-world videos during training. He dives into the practical aspects of running inference with CoTracker3 and parsing its output into FiftyOne, a powerful open-source tool for dataset curation, analysis, and visualization. Through a hands-on demonstration, Harpreet shows how to prepare a video for inference, run the model, examine its output, and parse the model’s output into FiftyOne’s keypoint format for seamless integration and visualization within the FiftyOne app.

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.