AI, Machine Learning & Computer Vision Meetup – Dec 12, 2024

AI, Machine Learning and Computer Vision Meetup

Dec 12, 2024 at 10 AM Pacific

How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos

Nikita Karaev
Meta AI and Oxford

CoTracker3 is a state-of-the-art point tracking model that significantly improves how points are tracked through video sequences. Its key innovations include:

  • Semi-supervised training on real videos, reducing reliance on synthetic data
  • Pseudo-labels generated by existing tracking models acting as teachers
  • A simplified architecture compared to previous trackers
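
The pseudo-labeling idea in the list above can be illustrated with a minimal sketch. This is not the CoTracker3 training code: the names `pseudo_label` and `track_loss` are hypothetical, teachers are modeled as plain Python callables, picking one teacher at random is just one plausible strategy, and the mean Euclidean distance stands in for whatever loss the real model uses.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Track = List[Point]  # one point's (x, y) position in every frame
# Hypothetical teacher interface: (video, query points) -> one track per query.
Teacher = Callable[[object, Sequence[Point]], List[Track]]


def pseudo_label(
    video: object,
    queries: Sequence[Point],
    teachers: Sequence[Teacher],
    rng: random.Random,
) -> List[Track]:
    """Pick one frozen teacher at random and use its tracks as labels.

    Assumption: random selection is an illustrative choice, not a
    confirmed detail of the method described in the talk.
    """
    teacher = rng.choice(list(teachers))
    return teacher(video, queries)


def track_loss(predicted: List[Track], target: List[Track]) -> float:
    """Mean Euclidean distance between student points and pseudo-labels."""
    total, count = 0.0, 0
    for pred_track, tgt_track in zip(predicted, target):
        for (px, py), (tx, ty) in zip(pred_track, tgt_track):
            total += ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
            count += 1
    return total / count if count else 0.0
```

In a real setup the student's predictions on an unlabeled video would be compared against `pseudo_label(...)` output, and the resulting loss would drive a gradient step.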

About the Speaker

Nikita Karaev is currently doing a PhD at Meta AI and Oxford, where he’s working on dynamic reconstruction and motion estimation (CoTracker) with Andrea Vedaldi and Christian Rupprecht. Before that, he did his master’s at École Polytechnique (Paris), and undergrad in cold Siberia (Novosibirsk). He was also an early employee at two startups that got acquired by Snapchat and Farfetch.

Hands-On with Meta AI's CoTracker3: Parsing and Visualizing Point Tracking Output

Harpreet Sahota
Voxel51

In this presentation, Harpreet Sahota explores CoTracker3, a state-of-the-art point tracking model that leverages real-world videos during training. He covers the practical side of running inference with CoTracker3 and loading its output into FiftyOne, an open-source tool for dataset curation, analysis, and visualization. In a hands-on demonstration, Harpreet shows how to prepare a video for inference, run the model, examine its output, and convert that output into FiftyOne’s keypoint format for visualization in the FiftyOne App.
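
The conversion step might look roughly like the sketch below. It is not the code from the talk and calls neither CoTracker3 nor FiftyOne. It assumes the tracker output has already been turned into nested lists of shape (T, N, 2) in pixel coordinates plus a (T, N) visibility mask, and it relies on two FiftyOne conventions: keypoint coordinates are relative values in [0, 1], and video frame numbers start at 1. The function name `tracks_to_frame_keypoints` is hypothetical, and marking occluded points with NaN is a choice made in this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
NAN_POINT: Point = (float("nan"), float("nan"))


def tracks_to_frame_keypoints(
    tracks: Sequence[Sequence[Point]],
    visibility: Sequence[Sequence[bool]],
    width: int,
    height: int,
) -> Dict[int, List[Point]]:
    """Map 1-based frame numbers to normalized (x, y) keypoints.

    tracks[t][n] is the pixel position of point n in frame t;
    visibility[t][n] says whether that point is visible in frame t.
    """
    frame_points: Dict[int, List[Point]] = {}
    for t, (points, visible) in enumerate(zip(tracks, visibility)):
        normalized: List[Point] = []
        for (x, y), is_visible in zip(points, visible):
            if is_visible:
                # Pixel coordinates -> relative coordinates in [0, 1].
                normalized.append((x / width, y / height))
            else:
                # Keep the slot so point indices stay aligned across frames.
                normalized.append(NAN_POINT)
        frame_points[t + 1] = normalized  # frame numbers start at 1
    return frame_points
```

Each per-frame list could then serve as the points of a keypoint label attached to the matching frame of a video sample.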

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Streamlined Retail Product Detection with YOLOv8 and FiftyOne

Vanshika Jain
UNAR Labs

In the fast-paced retail environment, automation at checkout is increasingly essential to enhance operational efficiency and improve the customer experience. This talk will demonstrate a streamlined approach to retail product detection using the Retail Product Checkout (RPC) dataset, which includes 200 SKUs across 17 meta-categories such as puffed food, dried food, and drinks. By leveraging YOLOv8, renowned for its speed and accuracy in real-time object detection, and FiftyOne, an open-source toolset for computer vision, we can simplify data loading, training, evaluation, and visualization for effective product detection and classification. Attendees will gain insights into how these tools can be applied to optimize checkout automation.
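
One small piece of glue that a YOLO-plus-FiftyOne workflow typically needs is a bounding-box conversion, sketched below under stated assumptions. YOLO label files store each box as a class ID followed by a normalized center point, width, and height, while FiftyOne detections use a normalized top-left corner, width, and height. The helper names are hypothetical, this is not code from the talk, and it imports neither library.

```python
from typing import List, Tuple


def yolo_to_fiftyone_box(cx: float, cy: float, w: float, h: float) -> List[float]:
    """Convert a normalized center-format box to top-left format.

    Input:  (center x, center y, width, height), all relative to image size.
    Output: [top-left x, top-left y, width, height], also relative.
    """
    return [cx - w / 2, cy - h / 2, w, h]


def parse_yolo_label_line(line: str) -> Tuple[int, List[float]]:
    """Parse one 'class cx cy w h' line from a YOLO label file."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    cx, cy, w, h = (float(value) for value in fields[1:])
    return class_id, yolo_to_fiftyone_box(cx, cy, w, h)
```

For example, the line `3 0.5 0.5 0.25 0.5` describes a class-3 box centered in the image, which becomes a top-left box starting at (0.375, 0.25).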

About the Speaker

Vanshika Jain is a Data Engineer Intern at UNAR Labs, a startup focused on making information accessible for the blind. She holds a Master’s degree in Machine Learning and Computer Vision from Northeastern University and is passionate about applying AI and computer vision to real-world problems, with a focus on automation and accessibility.