We just wrapped up the December ’24 AI, Machine Learning and Computer Vision Meetup, and if you missed it or want to revisit it, here’s a recap! In this blog post you’ll find the recordings, highlights from the presentations and Q&A, and the upcoming Meetup schedule so that you can join us at a future event.
How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos
CoTracker3 is a state-of-the-art point tracking model that significantly improves how individual points are tracked through video sequences. Its key innovations include:
- Semi-supervised training on real videos, reducing reliance on synthetic data
- Pseudo-labels generated by existing tracking models that act as teachers
- A simplified architecture compared to previous trackers
Speaker: Nikita Karaev is currently doing a PhD at Meta AI and Oxford, where he’s working on dynamic reconstruction and motion estimation (CoTracker) with Andrea Vedaldi and Christian Rupprecht. Before that, he did his master’s at École Polytechnique (Paris) and his undergrad in cold Siberia (Novosibirsk). He was also an early employee at two startups that were acquired by Snapchat and Farfetch.
Q&A
- Isn’t processing a whole video in one go computationally expensive?
- Can you explain more about the 4D correlation?
- Do you think leveraging pre-trained world models, or models explicitly trained on (or sensitive to) the laws of physics and how objects interact in 3D, could be useful for this kind of temporal tracking? Would it help with out-of-distribution (OOD) cases?
- What are the evaluation metrics that are mainly tracked?
- How can CoTracker’s joint tracking technology be leveraged to enhance identity verification and access control in cybersecurity frameworks, and what are the potential risks associated with spoofing or compromising such systems?
Resource Links
Hands-On with Meta AI’s CoTracker3: Parsing and Visualizing Point Tracking Output
In this presentation, Harpreet Sahota explores CoTracker3, a state-of-the-art point tracking model that effectively leverages real-world videos during training. He dives into the practical aspects of running inference with CoTracker3 and parsing its output into FiftyOne, a powerful open-source tool for dataset curation, analysis, and visualization. Through a hands-on demonstration, Harpreet shows how to prepare a video for inference, run the model, examine its output, and parse the model’s output into FiftyOne’s keypoint format for seamless integration and visualization within the FiftyOne app.
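The parsing step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Harpreet’s actual code: it assumes CoTracker3’s predictions (per-frame (x, y) pixel positions for each query point, plus a per-point visibility flag) have already been converted to nested lists with the batch dimension dropped, and that you know the video’s width and height. The function name `tracks_to_frame_keypoints` is hypothetical. It scales coordinates to [0, 1], which is the relative coordinate system FiftyOne keypoints use, and numbers frames from 1 to match FiftyOne video frames.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def tracks_to_frame_keypoints(
    tracks: Sequence[Sequence[Sequence[float]]],
    visibility: Sequence[Sequence[bool]],
    width: int,
    height: int,
) -> Dict[int, List[Tuple[int, Point]]]:
    """Group visible tracked points by frame, in relative coordinates.

    Assumed layout (an assumption, not a documented CoTracker guarantee):
    tracks[t][n] is the (x, y) pixel position of query point n in frame t,
    and visibility[t][n] is True when that point is visible in frame t.
    Returns {frame_number: [(track_id, (x_rel, y_rel)), ...]}.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    frames: Dict[int, List[Tuple[int, Point]]] = {}
    for t, (frame_pts, frame_vis) in enumerate(zip(tracks, visibility)):
        visible: List[Tuple[int, Point]] = []
        for track_id, ((x, y), is_visible) in enumerate(zip(frame_pts, frame_vis)):
            if not is_visible:
                continue  # skip points the model marks as occluded in this frame
            # Divide by the frame size to get coordinates relative to the image.
            visible.append((track_id, (x / width, y / height)))
        frames[t + 1] = visible  # FiftyOne video frame numbers start at 1
    return frames


if __name__ == "__main__":
    # Two frames, two query points; point 1 is occluded in the second frame.
    tracks = [[[10.0, 20.0], [50.0, 40.0]], [[12.0, 22.0], [55.0, 45.0]]]
    vis = [[True, True], [True, False]]
    print(tracks_to_frame_keypoints(tracks, vis, width=100, height=80))
```

Each frame’s list could then be wrapped in FiftyOne labels, for example one `fo.Keypoint(label=str(track_id), points=[xy])` per visible track, collected into `fo.Keypoints(keypoints=...)` and assigned to `sample.frames[frame_number]["tracks"]` (the field name `tracks` is hypothetical). Keeping one `Keypoint` per track gives each tracked point a stable label across frames, which may make individual tracks easier to follow.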
Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.
Q&A
- For the CoTracker models, are there model compression or quantization techniques you tried or can recommend?
Resource Links
Streamlined Retail Product Detection with YOLOv8 and FiftyOne
In the fast-paced retail environment, automation at checkout is increasingly essential for improving operational efficiency and the customer experience. This talk demonstrates a streamlined approach to retail product detection using the Retail Product Checkout (RPC) dataset, which includes 200 SKUs across 17 meta-categories such as puffed food, dried food, and drinks. Combining YOLOv8, known for its speed and accuracy in real-time object detection, with FiftyOne, an open-source toolset for computer vision, simplifies data loading, training, evaluation, and visualization for product detection and classification. Attendees gain insights into how these tools can be applied to optimize checkout automation.
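One data-preparation step in a pipeline like this is turning box annotations into the label format YOLOv8 trains on: one line per object reading `class x_center y_center width height`, with coordinates normalized by the image size and class ids starting at 0. The sketch below is illustrative, not the speaker’s code, and assumes the RPC annotations are COCO-style boxes (`[x_min, y_min, width, height]` in pixels) with category ids starting at 1; check this against your copy of the dataset. The helper name `coco_box_to_yolo_line` and its `first_category_id` parameter are hypothetical.

```python
from typing import Sequence


def coco_box_to_yolo_line(
    category_id: int,
    bbox: Sequence[float],
    img_width: int,
    img_height: int,
    first_category_id: int = 1,
) -> str:
    """Convert one COCO-style box into a YOLO label line.

    bbox is [x_min, y_min, box_width, box_height] in pixels (COCO convention).
    Assumes category ids start at first_category_id; YOLO class ids start at 0.
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, w, h = bbox
    # YOLO uses the box center, normalized by the image dimensions.
    x_c = (x_min + w / 2) / img_width
    y_c = (y_min + h / 2) / img_height
    cls = category_id - first_category_id  # shift to 0-based class ids
    return f"{cls} {x_c:.6f} {y_c:.6f} {w / img_width:.6f} {h / img_height:.6f}"


if __name__ == "__main__":
    # A 200x100 box at (100, 50) in a 400x200 image, first category.
    print(coco_box_to_yolo_line(1, [100, 50, 200, 100], 400, 200))
```

Under these assumptions, the 200 RPC SKUs map to YOLO class ids 0 through 199. In practice, FiftyOne can write YOLO-format labels when exporting a dataset (for example with the `fo.types.YOLOv5Dataset` export type), so this sketch mainly shows what the format contains.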
Speaker: Vanshika Jain is a Data Engineer Intern at UNAR Labs, a startup focused on making information accessible for the blind. She holds a Master’s degree in Machine Learning and Computer Vision from Northeastern University and is passionate about applying AI and computer vision to real-world problems, with a focus on automation and accessibility.
Q&A
- In your retail use case, did you find certain types of objects get confused with each other more often than other object pairs?
- Is the code you used for fine-tuning published, or is there a resource you can recommend?
Resource Links
Join the AI, Machine Learning and Computer Vision Meetup!
The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies.
Join the Meetup closest to your timezone from one of our 12 locations:
- Athens
- Austin
- Bangalore
- Boston
- Chicago
- London
- New York
- Peninsula
- San Francisco
- Seattle
- Silicon Valley
- Toronto
What’s Next?
Up next on Jan 29, 2025 at 9:00 AM PT / 12:00 PM ET, we have three great speakers lined up!
- Is AI Creating a Whole New Earth-Aware Geospatial Stack? Promises and Challenges – Dr. Bruno Sanchez-Andrade Nuno, Clay – AI for Earth
- Evaluating the Satlas and Clay Remote Sensing Foundational Models – Steve Pousty, Voxel51
- Earth Monitoring for Everyone with Earth Index – Mikel Maron, The Earth Genome
Register for the Zoom here. You can find a complete schedule of upcoming Meetups on the Voxel51 Events page.
Get Involved!
There are a lot of ways to get involved in the AI, Machine Learning and Computer Vision Meetups. Reach out if you identify with any of these:
- You’d like to speak at an upcoming Meetup
- You have a physical meeting space in one of the Meetup locations and would like to make it available for a Meetup
- You’d like to co-organize a Meetup
- You’d like to co-sponsor a Meetup
Reach out to Meetup co-organizer Jimmy Guerrero on Meetup.com or ping me on LinkedIn to discuss how to get you plugged in.
—
These Meetups are sponsored by Voxel51, the company behind the open source FiftyOne computer vision toolset. FiftyOne enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. It’s easy to get started in just a few minutes.