Getting Started with Perceptron AI and FiftyOne for Video Understanding - May 27, 2026
This event has ended, but you can still catch up! Watch the on-demand recordings and register for our future events.
May 27, 2026
9:00 AM - 11:00 AM PT
Online
About this event
Join us for a hands-on virtual session on May 27 exploring video-native multimodal AI and how to integrate cutting-edge video understanding models into your computer vision workflows. Akshat Shrivastava from Perceptron will introduce the company's latest video-native multimodal model, which matches frontier models at a fraction of the inference cost. Harpreet Sahota will then demonstrate how to get started with Perceptron AI inside FiftyOne.
Video-Native Multimodal Models for Video and Image Understanding
In this 20-minute talk, Akshat will introduce Perceptron’s latest release, a video-native multimodal model that matches or exceeds frontier models from Google and Alibaba on video and image understanding at a fraction of their inference cost. He’ll walk through the capabilities that move the needle for real video workloads: temporal grounding to clip precise events from long streams, egocentric reasoning for first-person and wearable contexts, and structured “thinking traces” that reason over motion and physical space. He’ll also cover the image-side advances production perception teams care about: reliable pointing, point-by-example one-shot visual search, dense counting, dial/gauge/clock reading, and structured document extraction.
In the second half of the session, Harpreet Sahota will walk through how to get started using Perceptron’s video-native multimodal model within FiftyOne for real-world video understanding workflows. He’ll demonstrate how to connect to the API, explore multimodal outputs inside FiftyOne, and build practical workflows for tasks like temporal event analysis, visual search, and video dataset inspection. Attendees will leave with a hands-on understanding of how to integrate state-of-the-art video perception models into their existing computer vision pipelines.
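To give a flavor of the temporal event analysis mentioned above, here is a minimal sketch of one small step in such a workflow: converting event spans given in seconds into inclusive, 1-based frame ranges, which is the `[first, last]` convention FiftyOne uses for the `support` of a temporal detection label. The `EventSpan` record and the `to_frame_support` helper are hypothetical names invented for illustration; they are not part of Perceptron's API, and the actual response format of the model may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class EventSpan:
    """Hypothetical record for one grounded event (not a real API type)."""
    label: str
    start_s: float  # event start, in seconds from the beginning of the video
    end_s: float    # event end, in seconds from the beginning of the video


def to_frame_support(span: EventSpan, fps: float, total_frames: int) -> list:
    """Convert a span in seconds to an inclusive, 1-based [first, last] frame range."""
    if fps <= 0 or total_frames < 1:
        raise ValueError("fps must be positive and total_frames at least 1")
    # Frame k (1-based) covers the time interval [(k - 1) / fps, k / fps).
    first = math.floor(span.start_s * fps) + 1
    last = math.ceil(span.end_s * fps)
    # Clamp to the video bounds and keep the range non-empty.
    first = max(1, min(first, total_frames))
    last = max(first, min(last, total_frames))
    return [first, last]


# Made-up example spans for a 10-second clip at 30 frames per second.
spans = [
    EventSpan("person enters", 2.0, 4.5),
    EventSpan("door closes", 9.5, 12.0),  # runs past the end, so it gets clamped
]
supports = {s.label: to_frame_support(s, fps=30.0, total_frames=300) for s in spans}
```

With FiftyOne installed, each resulting range could then be attached to a video sample, for example through `fo.TemporalDetection(label=..., support=[first, last])`, so the events can be browsed in the FiftyOne App. The session itself covers the actual integration.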