Getting Started with Perceptron AI and FiftyOne for Video Understanding - May 27, 2026
May 27, 2026
9:00 AM - 11:00 AM PT
Online. Register for Zoom!
About this event
Join us for a hands-on virtual session on May 27 exploring video-native multimodal AI and how to integrate cutting-edge video understanding models into your computer vision workflows. Akshat Shrivastava from Perceptron will introduce the company's latest video-native multimodal model, which matches frontier models at a fraction of their inference cost. Harpreet Sahota will then demonstrate how to get started with Perceptron AI inside FiftyOne.
Schedule
Video-Native Multimodal Models for Video and Image Understanding
In this 20-minute talk, Akshat will introduce Perceptron’s latest release, a video-native multimodal model that matches or exceeds frontier models from Google and Alibaba on video and image understanding at a fraction of their inference cost. He’ll walk through the capabilities that move the needle for real video workloads: temporal grounding to clip precise events from long streams, egocentric reasoning for first-person and wearable contexts, and structured “thinking traces” that reason over motion and physical space. He’ll also cover the image-side advances production perception teams care about: reliable pointing, point-by-example one-shot visual search, dense counting, dial/gauge/clock reading, and structured document extraction.
Getting Started with Perceptron AI in FiftyOne
In the second half of the session, Harpreet Sahota will walk through how to get started using Perceptron’s video-native multimodal model within FiftyOne for real-world video understanding workflows. He’ll demonstrate how to connect to the API, explore multimodal outputs inside FiftyOne, and build practical workflows for tasks like temporal event analysis, visual search, and video dataset inspection. Attendees will leave with a hands-on understanding of how to integrate state-of-the-art video perception models into their existing computer vision pipelines.