Document Visual AI with FiftyOne—When a Pixel is Worth a Thousand Tokens - November 14, 2025
This event has ended, but you can still catch up! Watch the on-demand recordings and register for our future events.
Nov 14, 2025
9:00-10:30 AM Pacific
Online (Zoom)
About this event
In document understanding, a pixel is worth a thousand tokens. Traditional text-extraction pipelines tokenize and process documents sequentially, whereas modern visual AI approaches can read document structure, layout, and content directly from images. This can make them more efficient, more accurate, and more robust to diverse document formats.
Host
This hands-on workshop introduces you to document visual AI workflows using FiftyOne, the leading open-source toolkit for computer vision datasets. You'll learn how to:
Load and organize document datasets in FiftyOne for visual exploration and analysis
Compute visual embeddings using state-of-the-art document retrieval models to enable semantic search and similarity analysis
Leverage FiftyOne workflows including similarity search, clustering, and quality assessment to gain insights from your document collections
Deploy modern vision-language models for OCR and document understanding tasks that go beyond simple text extraction
Evaluate and compare different OCR models to select the best approach for your specific use case
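As an illustration of the last item, comparing OCR models usually comes down to scoring each model's output against ground-truth text. The following is a minimal sketch that assumes character error rate (CER) as the metric; the workshop itself may use different metrics or tooling. The function names, model names, and sample strings are hypothetical, and the code uses only the Python standard library rather than any FiftyOne API.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length; lower is better."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def rank_models(ground_truth, predictions):
    """Return (model, mean CER) pairs sorted from best to worst.

    Assumes ground_truth is non-empty. A document that a model did not
    transcribe is scored against the empty string.
    """
    scores = []
    for model, outputs in predictions.items():
        cers = [
            character_error_rate(ref, outputs.get(doc_id, ""))
            for doc_id, ref in ground_truth.items()
        ]
        scores.append((model, sum(cers) / len(cers)))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical sample data: two documents, two OCR models.
ground_truth = {
    "invoice_001": "Total due: $120.50",
    "receipt_007": "Thank you",
}
predictions = {
    "model_a": {"invoice_001": "Total due: $120.50", "receipt_007": "Thank yov"},
    "model_b": {"invoice_001": "Tota1 due: S120.5O", "receipt_007": "Thank you"},
}

ranking = rank_models(ground_truth, predictions)
for model, mean_cer in ranking:
    print(f"{model}: mean CER = {mean_cer:.3f}")
```

CER is only one possible choice; word error rate or field-level accuracy may suit invoices and forms better. In a dataset tool, per-document scores like these could be stored alongside each sample so the worst transcriptions can be filtered and inspected visually.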
Whether you're working with invoices, receipts, forms, scientific papers, or mixed document types, this workshop will equip you with practical skills to build robust document AI pipelines that harness the power of visual understanding. Walk away with reproducible notebooks and best practices for tackling real-world document intelligence challenges.
Workshop host Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.