Visual Document AI: Because a Pixel is Worth a Thousand Tokens - November 6, 2025
Nov 6, 2025
9 - 11 AM Pacific
Online. Register for the Zoom!
About this event
Join us for a virtual event to hear talks from experts on the latest developments in Visual Document AI.
Schedule
Document AI: A Review of the Latest Models, Tasks and Tools
This talk covers everything document AI: trends, models, tasks, and tools. By the end, you will be ready to start building apps based on document models.
Run Document VLMs in Voxel51 with the VLM Run Plugin — PDF to JSON in Seconds
The new VLM Run Plugin for Voxel51 enables seamless execution of document vision-language models directly within the Voxel51 environment. This integration transforms complex document workflows — from PDFs and scanned forms to reports — into structured JSON outputs in seconds. By treating documents as images, our approach remains general, scalable, and compatible with any visual model architecture. The plugin connects visual data curation with model inference, empowering teams to run, visualize, and evaluate document understanding models effortlessly. Document AI is now faster, reproducible, and natively integrated into your Voxel51 workflows.
CommonForms: Automatically Making PDFs Fillable
Converting static PDFs into fillable forms remains a surprisingly difficult task, even with the best commercial tools available today. We show that with careful dataset curation and model tuning, it is possible to train high-quality form field detectors for under $500. As part of this effort, we introduce CommonForms, a large-scale dataset of nearly half a million curated form images. We also release a family of highly accurate form field detectors, FFDNet-S and FFDNet-L.
Visual Document Retrieval: How to Cluster, Search and Uncover Biases in Document Image Datasets Using Embeddings
In this talk you'll learn about the task of visual document retrieval and the models widely used by the community. You'll then see these models in action in the FiftyOne App, where you'll learn how to use them to identify groups and clusters of documents, find unique documents, uncover biases in your visual document dataset, and search your document corpus using natural language.