What data should a vision-language model be trained on, and who gets to decide what “good data” even means? Most existing curation pipelines are limited because they are offline (they produce a static dataset from a set of predetermined filtering criteria) and concept-agnostic (they rely on model-based scores that can silently introduce new biases in what concepts the model sees). In this talk, I will discuss our new work CABS that tackles both these problems with large-scale sample-level concept annotations and flexible online batch sampling.
First, we construct DataConcept, a 128M web-crawled image–text collection annotated with fine-grained concept composition, and show how this enables Concept-Aware Batch Sampling (CABS)—a simple online method that constructs training batches on-the-fly to match target concept distributions. We develop two variants, CABS-DM for maximizing concept coverage and CABS-FM for prioritizing high object multiplicity, and demonstrate consistent gains for CLIP/SigLIP-style models across 28 benchmarks. Finally, I’ll show that these improvements translate into strong vision encoders for training generative multimodal models, including autoregressive systems like LLaVA, where the encoder quality materially affects downstream capability.
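The batch-sampling idea can be sketched in a few lines. This is a toy greedy variant for illustration only, not the paper's actual CABS-DM/CABS-FM algorithms; the `pool` format, `target_dist`, and the deficit scoring are all assumptions:

```python
import random
from collections import Counter

def concept_aware_batch(pool, target_dist, batch_size, seed=0):
    """Toy greedy sampler: repeatedly pick the sample whose concepts are
    most underrepresented in the batch so far, relative to a target
    concept distribution (illustrative sketch, not the paper's method)."""
    rng = random.Random(seed)
    candidates = pool[:]
    rng.shuffle(candidates)
    batch, counts = [], Counter()
    for _ in range(batch_size):
        total = sum(counts.values()) or 1
        def deficit(sample):
            # Sum of (target share - current batch share) over the
            # sample's annotated concepts; higher means more underrepresented.
            return sum(target_dist.get(c, 0.0) - counts[c] / total
                       for c in sample["concepts"])
        best = max(candidates, key=deficit)
        candidates.remove(best)
        batch.append(best)
        counts.update(best["concepts"])
    return batch, counts
```

With a pool skewed toward one concept and a uniform target, the sketch fills the batch with roughly balanced concept counts, which is the intuition behind matching a target concept distribution online rather than filtering offline.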
Do Your Agents Actually Work? Measuring Skills and MCP in Practice
This talk shows how to evaluate agent performance in real scenarios using FiftyOne Skills and MCP. We will cover practical ways to design scenarios, run agents, and measure how they use tools, including signals like latency, token usage, and output quality. The goal is to move beyond final outputs and better understand agent behavior, helping teams build more reliable and measurable agent systems.
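A minimal harness for the signals mentioned above (latency, token usage, success) might look like the following. This is a generic sketch, not the FiftyOne Skills or MCP API; all names here are hypothetical:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ToolCallRecord:
    tool: str
    latency_s: float
    tokens: int
    ok: bool

@dataclass
class AgentTrace:
    """Records per-tool-call signals so agent behavior can be measured,
    not just the final output."""
    records: list = field(default_factory=list)

    def timed_call(self, tool_name, fn, *args, tokens=0, **kwargs):
        # Wrap a tool invocation, timing it and recording success/failure.
        start = time.perf_counter()
        try:
            result, ok = fn(*args, **kwargs), True
        except Exception:
            result, ok = None, False
        self.records.append(
            ToolCallRecord(tool_name, time.perf_counter() - start, tokens, ok))
        return result

    def summary(self):
        n = len(self.records)
        return {
            "calls": n,
            "success_rate": sum(r.ok for r in self.records) / n if n else 0.0,
            "total_tokens": sum(r.tokens for r in self.records),
            "mean_latency_s": sum(r.latency_s for r in self.records) / n if n else 0.0,
        }
```

Aggregating traces like this across scenarios is what lets a team compare agents on behavior rather than only on final answers.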
How token costs and volume quietly shape AI product strategy.
Teams building AI applications often focus on prompts, frameworks, and model capabilities. In practice, many real constraints appear deeper in the infrastructure layer: inference cost volatility, routing across models, latency tradeoffs, and reliability under production workloads. This talk shares patterns emerging from teams building multi-model and agent-driven systems, highlighting architectural decisions engineers are making as they move from experimentation to production AI.
The last mile of OCR [in 2026]
The Last Mile of OCR/LLM-Based Document AI: Hands-On with Agentic Document Extraction
OCR is nailing benchmarks, but the real work lies in the long tail of IDP. Large tables, old scans, mixed-language documents, handwriting, and complex layouts are where most enterprise and real-world document work happens, and where even the best-benchmarked models still struggle. In this workshop, we will walk through how LandingAI's Agentic Document Extraction (ADE) goes beyond OCR and parsing to enable real-world document AI use cases and workloads.
Agenda:
- The pillars of Agentic Document Extraction
- Building document processing pipelines with the ADE API/SDK
- Using Skills to have coding agents build for you
- How ADE gives LLMs the last mile: analysing LLM performance on large tables, scanned docs, and complex layouts, and enabling them with structured output from ADE
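As a hypothetical illustration of the last agenda item (this is not ADE's actual SDK; the `table_to_markdown` helper and its input format are assumptions), structured table output from a parser can be rendered into markdown before prompting an LLM, which is far easier for the model to reason over than raw OCR text:

```python
def table_to_markdown(table):
    """Render a parsed table (list of rows of cell strings, first row
    is the header) as a GitHub-style markdown table, a common format
    for handing document structure to an LLM."""
    header, *rows = table
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)
```

The resulting string can be dropped directly into an LLM prompt alongside a question about the document.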