Part 3: Teaching Machines to See and Click - Model Finetuning
From Foundation Models to GUI Specialists
Foundation models such as Qwen2.5-VL demonstrate impressive visual understanding, but they require specialized training to master GUI interactions. In this final session, you'll transform a general-purpose vision-language model into a GUI specialist that can navigate interfaces with human-like precision.
We'll explore modern fine-tuning strategies designed specifically for GUI tasks, from selecting the right architecture to handling the unique challenges of coordinate prediction and multi-step reasoning. You'll implement training pipelines that accommodate the diverse formats and platforms in your dataset, evaluate models on metrics that actually matter for GUI automation, and deploy your trained model in a real-world testing environment.
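To make the coordinate-prediction challenge concrete, here is a minimal sketch of how a raw click annotation might be serialized into a chat-style supervised example. The function name, action schema, and the 0-1000 normalization range are illustrative assumptions (normalizing to a fixed range is a common convention for making targets resolution-independent), not the exact format used in this course:

```python
import json

def to_training_example(screenshot_size, click_xy, instruction):
    """Turn a raw click annotation into a chat-style supervised example.

    Coordinates are normalized to [0, 1000] so the target text is
    independent of screen resolution. The schema here is a hypothetical
    example, not a fixed standard.
    """
    w, h = screenshot_size
    x, y = click_xy
    # Normalize pixel coordinates into a resolution-independent range.
    norm_x = round(x / w * 1000)
    norm_y = round(y / h * 1000)
    action = {"action": "click", "coordinate": [norm_x, norm_y]}
    return {
        "messages": [
            {"role": "user", "content": instruction},
            # The assistant target is the serialized action string the
            # model learns to emit.
            {"role": "assistant", "content": json.dumps(action)},
        ]
    }

example = to_training_example((1920, 1080), (960, 540), "Click the Submit button")
print(example["messages"][1]["content"])
# → {"action": "click", "coordinate": [500, 500]}
```

Serializing actions as text like this lets a standard language-modeling loss supervise coordinate prediction, which is one reason format choices matter so much for GUI fine-tuning.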