In this talk, we introduce UISurf, an open-source multimodal agentic UI automation platform in which agents can perceive, reason, and collaborate across browser and desktop environments to complete end-to-end tasks that require interaction with multiple user interfaces.
UISurf comprises three main components: uisurf-agent, the runtime for UI automation agents; uisurf-admin, the session orchestration and management service; and uisurf-app, the full-stack user application.
Its multi-agent architecture includes a planning_agent that transforms natural-language requests into structured execution plans, specialized browser and desktop agents for environment-specific interaction, an automation_agent that coordinates execution and inter-agent handoffs through Agent-to-Agent (A2A) communication, and a summarization_agent that produces the final task summary for the user.
UISurf supports both autonomous execution and human-in-the-loop supervision, offering a practical and extensible framework for studying and deploying cross-environment UI automation.
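
To make the agent roles described above concrete, the following is a minimal Python sketch of the plan, execute, summarize flow. It is an illustration under stated assumptions, not UISurf's actual code: only the names planning_agent, automation_agent, and summarization_agent come from the abstract, while the `Step` type, the `browser_agent` and `desktop_agent` functions, the `approve` callback, the `run_task` helper, and the `browser:`/`desktop:` request syntax are all hypothetical. Plain function calls stand in for A2A communication, and the worker agents only return strings rather than driving a real UI.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One plan step (hypothetical structure, not UISurf's plan format)."""
    environment: str  # "browser" or "desktop"
    instruction: str


def planning_agent(request: str) -> List[Step]:
    """Toy planner: splits the request on ';' and routes by an explicit prefix.

    A real planner would use a model; this rule-based stand-in is an assumption.
    """
    steps: List[Step] = []
    for part in request.split(";"):
        text = part.strip()
        if not text:
            continue
        prefix, sep, rest = text.partition(":")
        env = prefix.strip().lower()
        if sep and env in ("browser", "desktop"):
            steps.append(Step(env, rest.strip()))
        else:
            # No recognized prefix: default to the browser environment.
            steps.append(Step("browser", text))
    return steps


def browser_agent(step: Step) -> str:
    """Placeholder for browser interaction; only reports the instruction."""
    return f"[browser] {step.instruction}"


def desktop_agent(step: Step) -> str:
    """Placeholder for desktop interaction; only reports the instruction."""
    return f"[desktop] {step.instruction}"


def automation_agent(
    plan: List[Step],
    approve: Optional[Callable[[Step], bool]] = None,
) -> List[str]:
    """Dispatches each step to the matching worker agent.

    If `approve` is given, it is consulted before every step (a stand-in for
    human-in-the-loop supervision); otherwise execution is autonomous.
    """
    workers = {"browser": browser_agent, "desktop": desktop_agent}
    results: List[str] = []
    for step in plan:
        if approve is not None and not approve(step):
            results.append(f"[skipped] {step.instruction}")
            continue
        results.append(workers[step.environment](step))
    return results


def summarization_agent(results: List[str]) -> str:
    """Produces a one-line summary of how many steps were executed."""
    done = sum(1 for r in results if not r.startswith("[skipped]"))
    return f"{done} of {len(results)} steps executed"


def run_task(
    request: str,
    approve: Optional[Callable[[Step], bool]] = None,
) -> str:
    """End-to-end flow: plan, execute with optional supervision, summarize."""
    plan = planning_agent(request)
    return summarization_agent(automation_agent(plan, approve))


if __name__ == "__main__":
    request = "browser: download report; desktop: open report in spreadsheet"
    print(run_task(request))
    # Supervised run in which the reviewer rejects desktop steps.
    print(run_task(request, approve=lambda s: s.environment != "desktop"))
```

Passing no `approve` callback corresponds to autonomous execution, while supplying one lets a reviewer accept or reject each step before it runs. Keeping that choice as a single optional parameter means the same plan can be executed in either mode.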