Generative AI has made remarkable strides in producing photorealistic images, videos, and multimodal content. Yet aligning these generations with human intent, while ensuring spatial coherence, logical consistency, and deployment scalability, remains a major challenge, especially for real-world media platforms. In this talk, I will present our recent progress in enhancing the reasoning and control capabilities of image/video generative models, structured around four key pillars:
1. Efficiency & Scalability — with systems like ECLIPSE and FlowChef;
2. Control & Editing — including Lambda-ECLIPSE and RefEdit;
3. Reliability & Security — through efforts such as SPRIGHT, REVISION, R.A.C.E., and WOUAF;
4. Evaluation & Metrics — via benchmarks/metrics like VISOR, ConceptBed, TextInVision, and VOILA.
Together, these contributions outline a cohesive vision for building controllable, robust, and scalable generative systems, which are central to advancing personalization, understanding, and automated content workflows in media streaming and beyond.