Gasgoo Munich - On May 13, Xiaomi officially released Xiaomi OneVL, a one-step latent space language-vision reasoning framework designed for autonomous driving. For the first time, the framework integrates several technical pathways, including vision-language-action (VLA) models, world models, and latent space reasoning, into a single system. Xiaomi says this approach significantly boosts inference speed and precision while maintaining robust reasoning capabilities.

Image Credit: @XiaomiTech
According to the company, Xiaomi OneVL employs a "language reasoning plus visual future prediction" dual supervision mechanism. This fuses interpretability with the world model's ability to forecast the future, embedding both directly into the latent space reasoning process.
The core concept is straightforward: the information compressed for autonomous driving shouldn't be limited to language-level reasoning alone. Instead, it requires a holistic understanding of how the visual world will change. Driving decisions rely heavily on spatiotemporal causal relationships—such as vehicle motion, road geometry, and the evolution of obstacles. Relying solely on language compression risks losing critical structural information, whereas compressing predictions of future visual scenes more effectively preserves the core elements that determine driving outcomes.
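To make the "language reasoning plus visual future prediction" idea concrete, the following is a minimal sketch of how two supervision signals could be combined into one training objective. It is an illustration only, not Xiaomi's published implementation: the function names, the use of cross-entropy and mean squared error, and the `visual_weight` parameter are all assumptions introduced here.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under a probability vector."""
    return -math.log(probs[target_index])


def mean_squared_error(predicted, target):
    """Average squared difference between predicted and actual future-frame features."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def dual_supervision_loss(token_probs, token_targets, pred_frame, true_frame,
                          visual_weight=0.5):
    """Hypothetical combined objective (an assumption, not OneVL's actual loss).

    token_probs:   per-token probability vectors for the language reasoning output
    token_targets: index of the correct token at each position
    pred_frame:    predicted features of a future visual scene
    true_frame:    observed features of that future scene
    visual_weight: assumed weighting between the two supervision signals
    """
    # Language supervision: average token-level negative log-likelihood.
    language_loss = sum(
        cross_entropy(p, t) for p, t in zip(token_probs, token_targets)
    ) / len(token_targets)
    # Visual supervision: how far the predicted future scene is from the real one.
    visual_loss = mean_squared_error(pred_frame, true_frame)
    return language_loss + visual_weight * visual_loss
```

In this sketch, a model that reasons well in language but mispredicts how the scene evolves is still penalized through the visual term, which mirrors the article's point that language-only compression can lose structural information.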
Building on this approach, Xiaomi has introduced three key technologies that let the model "think" internally in its own language, learn to predict future visual scenes, and compress the entire reasoning chain into a single step. On multiple mainstream reasoning and planning benchmarks, Xiaomi OneVL has set new performance records among existing latent space reasoning methods: it surpasses the accuracy of explicit Chain-of-Thought (CoT) methods while matching the speed of "answer-only" prediction modes.
Xiaomi Group CEO Lei Jun announced that the model and its code will be fully open-sourced. The company invites global developers and researchers to join the effort, aiming to drive the further evolution of large models for autonomous driving.









