Gasgoo, Munich - SenseTime has officially launched its next-generation lightweight multimodal agent model: SenseNova 6.7 Flash-Lite.

Image Source: SenseTime
Designed for "real-world workflows," SenseNova 6.7 Flash-Lite uses a native multimodal architecture to process information much like a human. It can directly interpret complex web layouts, document structures, and financial charts. By integrating "seeing, thinking, and doing," the model significantly boosts success rates for long-chain, complex tasks such as data analysis, in-depth research, and PPT generation.
By eliminating the intermediate vision-to-text layer, the model delivers a leap in agent capabilities with a smaller parameter footprint. It has secured multiple state-of-the-art (SOTA) results on authoritative agent benchmarks within its class.
The model also drastically reduces token consumption during inference. In scenarios like information search, token usage drops by 60% compared to text-only agents. It delivers millisecond-level feedback, making it better suited for high-frequency production environments.
SenseTime has now packaged the core capabilities of its SenseNova series into "SenseNova-Skills."
These skills cover high-frequency office scenarios—including infographic generation, PPT creation, Excel data analysis, and in-depth research—and offer native support for agent frameworks like OpenClaw and Hermes Agent. Users can deploy them individually or combine them into end-to-end, complex workflows.
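The "deploy individually or combine end-to-end" idea can be pictured as chaining skill calls into a pipeline. The sketch below is purely illustrative: the function names (`deep_research`, `analyze_data`, `generate_ppt`) are hypothetical stand-ins for SenseNova-Skills, which in practice run on SenseTime's platform rather than locally.

```python
from typing import Callable, List

# Hypothetical placeholders for individual SenseNova-Skills.
# The real skills are served by SenseTime's platform; these stubs
# only illustrate how single skills compose into one workflow.
def deep_research(topic: str) -> dict:
    """Stand-in for the in-depth research skill."""
    return {"topic": topic, "findings": [f"key point about {topic}"]}

def analyze_data(report: dict) -> dict:
    """Stand-in for the Excel data-analysis skill."""
    report["summary"] = f"{len(report['findings'])} finding(s) on {report['topic']}"
    return report

def generate_ppt(report: dict) -> str:
    """Stand-in for the PPT-generation skill."""
    return f"deck: {report['summary']}"

def run_workflow(topic: str, steps: List[Callable]) -> str:
    """Chain individual skills into one end-to-end workflow."""
    result = topic
    for step in steps:
        result = step(result)
    return result

deck = run_workflow("EV batteries", [deep_research, analyze_data, generate_ppt])
print(deck)  # deck: 1 finding(s) on EV batteries
```

Each skill can still be invoked on its own (e.g. `generate_ppt` alone), which mirrors the announcement's point that skills work individually or as composed workflows.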