SenseTime Launches Next-Generation Lightweight Multimodal Agent Model; Token Consumption Drops 60%

Edited by Betty from Gasgoo

Gasgoo Munich - SenseTime has officially launched its next-generation lightweight multimodal agent model, SenseNova 6.7 Flash-Lite.


Image Source: SenseTime

Designed for "real-world workflows," SenseNova 6.7 Flash-Lite uses a native multimodal architecture that, according to SenseTime, processes information much as a human would. It can directly interpret complex web layouts, document structures, and financial charts. By integrating "seeing, thinking, and doing" in a single model, it significantly raises success rates on long-chain, complex tasks such as data analysis, in-depth research, and PPT generation.

By eliminating the intermediate vision-to-text conversion layer, the model delivers stronger agent capabilities with a smaller parameter count. SenseTime says it has achieved multiple state-of-the-art (SOTA) results among models of its class on authoritative agent benchmarks.

The model also sharply reduces token consumption during inference: in scenarios such as information search, token usage drops by 60% compared with text-only agents. It delivers millisecond-level feedback, which makes it better suited to high-frequency production environments.

SenseTime has now packaged the core capabilities of its SenseNova series into "SenseNova-Skills."

These skills cover high-frequency office scenarios such as infographic generation, PPT creation, Excel data analysis, and in-depth research. They offer native support for agent frameworks including OpenClaw and Hermes Agent, and users can deploy them individually or combine them into complex, end-to-end workflows.

Gasgoo not only offers timely news and in-depth insight into China's auto industry but also helps suppliers and purchasers build business connections and expand through multiple channels and methods. Buyer service: buyer-support@gasgoo.com Seller service: seller-support@gasgoo.com

All Rights Reserved. Do not reproduce, copy, or use the editorial content without permission. Contact us: autonews@gasgoo.com