Huawei boosts AI response speed with new inference acceleration solution
Translated from Korean, summarized and contextualized by DistantNews.
At a glance
- Huawei has verified its AI Inference Acceleration Solution in a commercial network, marking a first for China's telecom industry.
- The solution, built on Huawei's storage and AI computing products, cut time to first token by up to 93% and raised token throughput by as much as 372% in tests with generative AI models.
- It targets a growing problem: AI responses slow down as text inputs get longer, which matters for AI agents and other large language model services.
Huawei has verified its AI Inference Acceleration Solution in a commercial network environment, which it describes as a first for China's telecommunications sector. The solution is designed to speed up generative AI responses, a growing challenge as AI services become more complex and handle longer contexts.
The verification, conducted with China Mobile Hubei at MWC Shanghai 2026, used Huawei's OceanStor A800 storage, Ascend A3 SuperPoD, and Unified Cache Manager (UCM). The company expects the technology to help telecom operators run AI computing services more efficiently. As generative AI takes on work that requires extensive context, such as AI agents, code generation, and multi-turn conversations, faster inference has become critical.
In tests with the MiniMax M2.5 and GLM-5.1 models on China Mobile Hubei's commercial network, Huawei reported substantial improvements. Time to first token (TTFT) fell by up to 93%, and throughput in tokens per second (TPS) rose by as much as 372%. The gains were more pronounced with longer inputs.
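To put those percentages in concrete terms, the short Python sketch below applies them to a made-up baseline. The baseline values (a 10-second TTFT and 100 tokens per second) and the helper names are illustrative assumptions, not figures reported by Huawei; only the 93% and 372% figures come from the article.

```python
def apply_reduction(baseline: float, reduction_pct: float) -> float:
    """Return the value left after cutting `baseline` by `reduction_pct` percent."""
    return baseline * (1 - reduction_pct / 100)


def apply_increase(baseline: float, increase_pct: float) -> float:
    """Return the value after raising `baseline` by `increase_pct` percent."""
    return baseline * (1 + increase_pct / 100)


# Hypothetical baselines, chosen only to make the arithmetic easy to follow.
baseline_ttft_s = 10.0   # assumed time to first token, in seconds
baseline_tps = 100.0     # assumed throughput, in tokens per second

# Best-case figures reported in the article.
new_ttft_s = apply_reduction(baseline_ttft_s, 93.0)
new_tps = apply_increase(baseline_tps, 372.0)

# Express both changes as multipliers for easier comparison.
ttft_speedup = baseline_ttft_s / new_ttft_s
tps_multiplier = new_tps / baseline_tps

print(f"TTFT: {baseline_ttft_s:.1f}s -> {new_ttft_s:.2f}s ({ttft_speedup:.1f}x faster)")
print(f"TPS:  {baseline_tps:.0f} -> {new_tps:.0f} ({tps_multiplier:.2f}x)")
```

Under these assumed baselines, a 93% cut leaves roughly 7% of the original wait, or about a 14-fold speedup to the first token, while a 372% increase means throughput of about 4.7 times the baseline rather than 3.7 times. Both reported figures are "up to" values, so typical gains may be smaller.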
Michael Chu, Global President of Huawei's Data Storage Marketing & Solution Sales, said the large-scale adoption of AI agents is entering a new phase as major telecom operators launch token-based AI services one after another, and that he expects token usage to grow exponentially. Huawei's solution aims to shorten TTFT and reduce token processing costs, helping telecom operators build more efficient and environmentally friendly AI computing infrastructure.
Competition in inference infrastructure is intensifying as AI agents and enterprise generative AI services expand. Because computational costs and power consumption rise with both user numbers and context length, companies worldwide are investing in inference optimization and AI infrastructure upgrades to speed up responses and lower operating expenses.
Originally published by Dong-A Ilbo in Korean.