CITIC Securities: Optimistic about the entire supernode server industry chain; recommends focusing on related companies

Zhitongcaijing·12/19/2025 00:57:01

The Zhitong Finance App learned that CITIC Securities released a research report stating that supernode solutions are expected to grow rapidly. As the basic computing unit of future underlying AI infrastructure, the supernode's scale-up domain offers advantages such as efficient communication bandwidth and native memory semantics, making it naturally suited to computation for today's mainstream MoE-architecture models. The supernode's system-level "decoupling" has raised the value of the overall system, but the design itself also faces greater challenges: multi-chip power consumption, heat dissipation, and cabinet-level reliability all urgently need to be solved, alongside the difficulties of operating such systems. CITIC Securities believes supernodes are expected to raise the value of the complete server through higher technological value-add. The firm is optimistic about the future development of the entire supernode server chain and suggests focusing on companies across the industry chain.

CITIC Securities' main views are as follows:

MoE-architecture models have put forward new requirements for hardware, and the scale-up supernode has emerged in response.

Against the backdrop of scaling laws, mainstream AI models have widely adopted the MoE (Mixture of Experts) architecture in pursuit of larger parameter scale and higher operating efficiency. The expert-network structure naturally lends itself to expert-parallel computation; while this approach effectively relieves compute and memory-access bottlenecks, it also introduces new communication problems, and supernodes built on scale-up networks have emerged as a result. Compared with traditional eight-card servers, supernodes face more complex system-level challenges: first, the cooling pressure created by a large number of chips operating in concert; second, stability issues arising from hybrid optical-copper multi-chip interconnect schemes; and third, potential reliability hazards under the long-term operation of many components. Such problems typically require deep collaboration between server manufacturers and upstream vendors to find globally optimal solutions, which also significantly strengthens the bargaining power of complete-server manufacturers in the industry chain.
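To make the communication problem concrete, below is a minimal, illustrative sketch (our own, not from the report) of expert-parallel MoE dispatch. Each token is routed to its top-k experts, which may sit on different chips, so every MoE layer triggers an all-to-all token exchange whose cost depends directly on the scale-up fabric's bandwidth. All names and sizes here are assumptions chosen for demonstration:

```python
import numpy as np

# Illustrative sketch of MoE expert-parallel dispatch (sizes are
# assumptions for demonstration, not the report's figures).

NUM_EXPERTS = 8   # assume one expert per accelerator chip
TOP_K = 2         # each token is routed to its top-2 experts
NUM_TOKENS = 16

rng = np.random.default_rng(0)

# Router: in practice a learned linear gate; random scores suffice here.
scores = rng.standard_normal((NUM_TOKENS, NUM_EXPERTS))
topk = np.argsort(scores, axis=1)[:, -TOP_K:]   # chosen experts per token

# Dispatch: tokens bound for expert e must travel to the chip hosting e.
# This per-layer all-to-all exchange is the traffic that scale-up
# networks with high bandwidth and native memory semantics accelerate.
dispatch = {e: [] for e in range(NUM_EXPERTS)}
for token_id, experts in enumerate(topk):
    for e in experts:
        dispatch[int(e)].append(token_id)

for e, token_ids in dispatch.items():
    print(f"chip {e} (expert {e}) receives tokens: {token_ids}")
```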

Supernode solutions at home and abroad are in fierce competition, and domestic supernodes have pulled ahead in some technical areas.

Overseas, NVIDIA's NVL72 is the mainstream supernode solution. In addition, the Google Ironwood rack uses Google's self-developed TPU v7 chip and supports cluster expansion of up to 9,216 chips. Recently, domestic supernode solutions such as Huawei CloudMatrix384, Alibaba Panjiu, and Shuguang ScaleX640 have all been unveiled. We believe the various supernode solutions are still in an early stage of development; as the basic unit of future underlying AI infrastructure, supernode technology will gradually converge from today's many competing approaches toward a limited set of directions.

In terms of computing power density, there is currently no consensus on the optimal size of the scale-up domain. A larger scale-up domain is expected to bring performance gains in both model training and inference, but weighing factors such as cost and reliability, the question will ultimately be answered by technological development.

In terms of network topology, the current fat-tree and 3D-Torus topologies each have advantages and disadvantages. We believe the fat-tree structure may hold a higher market share in the short term on the strength of its generality, while large manufacturers with full-stack software and hardware capabilities are expected to exploit the advantages of solutions such as 3D-Torus, as the sketch below illustrates.
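As a rough, assumption-laden illustration of the tradeoff (ours, not the report's analysis): a fat tree gives every node pair the same small hop count regardless of placement, while a 3D torus gives single-hop paths to neighbors but longer worst-case paths, which is why it rewards manufacturers whose software can place heavily communicating experts close together:

```python
# Rough hop-count comparison: 3-level fat tree vs. 3D torus.
# The 8x8x10 torus (640 nodes) is an assumed example configuration.

def fat_tree_hops(levels: int = 3) -> int:
    """Worst-case leaf-to-leaf hops: up to the top switch tier and back."""
    return 2 * levels

def torus_hops(a, b, dims=(8, 8, 10)) -> int:
    """Shortest-path hops between coordinates a and b, with wraparound."""
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))

print(fat_tree_hops())                    # 6: same for any pair of nodes
print(torus_hops((0, 0, 0), (1, 0, 0)))   # 1: adjacent nodes
print(torus_hops((0, 0, 0), (4, 4, 5)))   # 13: worst case in this torus
```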

In terms of physical connectivity, we believe the orthogonal, backplane-free architecture has advantages in connection simplicity and cabinet compactness, and may become the mainstream technical solution for future supernodes.

In terms of heat dissipation, as the computing power density of a single cabinet keeps rising, liquid-cooling solutions with a PUE closer to 1 may see greater development opportunities. If approaches such as phase-change immersion cooling can resolve issues such as stability, they may be deployed at larger scale.
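For reference, PUE (power usage effectiveness) is total facility power divided by power delivered to IT equipment, so a value of 1 would mean zero cooling and distribution overhead. The worked numbers below are illustrative, not from the report:

```latex
\mathrm{PUE} = \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}},
\qquad \text{e.g.}\quad
\mathrm{PUE}_{\text{air}} = \frac{1.5\,\mathrm{MW}}{1.0\,\mathrm{MW}} = 1.5,
\quad
\mathrm{PUE}_{\text{liquid}} = \frac{1.1\,\mathrm{MW}}{1.0\,\mathrm{MW}} = 1.1
```

Liquid cooling cuts fan and chiller overhead out of the numerator, which is why its ratio can approach 1.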

The "decoupling" of supernodes has raised system value, further demonstrating technological value-add.

In the past, AI servers, dominated by the eight-card form factor, had a clear division of labor across the industry chain and mature, stable processes at every link. Server manufacturers mainly handled the assembly and integration of standardized components and could deliver products efficiently, with the technical threshold concentrated at the level of individual components. The technical complexity of supernode servers, however, represents a qualitative leap: power-consumption control under multi-chip collaboration, cooling under high-density integration, and long-term reliability at the full-cabinet level are all unprecedented system-level challenges. As a result, server manufacturers are no longer simple "assemblers" but "system integrators" at the core of the AI computing power industry. Supernodes are, in essence, integrated computing systems: from the outset of design, the coupling among components such as chips, cooling, and interconnects must be considered holistically, and global problems solved through collaboration across links of the chain. This demand for systematic, integrated design has greatly raised the technical threshold of supernode servers, further strengthened the bargaining power of complete-server manufacturers, and made them a core hub for setting technical direction and system performance. We believe the technological value-add is expected to become increasingly apparent.

Risk factors:

Supply-chain disruption risks for computing chips; the risk of insufficient chip production capacity; the risk that capital expenditure by major Internet companies falls short of expectations; the risk that related industrial policies fall short of expectations; the risk that AI applications develop more slowly than expected; the risk that chip technology iteration falls short of expectations; and the risk of intensifying competition among domestic GPU manufacturers.

Investment Strategy:

Supernode technology is on the rise. The MoE architecture is expected to become the mainstream architecture for large models, and its particular structure imposes new adaptation requirements on hardware development; scale-up supernodes, with efficient network communication and native memory semantics, are expected to offer a better solution. We expect supernodes to become the underlying computing unit of future AI infrastructure. At present, numerous supernode solutions are competing at home and abroad. Although they differ in network topology, communication protocol, and other respects, we see a high degree of certainty in the trends toward higher computing power density, stronger cooling capability, and greater stability and reliability. These technologies place new requirements on complete-server manufacturing, and server manufacturers with customized development capabilities and supply-chain management capabilities are expected to win greater development opportunities. We recommend focusing on companies across the industry chain.