How can cloud AI chips be brought to real-world deployment?

When it comes to putting AI into practice, what is the first thing that comes to mind?

Some people think of AI chip utilization and the scheduling of the underlying hardware; some think of the returns on AI chips' computing power; others think of computing power as a service.

Deploying AI can provide underlying support for building smart cities, forecasting the weather more accurately, and creating a more secure network environment.


However, deploying AI still faces many challenges. For example, how should AI be applied? How can its value be realized step by step? How can a computing power economy be built on top of AI?

These questions, each building on the last, test both those who provide AI technology and those who want to use it. For providers of the underlying AI chips, one of the thorniest issues right now is software.

Zhang Yalin, founder and COO of Suiyuan Technology, said at the 2022 World Artificial Intelligence Conference: “From past deployment practice, we found that AI data centers face a series of pain points: software operation and maintenance is complex, solutions are difficult to select, and the compatibility of different vendors’ products is uncertain. On top of that, data center deployment and delivery cycles are long, communication costs are high, and project management cycles are long.”

Software issues, especially the upper-layer software and ecosystem built on top of high-performance cloud AI inference and training chips, limit the development of many AI chip innovators.

Zhao Lidong, founder and CEO of Suiyuan Technology, believes that “ecosystem monopoly is the biggest challenge we face at present, and the root cause of ecosystem monopoly is tightly coupled software and hardware. Therefore, we must build our own hardware and software ecosystem.”

To address the current challenges of deploying cloud AI chips, different companies are tackling the software and ecosystem hurdles from different angles.

On September 3, 2022, at its cloud computing power industry application forum, “Making the Best Use of Things: Defining New Practices for AI Computing Power Centers,” Suiyuan Technology gave its answer to this challenge – the Yunsui Intelligent Computer (CloudBlazer POD).

The Yunsui Intelligent Computer is a high-performance AI acceleration cluster for large-scale, compute-intensive AI application scenarios. It offers one-stop pre-integrated AI acceleration hardware, an integrated development and management platform, and supporting AI application software and services, and it is aimed at digital government, scientific research institutes, science and technology innovation platforms, and similar customers.

To put it simply, Suiyuan Technology's approach to deploying high-performance cloud AI chips is to make them work “out of the box”.

Why out of the box? In terms of delivery, the Yunsui Intelligent Computer is offered as a turnkey solution covering procurement, installation, and operation and maintenance.

It can be delivered this way because the Yunsui Intelligent Computer adopts an integrated design.

The hardware's computing power is based on the high-performance AI chips Suiyuan Technology has already released. In a typical configuration, the Yunsui Intelligent Computer delivers 8 PFLOPS of TF32 floating-point computing power per unit, supports on-demand horizontal scaling up to clusters of thousands of cards, and can reach the exascale (E-level) computing power of top supercomputers.
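As a rough back-of-the-envelope check (our own arithmetic, not figures supplied by Suiyuan, and assuming one “unit” corresponds to one 8-chip node), the quoted numbers hang together as follows:

```python
# Back-of-the-envelope scaling estimate (our own arithmetic, not official figures).
PFLOPS_PER_UNIT = 8        # quoted TF32 compute per unit
TARGET_PFLOPS = 1000       # "E-level" computing = 1 EFLOPS = 1000 PFLOPS
CHIPS_PER_NODE = 8         # one node hosts 8 AI chips (per the liquid-cooling description below)

units_needed = TARGET_PFLOPS / PFLOPS_PER_UNIT
print(f"Units needed for 1 EFLOPS of TF32 compute: {units_needed:.0f}")    # 125

# If one unit is one 8-chip node (an assumption on our part), that is roughly
# 125 * 8 = 1000 accelerator cards -- consistent with the
# "clusters of thousands of cards" claim.
print(f"Approximate card count: {units_needed * CHIPS_PER_NODE:.0f}")      # 1000
```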

At the same time, the Yunsui Intelligent Computer also integrates partners' CPUs to provide sufficient general-purpose computing power. But beyond computing power, the core element of a computing cluster, the network and storage are also critical.


According to Zhang Yalin, “The Yunsui Intelligent Computer embodies the overall computing, network, and storage design that Suiyuan Technology has formed through a number of large-scale engineering practices: it aims at global optimization, separates the computing, storage, and management networks, and uses a fully interconnected, non-blocking network architecture. Combined with an efficient multi-level storage scheme and supported by the heterogeneous computing power of the ‘Suisi’ AI chips and CPUs, the Yunsui Intelligent Computer can deliver excellent AI performance.”

Leifeng.com has learned that the first- and second-generation “Suisi” chips from Suiyuan Technology have already been used in large-scale AI cluster projects at the thousand-card scale, in scenarios including converged media content generation and urban intelligent perception.

Of course, when it comes to computing clusters, we also have to pay attention to the overall energy efficiency (PUE) of the data center, especially given the dual-carbon goals, the general trend toward green computing, and the policy requirements of the “East Data, West Computing” initiative. It is reported that the Yunsui Intelligent Computer adopts integrated cold-plate liquid cooling to dissipate heat from the 8 high-performance AI chips on a single node, allowing PUE to be reduced to 1.1 or below.
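For readers unfamiliar with the metric, PUE is simply total facility power divided by the power drawn by the IT equipment; the sketch below (with an assumed IT load, not a measured one) shows what a PUE of 1.1 implies:

```python
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.
# Illustrative numbers only; the IT load below is assumed, not measured.
it_load_kw = 100.0    # assumed power drawn by the servers and accelerators
pue = 1.1             # figure quoted for the liquid-cooled design

total_facility_kw = it_load_kw * pue
overhead_kw = total_facility_kw - it_load_kw
print(f"Total facility power: {total_facility_kw:.0f} kW")            # 110 kW
print(f"Cooling and power-delivery overhead: {overhead_kw:.0f} kW "
      f"({overhead_kw / it_load_kw:.0%} of the IT load)")             # 10 kW (10%)

# At a PUE of 1.1, only about 10% of the energy goes to anything other than the
# IT load, versus roughly 50% extra at the ~1.5 PUE often cited as typical for
# conventional air-cooled data centers.
```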

As mentioned earlier, a huge challenge for AI deployment is software. But software is a very broad concept: it covers not only the compilers, libraries, and other tools that improve AI chip utilization, but also the management software for the computing power platform itself.

Along with the newly launched Yunsui Intelligent Computer, Suiyuan Technology provides the CloudBlazer Station software suite. At the infrastructure layer it includes a heterogeneous computing power scheduling platform, an intelligent operation and maintenance platform, and the computing software stack SDK; at the algorithm service layer it includes an intelligent algorithm management platform and an integrated training-and-inference platform.


At the same time, facing the trend toward huge models with extremely large parameter counts, the Yunsui Intelligent Computer supports efficient parallel training of models with over 100 billion parameters. Its LARE 2.0 multi-card interconnect technology provides nearly 1 TB/s of interconnect bandwidth, with cross-node interconnect capability of up to 600 Gb/s, enabling high-speed interconnection of thousand-card-scale clusters.
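To see why interconnect bandwidth matters at this scale, here is a generic memory estimate for training a 100-billion-parameter model (standard mixed-precision rules of thumb, not figures specific to Suiyuan's chips or software):

```python
# Rough memory estimate for training a 100B-parameter model.
# Generic mixed-precision rules of thumb, not vendor-specific figures.
params = 100e9                          # 100 billion parameters

weights_fp16_gb = params * 2 / 1e9      # FP16 weights only: 2 bytes per parameter
# FP16 weights + FP16 gradients + FP32 master weights + Adam moments
# is commonly approximated as ~16 bytes per parameter.
train_state_gb = params * 16 / 1e9

print(f"FP16 weights alone:  {weights_fp16_gb:.0f} GB")     # ~200 GB
print(f"Full training state: {train_state_gb:.0f} GB")      # ~1600 GB

# Assuming a few tens of GB of memory per accelerator (an assumption, not a spec),
# this state has to be sharded across dozens to hundreds of cards, and the
# resulting gradient and activation traffic is what the ~1 TB/s intra-node and
# 600 Gb/s cross-node interconnect figures are meant to absorb.
```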

An out-of-the-box computing cluster can indeed lower the barrier for users to some extent. But a computing cluster is, after all, a complex system, and the extent to which it can ultimately drive the adoption of high-performance AI computing will have to be proven by more real-world deployment projects.


This article is reprinted from: https://www.leiphone.com/category/chips/qpnbAZ2ikunHPcSk.html
