The era of 1000 TOPS computing power is coming


Human society has entered the era of computing power.

According to estimates by the China Academy of Information and Communications Technology, by the end of 2021 the scale of China’s core computing power industry exceeded 1.5 trillion yuan, and related industries exceeded 8 trillion yuan. Within this, the cloud computing market exceeded 300 billion yuan, the Internet data center (server) market exceeded 150 billion yuan, and the core AI industry exceeded 400 billion yuan.

The domestic computing power industry has grown at an average rate of more than 30% per year over the past five years, and China’s total computing power has exceeded 150 EFLOPS (1.5 × 10^20 floating-point operations per second), ranking second in the world behind the United States.

In the era of the digital economy, computing power has become an important indicator of comprehensive national strength, and high-computing-power chip technology is a key expression of a country’s core competitiveness.

Many scenarios have already entered the era of computing power above 1000 TOPS (Tera Operations Per Second; 1 TOPS means the processor can perform one trillion, or 10^12, operations per second).
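To keep these units straight, here is a minimal back-of-envelope sketch in Python; the SI prefixes are standard, while the chip and aggregate figures are simply the numbers quoted above, used only for illustration:

```python
# Unit arithmetic for compute scales (standard SI prefixes; article figures used as examples).
TOPS = 1e12      # 1 TOPS   = 10^12 operations per second
EFLOPS = 1e18    # 1 EFLOPS = 10^18 floating-point operations per second

chip = 1000 * TOPS        # a 1000 TOPS chip = 10^15 ops/s
national = 150 * EFLOPS   # the ~150 EFLOPS aggregate quoted above

# Note: integer TOPS and floating-point FLOPS are not directly comparable across
# precisions; this is only an order-of-magnitude comparison.
print(f"1000 TOPS = {chip:.0e} ops/s")
print(f"150 EFLOPS ~= {national / chip:,.0f} such chips running at peak")
```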

01 High computing power exceeding 1000 TOPS

Data Center and Supercomputing

A typical scenario where computing power exceeds 1000 TOPS is the data center and supercomputing. Consider first the data center’s demand for computing power. The “Three-Year Action Plan for the Development of New Data Centers (2021-2023)” issued by the Ministry of Industry and Information Technology clarifies what computing power means and introduces FLOPS as a metric for evaluating the development quality of data centers. It states that by the end of 2023 the total computing power scale should exceed 200 EFLOPS, with high-performance computing power accounting for 10%, and that by 2025 the total should exceed 300 EFLOPS.

Supercomputing centers have already entered the era of E-level (exascale) computing power, 10^18 calculations per second, and are developing toward Z-level (zettascale, 1,000 exa) computing power. Exascale computing is the new goal of the world’s top supercomputing systems. To explain it loosely: what an exascale computer does in one second would take everyone on Earth, each performing one calculation per second around the clock, roughly four years of non-stop work.
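As a rough check of that analogy, the sketch below works the arithmetic with assumed round numbers (about 8 billion people, one calculation per person per second):

```python
# Back-of-envelope check of the exascale analogy (assumed round numbers).
exa_ops = 1e18                      # one second of work by a 1 EFLOPS machine
people = 8e9                        # ~8 billion people (assumption)
ops_per_person_per_sec = 1          # one calculation per person per second (assumption)
seconds_per_year = 365 * 24 * 3600  # ~3.15e7 seconds

years = exa_ops / (people * ops_per_person_per_sec * seconds_per_year)
print(f"~{years:.1f} years")        # roughly 4 years, matching the analogy above
```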

In May 2022, the Frontier supercomputer at Oak Ridge National Laboratory (a U.S. Department of Energy lab), which topped the TOP500 list of the world’s fastest supercomputers, used AMD’s MI250X high-computing chips (each providing 383 TOPS of computing power) to reach 1.1 EFLOPS of double-precision floating-point performance.

Artificial Intelligence

The continuous development of artificial intelligence also places ever higher demands on chip computing power. The biggest computing-power challenge from AI applications still comes from model training in core data centers. In recent years the complexity of algorithm models has grown exponentially and is constantly pushing against the ceiling of available computing power.

Take GPT-3, the pre-trained language model released in 2020, as an example. It has 175 billion parameters and was trained on a corpus of 100 billion words; training took about a month on 1,000 of NVIDIA’s most advanced A100 GPUs (graphics processing units, 624 TOPS each).

Less than a year after GPT-3 appeared, an even larger and more complex language model, the Switch Transformer, with more than one trillion parameters, was released. At present the computing power required by artificial intelligence doubles roughly every two months, and the supply of new computing infrastructure for AI will directly affect the pace of AI innovation and the adoption of AI in industry.

AI models have entered the trillion-parameter era, and deep learning has moved into the stage of large models and big data. Model parameters and data volumes have exploded, with demand for computing power growing by an average of 375 times every two years, far outpacing the actual growth of available computing power.

Autonomous Driving

Autonomous driving tasks also call for chips with computing power above 1000 TOPS.

The race for autonomous driving is, in a sense, a race for computing power. As vehicles advance from L1 and L2 to L3, L4 and L5, each step up means a higher demand for computing power, and the requirements of advanced autonomous driving grow exponentially.

From 2014 to 2016, the Tesla Model S offered 0.256 TOPS of computing power; in 2017, the NIO ES8 offered 2.5 TOPS; in 2019, the Tesla Model 3 offered 144 TOPS; and in 2021, the Zhiji L7 reached 1,070 TOPS.

Taking into account the current state of chip computing power under today’s integrated-circuit technology, and the future development of artificial intelligence, data centers, autonomous driving and other fields, future high-computing-power chips will need to deliver no less than 1000 TOPS.

Market demand for computing power is growing far faster than Moore’s Law can deliver. OpenAI’s analysis shows that since 2010 the computing power demanded by the industry’s most complex AI models has increased by a factor of 10 billion. At present, about 80% of the answer to this demand comes from parallel computing and increased investment, about 10% from progress in AI algorithms, and about 10% from improvements in per-chip computing power.

02 The “big computing power chip” behind 1000 TOPS

The pursuit of computing power in a single chip is endless. At present, the industry generally considers a single chip delivering 100 TOPS or more to be a “big computing power chip”.

At present, not many companies can ship a single chip exceeding 100 TOPS. Examples include AMD’s MI250X high-computing chip (383 TOPS of computing power) and Mobileye’s EyeQ Ultra single chip (up to 176 TOPS).

Domestically, Cambricon released two cloud AI chips in 2021, the Siyuan 290 and the Siyuan 370. The Siyuan 370 is Cambricon’s first AI chip to adopt chiplet technology; it integrates 39 billion transistors and delivers a maximum computing power of 256 TOPS (INT8), twice that of the previous generation.

In addition, Suiyuan Technology, Horizon, Hanbo Semiconductor, Xinchi Technology, Black Sesame Smart and others also launched high-computing-power AI chips in 2021, with computing power reaching up to 320 TOPS.


At present, only Nvidia and Qualcomm have announced SoCs with computing power exceeding 1000 TOPS, and the high-computing-power chips from both companies are aimed mainly at autonomous driving.

Look at Nvidia first. In April 2021, Nvidia announced the DRIVE Atlan chip with 1000 TOPS of computing power. This year, Nvidia went straight to the Thor chip, which doubles Atlan’s computing power to 2000 TOPS and is slated for production in 2025, skipping the 1000 TOPS DRIVE Atlan entirely.

Qualcomm followed this year with an integrated automotive supercomputing SoC, the Snapdragon Ride Flex, offered in Mid, High and Premium tiers. The top-end Ride Flex Premium SoC, paired with an AI accelerator, delivers a combined AI computing power of up to 2000 TOPS.


Behind this level of computing power is on-chip SoC integration. Heterogeneous computing improves parallelism and efficiency through the cooperative use of multiple types of computing units, and its share in typical applications such as the mobile Internet, artificial intelligence and cloud computing has risen significantly. It is realized mainly through two modes, intra-chip heterogeneity and intra-node heterogeneity, which balance performance, power consumption and cost. The typical representative of intra-chip heterogeneity is the SoC. Taking Nvidia’s Thor as an example, its very high computing power comes mainly from the Hopper GPU, the next-generation Ada Lovelace GPU and the Grace CPU in its overall architecture.

03 How High Computing Power Chips Evolve

A chip’s computing power is determined by its data interconnect, the computing power delivered per transistor (largely a function of the architecture), the transistor density and the chip area. Raising computing power therefore means improving along each of these dimensions.
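A rough way to see how these factors combine is to multiply them into a peak figure. The sketch below is purely illustrative: the decomposition and every number in it are assumptions, not a model of any real chip, and achievable throughput is further limited by the interconnect and memory system, as the next paragraphs discuss.

```python
# Illustrative peak-compute decomposition (every value here is an assumption).
density_per_mm2 = 100e6      # assumed ~100 M transistors per mm^2 (advanced node)
die_area_mm2 = 600           # assumed large monolithic die
ops_per_transistor = 1e-5    # assumed ops per transistor per cycle (set by the architecture)
clock_hz = 1.5e9             # assumed 1.5 GHz clock

peak_ops = density_per_mm2 * die_area_mm2 * ops_per_transistor * clock_hz
print(f"Peak ~ {peak_ops / 1e12:.0f} TOPS")   # ~900 TOPS under these assumptions
# Sustained throughput is lower: the data interconnect and memory bandwidth gate it.
```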

The first path of computing power evolution: the challenge of chip system architecture

Chips above 200 TOPS place very high demands on memory access and need to support much higher bandwidth, which significantly increases the complexity of system architecture design.

Current chips mainly adopt the von Neumann architecture, in which storage and computing are physically separated. Statistics show that over the past two decades processor performance has improved by about 55% per year, while memory performance has improved by only about 10% per year. Over the long run this imbalance has left memory speed lagging far behind processor speed, producing the “memory wall” problem and ultimately making it hard for chip performance to keep up with demand.
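The compounding effect of that gap is easy to see; here is a small sketch using the growth rates quoted above:

```python
# Compound growth of the processor vs. memory performance gap (rates quoted above).
years = 20
cpu_growth = 1.55      # ~55% per year
mem_growth = 1.10      # ~10% per year

gap = (cpu_growth / mem_growth) ** years
print(f"After {years} years the gap is roughly {gap:,.0f}x")   # on the order of 1000x
```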

“Huang’s Law,” proposed by Nvidia, predicts that GPU performance on AI workloads will double year after year, using new technologies to coordinate and control the flow of information across the device, minimize data movement and sidestep the “memory wall” problem.

Nvidia has iteratively built a domain-customized architecture that integrates Tensor Cores into its GPGPUs. The H100 GPU released in 2022 is built on a 4 nm process and can deliver about 2000 TFLOPS (trillion floating-point operations per second) of computing power.

The second path of computing power evolution: the challenge of advanced technology platforms

Shrinking integrated-circuit feature sizes increases computing power per unit area: under the same architecture, as the process node shrinks, the computing power per unit area of Nvidia GPUs keeps rising. In recent years, the high-computing chips from Nvidia, AMD and Apple have all been built on advanced 7 nm and 5 nm processes. Essentially, the core of the increase in computing power is the increase in the number of transistors.

Gordon Moore, one of Intel’s founders, pointed out in his original model that the number of transistors on a single chip cannot grow indefinitely, for both technical and cost reasons. While working to increase transistor density, the industry is therefore also pursuing other software and hardware approaches to improve chip efficiency, such as heterogeneous computing and distributed computing.

The third path of computing power evolution: the challenge of large-scale chip engineering

High-computing-power chips are physically large, which brings severe challenges in packaging, power delivery and thermal management, cost control, and yield. Price also scales worse than area: doubling the die area can raise the price by a factor of 3 to 5 or even more.

The trend in chip area over the past 40 years shows that as high-computing-power chips have developed, their area has kept growing and is now approaching the limit of monolithic integration. Since the area of a single die cannot grow indefinitely, a natural idea is to split a chip into multiple dies, manufacture them separately and package them together.

Heterogeneous integration plus high-speed interconnect has produced the chiplet, a milestone for the chip industry. Chiplet design modularizes dies with different functions and, through new design, interconnect and packaging technologies, combines dies built with different technologies, different processes or even at different factories into a single chip product, solving efficiency problems at the chip-manufacturing level.

04 Epilogue

Macro total computing power = performance x quantity (scale) x utilization.

Computing power thus has three components: performance, scale and utilization. They complement one another and none can be missing. Some chips achieve impressive performance, but if little thought is given to their versatility and ease of use, sales stay low, deployed scale stays small, and macro computing power does not really increase.
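A minimal numerical illustration of that formula, with every figure assumed purely for the sake of the example:

```python
# Macro total computing power = performance x quantity (scale) x utilization.
# All figures below are assumptions chosen only to illustrate the trade-off.
performance_tops = 1000      # peak TOPS of a single chip
deployed_chips = 10_000      # scale: chips actually in service
utilization = 0.3            # fraction of peak actually delivered to workloads

macro_tops = performance_tops * deployed_chips * utilization
print(f"Macro computing power ~ {macro_tops / 1e6:.0f} EOPS")   # 3 EOPS here

# A slower chip that is easier to deploy and keep busy can contribute more:
print(f"{500 * 50_000 * 0.6 / 1e6:.0f} EOPS")                   # 15 EOPS
```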

Other plans for raising computing power focus on large-scale investment. This helps to a degree, but it is not a fundamental answer to the order-of-magnitude growth in future computing power demand.

At this stage, great-power rivalry is intensifying the restructuring of global industrial and supply chains. At the same time, China’s development of advanced integrated-circuit technology is constrained, and single-point breakthroughs that rely solely on advanced manufacturing processes are costly and slow.

Using mature manufacturing processes and advanced integration, combined with leading domestic architectures such as CGRA and compute-in-memory, building wafer-scale high-computing-power chips on chiplet technology is a feasible breakthrough path: it leverages existing strengths and can raise chip computing power faster at lower cost.

This article comes from the WeChat public account “Semiconductor Industry Vertical and Horizontal” (ID: ICViews), author: Jiulin; published by 36Kr with authorization.

