How can the fourth-generation Xeon CPU secure its data center throne without competing on core count?

In the world of instruction set architectures, x86, Arm, and RISC-V have long each kept to their own fields of expertise, but competition among them in the data center has become increasingly obvious in recent years. The Arm camp has launched a fierce assault on the server market in an attempt to compete with x86 and RISC-V, and the server-class RISC-V CPU it is up against was unveiled at the end of 2022 and is expected to ship in mid-2023.

Amid intensifying competition and external turmoil, Intel’s data center business declined repeatedly over the past year, and its market performance fell short of expectations. More competitive server products are urgently needed to reverse the situation.

On January 11, Intel officially released the fourth-generation Xeon Scalable processor (code-named “Sapphire Rapids”), and at the same time launched the Intel Xeon CPU Max series (code-named “Sapphire Rapids HBM”) and the Intel Data Center GPU Max series (code-named “Ponte Vecchio”). What performance advantages does this family of products offer? Can it help Intel hold on to the number one position in data center processors?

Seven new computing power tools: core count is not the only answer

“Since launching the first Xeon Scalable processor in 2017, Intel has delivered more than 85 million Xeon Scalable processors to customers around the world, supporting data centers worldwide. Of these, third-generation Intel Xeon Scalable processors have shipped a total of 15 million units worldwide in the past two years,” said Zhuang Binghan, vice president of Intel’s marketing group and general manager of data center sales and carrier sales in China, summing up the past performance of Intel Xeon processors at the press conference.

Intel’s fourth-generation Xeon Scalable processors are manufactured on the Intel 7 process and use a new chip architecture that supports up to 60 cores per socket and 1, 2, 4, or 8 sockets per system. Each socket has 80 PCIe Gen5 lanes, paired with new technologies such as DDR5 memory and CXL 1.1 to support high bandwidth and attached accelerators.

It is worth noting that, beyond a further increase in core count over the previous generation, the fourth-generation Intel Xeon Scalable processor adds new built-in accelerators covering artificial intelligence, scientific computing, security, networking, data analytics, storage, and other fields, delivering an average performance gain of 1.53x over the previous generation.

Intel believes that, given both the demands of industry applications and the limits of the physical world, raising core frequency and core count alone can no longer deliver the higher CPU performance it seeks under real workloads. It has therefore introduced a design philosophy of acceleration optimized for real workloads: taking a system-level design approach, it builds dedicated workload accelerators into the CPU architecture to improve performance and efficiency.

At the press conference, Zhuang Binghan summed up the accelerators and features built into the processor family as seven computing power tools:

Intel Advanced Matrix Extensions (Intel AMX)

It can dramatically improve the performance of deep learning workloads such as recommender systems, natural language processing, image recognition, media processing and delivery, and media analytics. Compared with the previous generation running FP32, PyTorch using the built-in Intel AMX with BF16 delivers up to 10 times higher real-time AI inference and training performance. Combined with the general-purpose CPU cores, 4th Gen Intel Xeon Scalable processors can run any AI workload end to end.
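
To make the FP32 versus BF16 comparison concrete, the following is a minimal sketch of what the BF16 format is: the top 16 bits of an FP32 value (same sign and 8-bit exponent, but only 7 explicit mantissa bits). It is plain Python for illustration only, it does not use AMX or PyTorch, and the function names are hypothetical. NaN inputs are not handled.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x, rounding to nearest even.

    Illustrative only: NaN inputs are not handled specially.
    """
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Expand a BF16 bit pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

def to_bf16(x: float) -> float:
    """Round x to the nearest value representable in BF16."""
    return bf16_bits_to_float(fp32_to_bf16_bits(x))
```

Halving the width of each value means twice as many operands fit in the same registers and memory bandwidth, at the cost of precision: `to_bf16(3.14159)` keeps only about three significant decimal digits.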

Intel Dynamic Load Balancer (Intel DLB)

It supports efficient distribution of network workloads across multiple CPU cores and threads, and dynamically redistributes the load among cores when it becomes unbalanced. It can also restore the order of network packets that are processed simultaneously on different CPU cores, which raises overall system performance. Compared with previous-generation processors, it provides up to 2x the capacity for vRAN workloads within the same power envelope.
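
The two ideas described here, sending work to the least-loaded core and restoring packet order afterwards, can be sketched in software. This is only an analogy under stated assumptions: Intel DLB does this in hardware, and the function names, the `(sequence number, cost)` packet format, and the cost values are hypothetical and not part of any Intel API.

```python
import heapq

def distribute(packets, num_cores):
    """Assign each (seq, cost) packet to the currently least-loaded core.

    Returns (queues, loads): the sequence numbers queued on each core and
    the accumulated cost per core.
    """
    # Min-heap of (accumulated_cost, core_id); the top is the least-loaded core.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    queues = [[] for _ in range(num_cores)]
    loads = [0] * num_cores
    for seq, cost in packets:
        load, core = heapq.heappop(heap)
        queues[core].append(seq)
        loads[core] = load + cost
        heapq.heappush(heap, (loads[core], core))
    return queues, loads

def reorder(completed):
    """Release sequence numbers in original order as completions arrive.

    `completed` lists sequence numbers (starting at 0) in the order the
    cores finished them, which may differ from the arrival order.
    """
    pending = []      # min-heap of finished packets waiting for predecessors
    next_seq = 0      # next sequence number allowed to leave
    released = []
    for seq in completed:
        heapq.heappush(pending, seq)
        while pending and pending[0] == next_seq:
            released.append(heapq.heappop(pending))
            next_seq += 1
    return released
```

With two cores and packets of cost 5, 1, 1, 1, 4, the expensive first packet occupies one core while the other four go to the second core, instead of alternating blindly between the two.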

Intel Data Streaming Accelerator (Intel DSA)

This accelerator helps users move data faster in storage, networking, and data analytics workloads. It speeds up data movement between the CPU, memory, cache, and storage and network devices, frees up CPU performance, reduces latency, and improves utilization of the CPU cores, and it can improve performance by 1.7 times.

Intel In-Memory Analytics Accelerator (Intel IAA)

For database and analysis workloads, it can improve memory query throughput and reduce the memory usage of in-memory database and big data analysis workloads. Compared with the previous generation, the Intel IAA accelerator can improve the performance of RocksDB by 3 times.

Intel QuickAssist Technology (Intel QAT)

It accelerates encryption and compression. Intel QAT can significantly improve CPU efficiency and application throughput while reducing data footprint and energy consumption, enabling enterprises to strengthen encryption while maintaining performance.

Intel Security Engine

This includes Intel Software Guard Extensions (Intel SGX), Intel Trust Domain Extensions (Intel TDX), hardware acceleration for cryptographic operations, memory fault management technology, platform firmware resilience technology, and more, providing enhanced security protection.

Intel Xeon CPU Max Series

The first Intel Xeon processor to integrate High Bandwidth Memory (HBM). According to reports, it provides a 3.7 times performance improvement for memory-constrained workloads, while achieving a significant reduction in energy consumption.

It is worth noting that although Intel believes adding cores alone cannot deliver the CPU performance needed under real loads, several Arm-based server CPUs have as many as 70 cores, more than the newly released Xeon Scalable processor, which also reflects the performance advantage of very high core counts.

In this regard, Chen Baoli, vice president of Intel’s Data Center and Artificial Intelligence Group and general manager for China, said that the move toward more cores in data center processors is a major trend. What Arm can do, Intel x86 can also do, but the Arm core itself is relatively small, so products with more cores can be put together quickly.

“We pay more attention to how customers use our products. More cores are not necessarily better. Many users today do not blindly pursue more cores when using data center processors, but analyze their specific tasks,” Chen Baoli added.

“In the next generation, rather than adding 10 more cores, it is better to add an accelerator. According to customer feedback, many of them see not a 50% performance improvement but a 3x, 5x, 6x, or even 13x improvement, and built-in accelerators can meet customers’ growing business needs better than a higher core count.” Zhuang Binghan held the same opinion.
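
The trade-off in this quote can be put into rough numbers. The sketch below compares ideal linear scaling from 60 to 70 cores with an accelerator that speeds up only part of a workload, using the standard Amdahl's-law formula. The 60-core figure and the 3x factor come from the article; the offloaded fraction of 50% and the function names are assumptions chosen for illustration, not Intel data.

```python
def core_scaling_speedup(cores_before: int, cores_after: int) -> float:
    """Best-case speedup from adding cores, assuming perfect linear scaling."""
    return cores_after / cores_before

def accelerator_speedup(offload_fraction: float, accel_factor: float) -> float:
    """Overall speedup when a fraction of the work runs accel_factor times faster.

    Amdahl's law: the non-offloaded part (1 - offload_fraction) is unchanged,
    while the offloaded part shrinks to offload_fraction / accel_factor.
    """
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / accel_factor)

# Hypothetical comparison: 10 extra cores versus a 3x accelerator that
# covers half of the workload.
more_cores = core_scaling_speedup(60, 70)          # about 1.17x at best
with_accelerator = accelerator_speedup(0.5, 3.0)   # 1.5x
```

Under these assumptions the accelerator wins even though it touches only half of the work, and the gap widens as the offloaded fraction grows. For workloads the accelerator does not cover, the extra cores remain the better choice.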

Pilots Intel on Demand, releases first flagship data center GPU

In addition to hardware-level upgrades and built-in accelerators, Intel has also launched a new service to better meet customer needs: Intel on Demand (an on-demand service).

The on-demand service, formerly known as Intel Software-Defined Silicon, can be used to unlock additional accelerators and hardware enhancements in most 4th generation Xeon processor SKUs. The features supported by this service include the Intel DLB, DSA, IAA, QAT, and SGX described above. It also includes an API for license ordering, and software agents for configuring licenses and activating CPU features.

It is worth mentioning that, according to Intel, customers who are not sure at first whether they need these accelerators can start by using the 4th generation Intel Xeon Scalable processors and later enable additional acceleration features without changing their data center deployment, or simply replace the server to get the performance improvement these accelerators bring.

Why launch the “on-demand service”? Intel said it is because end customers have told the company that they want to turn capital expenditure into operating expenditure and buy computing capacity in line with demand and budget.

“Customer needs change at any time with the real workload, and the features they require differ as well. With Intel on Demand, customers can flexibly choose the most suitable service. For example, during the Spring Festival travel peak, the 12306 railway ticketing platform purchases a large amount of cloud services, and after the peak it returns to its own infrastructure to support daily business,” Liang Yali, vice president of Intel’s marketing group and general manager of the Cloud and Industry Solutions Department in China, said in an interview.

In addition to the CPU, Intel also launched its first flagship data center GPU at the press conference. It uses 3D-packaged chiplet technology to integrate 47 chiplets and more than 100 billion transistors in a single product. The Max series of GPUs offers up to 128 Xe cores and ray tracing units, and up to 128 GB of high-bandwidth memory.

It is not difficult to see that Intel’s launch of Max CPU+GPU this time is to compete with Nvidia for the data center GPU market, so where is Intel’s advantage?

Zhuang Binghan said that many partners in fact want one more choice of GPU, and so are looking forward to Intel’s GPU products.

“If the server cluster is dedicated to AI training, an accelerator is needed, and the performance required of it will exceed what the AMX accelerator embedded in the CPU can provide. At that point, a GPU dedicated to AI processing is needed,” Zhuang Binghan said, explaining the positioning of the Intel Max series GPU.

Alongside its GPU products, Intel also provides the oneAPI programming framework. Code developed with oneAPI on Intel products can be reused, so even TensorFlow and PyTorch workloads that run on other manufacturers’ GPUs can be brought over seamlessly.

“The launch of the fourth-generation Xeon Scalable processors and the Max series of CPUs and GPUs is a historic moment in the data center field,” Zhuang Binghan said.

In the down cycle of semiconductors, can Intel stabilize the number one position in the data center?

In 2022, Intel’s data center business continued to decline. Intel CEO Pat Gelsinger attributed the decline in the company’s performance to factors such as supply and transportation problems caused by the global COVID-19 pandemic, as well as the economic downturn.

Intel remains confident about the future. Liang Yali said in an interview: “The past year was indeed a relatively difficult one. It reflected the cumulative effect of three years of the pandemic together with many overlapping factors, such as pandemic-related supply problems, the semiconductor industry’s cyclical downturn, and weak market demand. Beyond the pandemic, there is the transformation facing the entire industry, and that transformation is driven by technological innovation, so we hope to lay a very good foundation this year.”

Recently, the relaxation of China’s epidemic prevention and control policy has also given Intel a lot of confidence. “We have always had high hopes for China’s economy this year. If China’s economy grows, it will definitely contribute to the global economy, drive consumption, and drive the digital economy. This is closely related to our servers, so we remain optimistic about this year,” Zhuang Binghan told Leifeng.com.

In the face of the challenge from Arm and RISC-V, Liang Yali believes that technology advances through healthy competition. x86 has the broadest application base and the most extensive ecosystem support, and what Intel focuses on is how to make its own products better support the innovation, transformation, and development of customers’ businesses. (Leifeng.com)

This article is reproduced from: https://www.leiphone.com/category/chips/ygnJxJxyb4j6y8So.html