The world’s largest AI chip breaks the single-device training record for large models, Cerebras wants to “kill” the GPU

Cerebras, the company known for building the world’s largest accelerator chip, the CS-2 Wafer Scale Engine, announced yesterday that it has taken a major step forward in using “giant cores” for artificial intelligence training: it has trained the world’s largest NLP (natural language processing) AI model ever run on a single chip.

The model has 20 billion parameters and was trained on the CS-2 chip. The world’s largest accelerator chip is built on a 7nm process and etched from a single square wafer. Hundreds of times the size of mainstream chips, it draws 15 kW of power and integrates 2.6 trillion transistors, packing 850,000 cores and 40 GB of on-chip memory.

Figure 1: The CS-2 Wafer Scale Engine chip

A new record for training large AI models on a single chip

The development of NLP models is an important area of artificial intelligence. With an NLP model, an AI can “understand” the meaning of words and act accordingly. OpenAI’s DALL·E is a typical example: it converts text entered by the user into image output.

For example, when a user enters “avocado-shaped armchair”, the AI automatically generates several images matching that phrase.


Figure: Images of an “avocado-shaped armchair” generated by the AI from the text prompt

More than that, the model enables AI to understand complex knowledge about species, geometry, historical eras, and more.

Achieving all of this, however, is not easy. Developing NLP models the traditional way carries extremely high computing costs and a steep technical barrier to entry.

In fact, on numbers alone, the 20 billion parameters of the model Cerebras trained can seem a little unremarkable next to its peers.

The aforementioned DALL·E model has 12 billion parameters, while the largest model to date, Gopher, launched by DeepMind late last year, has 280 billion.

But set the raw numbers aside, and the model Cerebras developed holds a huge breakthrough: it reduces the difficulty of developing NLP models.

How does “Giant Core” beat the GPU?

In the traditional workflow, developing an NLP model requires developers to divide the huge model into several functional parts and spread the workload across hundreds or thousands of graphics processing units.

Thousands of GPUs represent a huge cost to manufacturers, and the technical difficulties are just as painful.

Slicing up a model is a bespoke exercise: the combination of each neural network’s architecture, each GPU’s specifications, and the network that connects (or interconnects) them is unique and not portable across systems.

Manufacturers must work out all of these factors before the first training run. The work is extremely complex and can take months to complete, as the sketch below illustrates.
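To make the pain concrete, here is a minimal PyTorch sketch of hand-rolled model parallelism. This is not Cerebras’ or any vendor’s production code; the class name, layer sizes, and two-GPU split are illustrative assumptions. The point is that the split points are hand-chosen for one specific model on one specific machine and must be redone whenever either changes.

```python
import torch
import torch.nn as nn

class ShardedMLP(nn.Module):
    """Toy model manually split across two GPUs (hypothetical example)."""
    def __init__(self, d_model=1024, n_layers=8, split=4):
        super().__init__()
        layers = [nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
                  for _ in range(n_layers)]
        # The split point is hand-tuned for this architecture and this box;
        # a different model or cluster needs a different partitioning.
        self.part0 = nn.Sequential(*layers[:split]).to("cuda:0")
        self.part1 = nn.Sequential(*layers[split:]).to("cuda:1")

    def forward(self, x):
        x = self.part0(x.to("cuda:0"))
        # Activations must be shipped between devices at every split point.
        return self.part1(x.to("cuda:1"))

model = ShardedMLP()
out = model(torch.randn(32, 1024))  # requires a machine with two CUDA GPUs
print(out.shape)
```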

Cerebras says this is “one of the most painful aspects” of NLP model training. Very few companies have the resources and expertise needed; for the rest of the AI industry, NLP training is too expensive, too time-consuming, and effectively out of reach.

But if a single chip can support a model with 20 billion parameters, there is no need to spread the training workload across massive fleets of GPUs. That spares manufacturers the training cost, hardware, and scaling requirements of thousands of GPUs, along with the pain of slicing up models and distributing the workload among them.

Cerebras is not simply obsessed with the numbers, either; parameter count is not the only criterion for judging a model. Rather than making the model born on the “giant core” merely work “hard”, Cerebras wants it to be “smart”.

Cerebras achieves this explosive growth in parameter capacity through its Weight Streaming technology, which decouples the compute and memory footprints, allowing memory to scale independently to store however many parameters an AI workload grows to.
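The article does not describe Cerebras’ implementation, but the core idea can be sketched in a few lines of PyTorch. In this illustrative toy (all names and sizes are assumptions, and “cuda” merely stands in for an accelerator), the full set of weights lives in host memory and only one layer’s weights occupy the device at a time, so device memory no longer bounds the parameter count:

```python
import torch

device = "cuda"  # stand-in for any accelerator
d_model, n_layers = 1024, 48

# The full parameter set lives in large host memory; it can grow without
# touching the accelerator's on-chip memory budget.
host_weights = [torch.randn(d_model, d_model) for _ in range(n_layers)]

def forward_streaming(x):
    x = x.to(device)
    for w_host in host_weights:
        w = w_host.to(device)   # stream one layer's weights onto the device
        x = torch.relu(x @ w)   # compute that layer
        del w                   # release device memory before the next layer
    return x

out = forward_streaming(torch.randn(32, d_model))
```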

Thanks to this breakthrough, the time needed to set up a model has dropped from months to minutes, and developers can switch between models such as GPT-J and GPT-Neo “with just a few keystrokes”. This makes NLP development far easier.
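The article does not show Cerebras’ actual tooling, but as a rough analogy using the open-source Hugging Face transformers library, switching between public GPT-Neo and GPT-J checkpoints really can be a one-string change:

```python
# Rough analogy with open-source tooling (not Cerebras' own stack):
# swapping model families is a one-string change.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # swap to "EleutherAI/gpt-j-6B" for GPT-J
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("An avocado-shaped armchair is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```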

This is bringing real change to the field of NLP.

As Dan Olds, Chief Research Officer at Intersect360 Research, commented on Cerebras’ achievements: “Cerebras’ ability to bring large language models to the masses in a cost-effective and accessible manner opens up an exciting new era for artificial intelligence.”

This article is reprinted from: https://www.leiphone.com/category/chips/OIYTRDK489E2klaT.html