Llama 2 Launch: Overnight, the Large Model Race Starts Anew

Original link: https://www.latepost.com/news/dj_detail?id=1763

When a company’s new technology is far ahead and it is about to monopolize an industry, what should the pursuers do?

In 2008, a year after the release of the iPhone, major mobile phone makers were scrambling to develop operating systems to catch up with Apple. Microsoft had Windows Mobile, BlackBerry had BBOS, Nokia had built Maemo on top of Linux, and Palm was secretly developing WebOS…

Less than five years later, the only smartphones still selling either came from Apple or ran the open source Android system. Today, Apple’s competitors no longer have operating systems of their own, yet together they hold more than 80% of the smartphone market.

An entire industry rallying around an open source technology to counter the leader is a recurring pattern in today’s technology competition.

Windows was hard to challenge, so a technology industry dissatisfied with Microsoft turned Linux into the operating system of websites and internet applications. Amazon’s AWS pioneered the cloud computing industry, and competitors such as Alibaba Cloud and IBM treat Google’s open source Kubernetes (K8s) as the standard. With almost all mobile processors relying on the ARM architecture, RISC-V is receiving broad investment support.

Meta added another such example last night. It announced that its large language model Llama 2 would be open sourced for commercial use, with conditions (companies with more than 700 million monthly active users must apply for a separate license), positioning itself to lead the open source standard in the era of large models. And Microsoft, OpenAI’s close partner, became Llama 2’s primary partner this time.

Microsoft announced the partnership at its Inspire conference the same day. Just two minutes before the announcement, it had shown “Microsoft OpenAI” on a slide. Microsoft is now holding hands with both the closed-source OpenAI and its open-source competitor Llama 2, a sign that alongside the fierce technical race in large models, commercial alliances keep shifting.

Top: Microsoft CEO Satya Nadella stressed Microsoft’s close relationship with OpenAI at the conference. Bottom: Meta CEO Mark Zuckerberg with Nadella; photo from Zuckerberg’s social media.

Since ChatGPT was unveiled at the end of last year, technology companies large and small and research institutions around the world have been scrambling to catch up, and hundreds of large models have been created. With Meta open-sourcing Llama 2, most of these models became obsolete before they were ever commercialized.

“Llama 2 looks very powerful (beyond GPT-3), and the fine-tuned chat model looks to be on the same level as ChatGPT,” said Nathan Lambert, a machine learning scientist at Hugging Face. “It is a huge leap for open source, but a huge blow for closed-source large model companies. This model (Llama 2) will meet most companies’ needs for lower cost and customization.”

Performance between GPT-3 and GPT-3.5

In February this year, three months after ChatGPT’s release, Meta open sourced the first version of its Llama large language model. At the time, developers could get only the pre-trained Llama model, licensed for research use only; it was not an application tuned for specific tasks or needs, as ChatGPT is.

The commercially available Llama 2 looks stronger. This time Meta released models at three parameter scales: 7 billion, 13 billion, and 70 billion. It also disclosed extensive details about training data, training methods, and data annotation, which show where Llama 2 stands:

  • At the same parameter scale, Llama 2 outperforms all other open source large models;
  • The 70 billion parameter model is close to GPT-3.5, the model behind ChatGPT, in reasoning, but still lags far behind in writing code.

Several developers who tested Llama 2 largely confirmed Meta’s account. On coding, one said the model “won’t last 15 minutes in a coding test.” The 7 billion parameter model can run on a Mac at 6 characters per second, which is 70% slower than “Gecko”, the smallest model in Google’s PaLM 2 family, although Google has not disclosed Gecko’s parameter count.

According to the information released by Meta, Llama 2’s training data (all from public sources) grew to 2 trillion tokens (a token is roughly a common word, a punctuation mark, or a number), 40% more than the first generation. Its context length was extended to roughly 4,000 tokens, giving it a stronger grasp of text semantics.

Like OpenAI, Meta used reinforcement learning from human feedback (RLHF), with 1 million human-annotated examples, to train a ChatGPT-like chat model. This is also how the open source community has commonly fine-tuned Llama over the past few months. Meta claims that “the superior writing ability of large language models is fundamentally driven by RLHF.”

Training Llama 2 was probably not cheap. Lambert of Hugging Face estimates that Llama 2 could have cost more than $25 million to train, no less than what OpenAI spent training GPT-3 three years ago. There are ample indications that Meta is continuing to train a stronger Llama, he said.

Meta’s Llama 2 outperforms other open source models on multiple datasets. Image via Meta.

“Change the landscape of the large language model market”

As infrastructure, a large model sits at the bottom layer of a product. People using a large-model application experience the chat box and the content the model produces; they do not see which model or technology is behind it.

This characteristic shapes the competitive dynamics to some extent: as long as another large model better meets the needs of users or enterprises, the barrier to switching is low, and the switch need not affect users much. “If the large models don’t differ much in capability, you only need some scheduling work to handle it, and the development effort is small,” an AI developer said.
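The “scheduling work” the developer mentions can be pictured as a thin routing layer between the application and the model. The following is a minimal sketch under stated assumptions, not any company’s actual design: `ModelRouter`, `hosted_api_backend`, and `local_open_model_backend` are hypothetical names, and the two backends are stubs standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is any function that maps a prompt to a completion.
Backend = Callable[[str], str]


def hosted_api_backend(prompt: str) -> str:
    # Hypothetical stub for a call to a hosted, closed-source model API.
    return f"[hosted] {prompt}"


def local_open_model_backend(prompt: str) -> str:
    # Hypothetical stub for a self-hosted open source model.
    return f"[local] {prompt}"


@dataclass
class ModelRouter:
    """Routes every request to whichever backend is currently active."""

    backends: Dict[str, Backend]
    active: str

    def switch(self, name: str) -> None:
        # Changing the base model is a one-line configuration change.
        if name not in self.backends:
            raise KeyError(f"unknown backend: {name}")
        self.active = name

    def complete(self, prompt: str) -> str:
        # Application code calls this and never names a specific model.
        return self.backends[self.active](prompt)


router = ModelRouter(
    backends={"hosted": hosted_api_backend, "local": local_open_model_backend},
    active="hosted",
)
before = router.complete("Summarize this page")
router.switch("local")
after = router.complete("Summarize this page")
print(before)
print(after)
```

Because the application only ever calls `complete`, replacing the base model touches the router configuration rather than the product code, which is one way to read the claim that the barrier to switching is low.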

With an open source large model like Llama 2 available, developing a model in-house makes even less sense. Even Andrej Karpathy, a research scientist at competitor OpenAI and former director of artificial intelligence at Tesla, called the release of Llama 2 an important day in the development of artificial intelligence and large models. “Llama 2 is the most powerful language model whose weights (the learned parameters, a model’s most critical information) are available to anyone.”

Yann LeCun, vice president of Meta and head of its artificial intelligence department, said that Llama 2 will change the landscape of the large language model market. An executive at a Chinese large-model startup explained the remark: “You will soon see many companies that build large-model applications swap their base model for Llama 2.”

Many artificial intelligence researchers agree with LeCun that with the release of Llama 2, Meta can use its strategy of open source plus commercial licensing to change the landscape and ecosystem of large models.

In June this year, Sequoia Capital found that among 33 startups and listed companies it had invested in, 65% had launched large-model applications, and 94% were building applications on OpenAI’s API.

Most use large models in fairly simple ways: calling the ChatGPT API directly to process private data for specific tasks such as multilingual translation, text generation, or web page summarization. Few go deeper, for example by fine-tuning models on large amounts of data.

In China, many companies choose to collect data from scratch or use public data sets to train large models. In the past six months, more than 80 large models have been released.

LatePost has learned that a commercial license for a 6 billion parameter open source model from a closely watched Chinese large-model startup costs millions of yuan.

The head of the artificial intelligence department at a listed company told LatePost in May that they had planned to build features on OpenAI’s GPT-3.5, but the cost was too high (estimated at tens of thousands of yuan per day), customization was difficult, and it could not handle a large number of simultaneous user requests.

In the end, they chose Llama (6 billion), which has fewer parameters, and an open source large model from a Chinese company. This meant lower training and deployment costs, and after fine-tuning on their data, the results in their business scenarios were not much different from those with GPT-3.5.

At the time, Chinese large-model companies had another advantage: commercial licenses could be negotiated with them, which was not possible for Llama. Now that Llama 2 is commercially available, that advantage is gone.

Open source large models are catching up fast

When ChatGPT was first released at the end of last year, it stunned the world with replies that read as meaningful and with strong coding ability. Many companies wanted to know how to build a similar product.

More than half a year later, everyone from large companies to individual programmers can draw on the open source community to build an application similar to ChatGPT. Replit, a cloud-based development platform, found that the number of projects on its platform using open source large models is doubling every quarter.

Building on open source large models such as Llama, developers have created a variety of open source datasets, such as datasets for reinforcement learning from human feedback (RLHF), to keep improving the capabilities of open source large models.

According to evaluations by LMSYS Org, an organization founded by professors and students from UC Berkeley, Carnegie Mellon University, and other universities, the gap between open source large models and GPT-4 has narrowed significantly in the past few months, from 191 points to a recent 115 points. While catching up, the open source community also got large models running on computers and mobile phones before the big companies did, more than a month ahead of Google.
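To give a feel for what such a gap means, here is a small sketch that converts a rating gap into an expected head-to-head win rate. It assumes the “points” are Elo ratings of the kind LMSYS Org publishes on its Chatbot Arena leaderboard, and it uses the standard Elo expected-score formula; the function name `expected_win_rate` is introduced here for illustration and does not come from LMSYS.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula.

    Assumption: the gap is measured in Elo points on the usual 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


earlier = expected_win_rate(191)  # gap reported a few months earlier
recent = expected_win_rate(115)   # more recent gap
print(f"gap of 191 points: {earlier:.3f}")
print(f"gap of 115 points: {recent:.3f}")
```

Under these assumptions, a 191-point lead corresponds to the stronger model being preferred in about 75% of head-to-head comparisons, and a 115-point lead to about 66%.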

With Meta open sourcing Llama 2, the large-model open source community will only grow stronger. Meta said that after it open sourced the first version, which did not permit commercial use, it received more than 100,000 requests from researchers for access, not counting those who downloaded the model directly from the internet.

“Artificial intelligence researchers in large companies were cautious about the first version of Llama because of open source licensing issues, and now I think many of them will jump aboard this ship (Llama 2) and contribute their firepower,” said Jim Fan, a senior artificial intelligence scientist at Nvidia. He added that even if Llama 2 is weak at programming now, it will catch up soon after being open sourced.

The largest open sourced version of Llama 2 (70 billion parameters) has less than half as many parameters as GPT-3, which OpenAI trained three years ago, yet it performs better than GPT-3. It is one of the best illustrations of how quickly open source models are catching up.

The open source logic is to broaden a new technology’s reach once a large model reaches a certain level, so that more people can use it, and then to improve the model through a large number of applications. Closed-source companies such as OpenAI lean toward leading on technology first: they develop powerful models and then roll them out to more people.

As with iOS and Android in mobile operating systems, open source and closed source do not compete entirely on the same dimension. A similar differentiation will emerge in the field of large models.

In this new competitive landscape, even Google is not confident that it can stay ahead.

In May this year, a senior Google engineer wrote internally that although Google still has a slight edge in the quality of its large models, the gap between open source products and Google’s models is narrowing at an alarming rate. Open source models iterate faster, and users can customize them for different business scenarios.

“In just a few weeks, they are doing things with $100 and a 13 billion parameter model that we struggle to do with $10 million and a 540 billion parameter model,” he wrote. “We don’t have a moat, and neither does OpenAI.”

After announcing the open-sourcing of Llama 2 yesterday, Meta explained that an open approach is the right one for the development of today’s AI models, especially in the rapidly advancing generative space, where “by making AI models publicly available, they can benefit everyone, not just a few big companies.”

A new kind of competition, unlike any in the past, is unfolding in generative artificial intelligence. With the power of open collaboration, the open source community is closing the lead established by the commercial giants at an astonishing speed. And large companies long accustomed to closed technology and market monopoly are gradually embracing open source.
