First review | Alibaba's version of ChatGPT is here! We asked it over 100 questions and it scored 90 points

"Who are you?"

"I am a very large-scale language model that can answer questions, create text, express opinions, and write code. I can be used for a variety of natural language processing tasks, such as translation, text generation, and question answering."

This is how Tongyi Qianwen describes itself. It is the large language model from Alibaba Cloud, which opened an invitation-only test for enterprises today. Yes, Alibaba has entered the game.

After Aifaner's Xiaohongshu account gave an early look ("Niao Niao Di Niao"), APPSO got one of the first chances to test Tongyi Qianwen. How does it perform, and can it meet Chinese users' needs for a large language model? We put it through a thorough test.

Writing scripts and talking about love: Tongyi Qianwen is here

A chat box and a few function prompts: like other large language models, Tongyi Qianwen has a very simple interface. Enter a question and you get a response.

Chatting is just an appetizer; making Tongyi Qianwen useful in work and life is the serious business.

How did Tongyi Qianwen do at writing articles, making plans, and drafting scripts?

First, I asked Tongyi Qianwen to write a movie script in which the "outlaw fanatic" Zhang San is brought to justice.

Reconnaissance and counter-reconnaissance, hacking techniques, and even a final twist in which Zhang San, after confessing and being arrested, retracts his confession in court as a last-ditch attempt. There are not many details, but with a plot that reverses this many times, Zhang San really is hard to deal with.

Of course, Tongyi Qianwen can write more than movie scripts. Since a plot full of fighting and killing is not suitable for children, I also used it to write a story that is both entertaining and educational, teaching children the importance of friendship.

Two friends who live together never give up on their friendship, even in danger, and share a positive, happy life. Perhaps to make the story easier for children to understand, Tongyi Qianwen also used words like "little hands" and "little feet".

If you want to create a similar story, you can name the protagonists in your prompt as I did. Milk Tea and Dangdang, for example, are the names of my friend's cat and of me. Telling stories with familiar names makes them feel closer to children and gives them a stronger sense of identification with the characters.

I also tried using Tongyi Qianwen for workplace writing. After weighing emails, meeting minutes, client proposals, and other document types for a long time, I finally decided to have it write a resignation letter.

When I read this resignation letter, I laughed for a long time. A secretly wealthy second-generation heir writes so modestly about taking over the family business, yet he isn't lying: the matter really is urgent.

Since I am going to inherit the family business, I naturally need a good plan, so I set a small goal first: how do I make my hotel well-known across the province?

Branding, service, products, marketing: the answer was comprehensive but offered little practical methodology, so I asked follow-up questions.

Customer research, venue upgrades, service improvements, customized gifts, and even cross-industry partnerships were all covered. Given that I didn't provide much detail, Tongyi Qianwen's answer was already very good.

Tongyi Qianwen also offers a "treasure bag" of small applications that tailor its capabilities to more specific scenarios. If you don't know how to phrase your questions, these small applications are a better fit for you.

For example, as a product manager working on a large language model, you can use "Write Outline" to draft a project introduction.

Once the project has investor backing, "SWOT Analysis" can help you understand the competitive landscape.

When the product is finally developed and launched, you can use "Product Description Generation" to write its introduction.

Other functions lean toward entertainment, and fun is their main purpose. For example, Zhihu often has questions like "how would you write a story that begins with XXX"; you can use the "And Then" tool to write such a story.

Many large language models have become a laughingstock for recipes like "fried screws". Tongyi Qianwen's attitude is that, since this can't be fixed for now, it may as well show it off generously for everyone to laugh at: hence the "Flying Recipes" function.

The treasure bag made me feel Tongyi Qianwen's "humility". It can only respond in text and arrived later than its foreign counterparts, but it lets users get started with large language models faster. And the small application that frankly shows off its shortcomings (such as recipes) has become one of its highlights.

I asked it over 100 questions, and it did a little better than expected

If we only tested the officially suggested prompts, how would that differ from reading the manual? We selected 110 questions across various fields from the Chinese test sets that some investment institutions use to evaluate large language models, and used them to test Tongyi Qianwen. The questions include:

1. Basic abilities (50 questions): tests factual understanding, information extraction, text translation, etc.

Examples: What are American Shorthair, British Shorthair, Siamese, and Maine Coon? List 10 science fiction novels. Write a poem about traffic lights.

2. Advanced abilities (50 questions): tests physics, chemistry, mathematics, riddles, and similar abilities.

Examples: What type of chemical bond does the element gold form? Solve the character riddle "buried in the bottom of my heart". What morpheme do the following words share: pyre, empyrean, antipyretic?

3. Vertical fields (10 questions): tests knowledge of computing, biology, medicine, astronomy, etc.

Examples: As a doctor, how many intubation attempts should you make on a patient before handing over to a senior colleague? Whose coming does the Great Cloud Sutra predict?

Let me give the conclusion first. Tongyi Qianwen's total score was 90 points (43/38/9), close to ChatGPT 3.5's 92 points (47/40/5). Given the limitations of the question set, we cannot conclude that Tongyi Qianwen's ability is close to ChatGPT 3.5's, but at least in Chinese-language dialogue, today's Tongyi Qianwen delivers a good experience.
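To make the score breakdown concrete, here is a minimal Python sketch that re-adds the category scores reported above. It assumes each question is worth one point, so the maximum is 110; the dictionary layout and the `total` and `accuracy` helpers are hypothetical names used only for illustration.

```python
# Category scores as reported in this review: (correct answers, questions asked).
# Assumption: one point per question, so the maximum total is 50 + 50 + 10 = 110.
SCORES: dict[str, dict[str, tuple[int, int]]] = {
    "Tongyi Qianwen": {"basic": (43, 50), "advanced": (38, 50), "vertical": (9, 10)},
    "ChatGPT 3.5": {"basic": (47, 50), "advanced": (40, 50), "vertical": (5, 10)},
}


def total(model: str) -> int:
    """Sum the correct answers across all categories for one model."""
    return sum(correct for correct, _ in SCORES[model].values())


def accuracy(model: str) -> dict[str, float]:
    """Fraction of questions answered correctly in each category."""
    return {cat: correct / asked for cat, (correct, asked) in SCORES[model].items()}


if __name__ == "__main__":
    for model in SCORES:
        parts = ", ".join(f"{cat} {acc:.0%}" for cat, acc in accuracy(model).items())
        print(f"{model}: {total(model)}/110 ({parts})")
```

Looking at per-category rates rather than totals shows that the two-point gap hides larger differences: ChatGPT 3.5 answered more basic and advanced questions correctly, while Tongyi Qianwen got 9 of the 10 vertical-field questions right against ChatGPT 3.5's 5.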

Where Tongyi Qianwen falls short, the problems are basically ones shared by all large language models.

Cooking, for example, is a hurdle large language models never seem to clear. From braised screws to deep-fried Altman, they keep adding bold strokes to the canon of Chinese cuisine.

Fortunately, Tongyi Qianwen's cooking skills have improved. When asked about bizarre cooking methods, it can now identify the problem and give a relatively normal answer (although it still reads a bit oddly).

While other large models say "I can do everything, but I really can't cook", Tongyi Qianwen may be the best cook among them.

With brain teasers, however, Tongyi Qianwen crashed. Perhaps it trusts humans too much: large language models are not very good at questioning a premise before answering. Brain teasers, which are built on mischievous tricks, are still too advanced for pure large language models.

But not every trick gets played along with, as with "steamed Pikachu". For example, when I asked it to make up the life story of the "famous" Fred Rickerson, it firmly told me that this person might not be famous enough.

When I asked for "the square root of a banana", it also clearly told me that bananas are fruit and can't do math, and it didn't scold me for asking.

So Tongyi Qianwen, which has only just opened for testing, is already working through many problems inherent to large language models, but in language logic and mathematical calculation it still has a long way to go before it is truly easy to use.

▲ The poem is about playing the harp

But I am still confident in Tongyi Qianwen. In my first run of the 110 questions it scored 65 points (35/23/7), yet when I retested the next day it suddenly scored 90. Does this model evolve by the day? Out of curiosity, I asked friends at Alibaba, and they said they knew nothing about it.

In any case, a large language model does not develop the way we climb from the bottom of the class to the top.

Remember the Siri you laughed at back then?

I remember the first time I used Siri on the iPhone: my friends and I chattered at it, not to solve any problem, but just to hear "I don't seem to understand" and burst out laughing. Today, what everyone chats about with AI has turned into Tieba-style brain teasers.

It is hard for a large language model to admit that it doesn't know something, so it produces jokes like the "spicy screws recipe" and "the square root of a banana is √3". These are not fabrications made deliberately for some purpose; they are "made out of nothing" purely by the algorithm. This kind of unintentional error, born of not knowing where its knowledge ends, is a weakness that neural networks find hard to overcome.

When I asked Tongyi Qianwen how to use it effectively, it modestly told me that its knowledge comes from training on large amounts of data with algorithms, and that not all of this knowledge is correct. So if you find an answer is wrong, don't hold back your expertise and insights; they will help it keep improving.

"Tongyi" represents the breadth and universality of knowledge, and "Qianwen" (a thousand questions) reflects how complex and varied the questions are. Tongyi Qianwen is not perfect; we need to give it better prompts and make progress together with it.

By the way, many paragraphs in this article were written by Tongyi Qianwen. Can you tell which ones?

