Sun Wu playing “Warcraft”? There are pictures to prove it

Recently, imaginative users overseas have been using DALL·E and Imagen to mash up Chinese and Western culture.

Following the “Tiger Wearing VR”, Sun Wu (better known as Sun Tzu), the ancient Chinese military strategist, has become the latest subject to get a makeover.

The tester gave DALL·E a prompt:

Sun Wu playing the game Warcraft II.

Who would have thought that in 2022, Sun Tzu of the Spring and Autumn Period would actually be playing “Warcraft II” on a computer.

The whole picture is actually quite harmonious and quite oriental.

This is probably because the prompt given to DALL·E included the qualifier “Oriental Painting”, which fixed the painting style.

But how does this show that he is playing “Warcraft II”? For that, I’m afraid you still need to use your imagination. (just kidding)

Given the same prompt, DALL·E also produced other versions.

This picture also captures Sun Tzu’s character as a military strategist.

Sitting in front of a computer playing games may not be exciting enough, so let’s try making Sun Wu a character in World of Warcraft:

DALL·E not only pulled it off, the images even look 3D-rendered. (Though not every picture shows the Warcraft game interface.)

Netizens marveled: This AI actually knows what the characters of World of Warcraft look like!

Let’s talk about how this “AI painter” creates its images.

How DALL·E works

DALL·E is essentially a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions.

A key supporting component is OpenAI’s CLIP.

CLIP is a neural network trained on a wide variety of images and text, and here it is responsible for scoring and reranking the candidate images.

In fact, DALL·E generates many candidate pictures during the “creation” process. The better an image matches the text, the higher the score CLIP gives it.

As for how CLIP judges how well an image matches the text, this comes down to its ability to understand pictures and text in an “integrated” way.

This ability is attributed to its multimodal neurons, which work in a way similar to neurons in the human brain: they respond to the same concept whether it appears in words or in images.

Finally, the top-rated images are presented as output. (This explains why entering the same text can generate multiple images)
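The scoring-and-reranking step described above can be sketched in a few lines. This is only a minimal illustration, assuming text and images have already been turned into embedding vectors and that the match score is their cosine similarity; the tiny hand-made vectors and the `rerank` helper are hypothetical stand-ins, not CLIP’s real embeddings or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def rerank(text_embedding, image_embeddings, top_k=2):
    """Score every candidate against the text and return the top_k (index, score) pairs."""
    scored = [
        (index, cosine_similarity(text_embedding, embedding))
        for index, embedding in enumerate(image_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings, made up for illustration only.
text_embedding = [1.0, 0.0, 1.0]
candidates = [
    [0.9, 0.1, 1.1],   # points in almost the same direction as the text
    [-1.0, 0.5, 0.0],  # points away from the text
    [1.0, 1.0, 1.0],   # partial match
    [0.0, 1.0, 0.0],   # unrelated (orthogonal) to the text
]

best = rerank(text_embedding, candidates, top_k=2)
for index, score in best:
    print(f"candidate {index}: score {score:.3f}")
```

Only the highest-scoring candidates survive, which mirrors the idea that many pictures are generated but just the best matches are shown.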

Some people have suggested that combining CLIP with a GAN might give even stronger results:

Let CLIP calculate the similarity score between the image and the text description, feed that score back to the GAN, and let the GAN keep iterating with the goal of raising the score.
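The feedback loop in this suggestion can be sketched roughly as follows. Everything here is a toy assumption: `toy_generator` stands in for a GAN generator, cosine similarity stands in for the CLIP score, and simple hill climbing (keep a random tweak only if the score goes up) replaces the gradient-based optimization a real system would more likely use.

```python
import math
import random


def cosine_similarity(a, b):
    """Stand-in for the CLIP score: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def toy_generator(latent):
    """Hypothetical generator: squashes a latent vector into an 'image embedding'."""
    return [math.tanh(value) for value in latent]


def optimize_latent(text_embedding, latent, steps=300, step_size=0.1, seed=0):
    """Hill climbing: randomly perturb the latent, keep the change only if the score improves."""
    rng = random.Random(seed)
    best_score = cosine_similarity(text_embedding, toy_generator(latent))
    for _ in range(steps):
        candidate = [value + rng.gauss(0.0, step_size) for value in latent]
        score = cosine_similarity(text_embedding, toy_generator(candidate))
        if score > best_score:
            latent, best_score = candidate, score
    return latent, best_score


# Hypothetical text embedding and starting latent, made up for illustration.
text_embedding = [1.0, 0.0, 1.0]
start_latent = [0.0, 1.0, 0.0]

initial_score = cosine_similarity(text_embedding, toy_generator(start_latent))
final_latent, final_score = optimize_latent(text_embedding, start_latent)
print(f"score before: {initial_score:.3f}, after: {final_score:.3f}")
```

The point is the shape of the loop: generate, score against the text, feed the score back, repeat.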

The second-generation DALL·E combines the characteristics of CLIP and diffusion models.

The diffusion model can greatly improve the fidelity of the generated images, at the cost of some diversity.

A CLIP text embedding is first fed to an autoregressive or diffusion prior to produce an image embedding.

This embedding is then used to condition a diffusion decoder, which generates the final image. That is, image generation is done through a “diffusion” process.
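The two-stage data flow (text embedding, then prior, then decoder) can be mimicked with a toy sketch. This is not a real diffusion model: `toy_prior` and `toy_decoder` are hypothetical stand-ins, the “image” is just a short vector, and the decoder simply starts from random noise and nudges it toward the conditioning embedding while the injected noise shrinks to zero.

```python
import random


def toy_prior(text_embedding, rng):
    """Stand-in for the prior: maps a text embedding to a nearby 'image embedding'."""
    return [value + rng.gauss(0.0, 0.05) for value in text_embedding]


def toy_decoder(image_embedding, steps, rng):
    """Stand-in for the diffusion decoder: start from pure noise, refine step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in image_embedding]
    for t in range(steps, 0, -1):
        # Injected noise shrinks linearly and is exactly zero on the last step.
        noise_scale = 0.1 * (t - 1) / steps
        x = [
            xi + 0.2 * (target - xi) + rng.gauss(0.0, noise_scale)
            for xi, target in zip(x, image_embedding)
        ]
    return x


rng = random.Random(0)
text_embedding = [0.5, -0.2, 0.8]  # hypothetical values, for illustration only

image_embedding = toy_prior(text_embedding, rng)          # stage 1: prior
image = toy_decoder(image_embedding, steps=50, rng=rng)   # stage 2: decoder

max_error = max(abs(a - b) for a, b in zip(image, image_embedding))
print(f"largest gap between output and conditioning embedding: {max_error:.4f}")
```

In the real system the decoder outputs pixels rather than a copy of the embedding; the sketch only shows how noise is gradually turned into an output under the guidance of the embedding.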

Compared with the first generation, DALL·E 2 generates images in a shorter time and with higher image resolution.

In addition, DALL·E 2 also considers changes in shadows, reflections, textures and other factors during the “drawing” process.

For example: when asked to put a sofa at position “1” in the picture on the left, DALL·E 2 analyzes the light direction from the existing information and draws a matching shadow.

A gallery of DALL·E’s oddball creations

DALL·E 2’s official Instagram account has shown off many “oddball” images created by programmers and engineers.

There is an ancient Roman version of Spider-Man. (The second one is really awesome)

Spider-Man from Ancient Rome

A painting of Jesus riding a dinosaur. (Even Jesus would say “wow” if he saw it)

Jesus Christ wielding a samurai sword and riding on the back of a velociraptor, painting

4K high-definition pictures of small animals wearing leather jackets and sunglasses. (These days, even the animals look handsome)

A photo of a cool wearing sunglasses and a leather jacket, 4k

A photo of an athletic cat explaining its latest scandal to reporters at a press conference.

Surrealist work.

Remembrance of nostalgia, surrealist painting

And the cover of a cyberpunk romance novel. (Can the content of the novel be directly written by GPT-3?)

The cover of a cyberpunk romance novel

In addition, there are animations that use “text diffs” to transform an iPhone.

One More Thing: Redesigning logos with DALL·E

Beyond drawing pictures with DALL·E 2, researcher Janelle Shane used the tool to try designing new logos for major companies.

See if you like them:

The logo DALL·E 2 designed for Pizza Hut looks like this:

The following is the logo designed by DALL·E 2 for Burger King:

There is also a logo designed for NASA:

At present, DALL·E is still being tested and trained on large amounts of data. Because of ethical challenges and other factors, it has not yet been officially opened to the public.

OpenAI says that anyone interested in trying it can sign up and wait for an invitation.

However, anyone can go to the official DALL·E website and play with a set of preset keywords.

For example, choose: “An astronaut”, “Lying in a tropical resort in space”, and then decide on the style: “Realistic”.

DALL·E then draws several images based on the prompt.

And sure enough, they really do have a “space resort” feel.

If you were asked to turn your wildest ideas into a picture, what would you create?

