Google shows off its AI ace for generating super-realistic images; netizens ask: is OpenAI's DALL-E about to be crushed?

There is a new wave of AI trends today: text-to-image generators. Feed a text description into one of these programs and it generates a strikingly accurate image to match. These programs also support a wide range of styles, from oil paintings to CGI renders to live-action photographs. In short, if you can imagine it, they can draw it.

Until now, the leader in this field has been DALL-E, a program developed by the commercial AI lab OpenAI (and updated as recently as this April). But just yesterday, Google played its own trump card: Imagen, which it claims beats DALL-E in output quality.

The following images were all generated by Imagen:

A photo of a raccoon wearing an astronaut helmet looking out the window at night

A brain riding a rocket ship heading toward the moon

A dog looking curiously into the mirror and seeing a cat

A robot couple enjoying a meal against the backdrop of the Eiffel Tower

A small cactus wearing a straw hat and neon sunglasses in the Sahara Desert

To understand what these models can do, the natural place to start is the work they generate. (Interested readers can visit the Imagen landing page for more examples.)

As you can see, the text below each image is the prompt fed into the program, and the image is the output. It's as simple as that: tell the program what you want to see, and Imagen creates it. The results are remarkable.

But while the coherence and accuracy of these photos are impressive, it pays to keep a clear head. When research groups like Google Brain announce a new AI model, they tend to cherry-pick their best results. So however perfect these pictures look, they probably don't represent the system's average output.

Remember: Google is only showing you its best pictures

In the past, images generated by text-to-image models have tended to look unfinished and blurry; the images generated by OpenAI's DALL-E have suffered from both problems.

Google, however, claims that Imagen's images beat DALL-E 2's across the board, and it backs that claim with a new benchmark of its own design: DrawBench.

DrawBench itself isn't particularly complex: it's essentially a list of about 200 text prompts that the Google team fed into Imagen and competing text-to-image generators, with human graders judging the quality of each program's output. As the chart below shows, Google found that the graders tended to prefer Imagen's output and rated competing models relatively low.

Google's DrawBench benchmark compares Imagen's output against text-to-image competitors such as OpenAI's DALL-E 2.
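To make the methodology concrete, here is a minimal sketch of a DrawBench-style pairwise evaluation in Python. Everything in it is hypothetical — the placeholder generators, the human_prefers stub, the two sample prompts — it illustrates the loop described above, not Google's actual code.

```python
# Hypothetical sketch of a DrawBench-style pairwise human evaluation.
# generate_a / generate_b stand in for two text-to-image models, and
# human_prefers() stands in for a grader's side-by-side judgment.
import random

PROMPTS = [
    "A photo of a raccoon wearing an astronaut helmet, looking out the window at night.",
    "A robot couple enjoying a meal against the backdrop of the Eiffel Tower.",
    # ...the real benchmark uses roughly 200 prompts
]

def generate_a(prompt: str) -> str:
    return f"<image from model A for: {prompt}>"  # placeholder model call

def generate_b(prompt: str) -> str:
    return f"<image from model B for: {prompt}>"  # placeholder model call

def human_prefers(image_a: str, image_b: str) -> str:
    """Stand-in for a human grader picking the better image; here a
    coin flip, in reality a judgment of fidelity and text alignment."""
    return random.choice(["A", "B"])

votes = {"A": 0, "B": 0}
for prompt in PROMPTS:
    votes[human_prefers(generate_a(prompt), generate_b(prompt))] += 1

total = sum(votes.values())
print({model: f"{count / total:.0%} preferred" for model, count in votes.items()})
```

Aggregated over all prompts and many graders, this kind of preference rate is exactly the number the chart above reports.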

But that is Google's own assessment, and since the company hasn't opened Imagen up to the public, there is no way to verify it independently. There are good reasons for holding it back: text-to-image models have enormous creative potential, but they also invite serious abuse. Imagine a system that can generate almost any image on demand, and what it could do in the service of fake news, hoaxes, or harassment. Google also stresses that these systems inevitably absorb social biases during training, so their output can carry racism, sexism, and other toxic content.

As the old saying goes: Garbage in, garbage out, and AI is no exception

This largely stems from how such systems are built. Essentially, they train on huge amounts of data (in Imagen's case, paired images and text), find patterns in that data, and try to reproduce them. But to draw solid conclusions, the models need enormous datasets, and even a tech giant as well-funded as Google can't vet all of that input inside a research team. So the content is scraped directly from the internet, and the toxic speech and imagery found there inevitably seep into the AI models.
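To see why curation at that scale is so hard, consider a deliberately naive sketch: a keyword blocklist run over scraped image-caption pairs. The blocklist, file names, and captions below are all invented for illustration; the point is that keyword matching catches only the most blatant cases, while context-dependent bias passes straight through.

```python
# Hypothetical sketch: naive blocklist filtering of scraped image-text
# pairs. Real pipelines face billions of pairs, where even this trivial
# check is costly and still misses most subtly harmful content.
BLOCKLIST = {"slur_example", "explicit_example"}  # invented tokens

def keep(caption: str) -> bool:
    """Keep a pair only if no blocklisted token appears in the caption."""
    return not any(token in BLOCKLIST for token in caption.lower().split())

pairs = [
    ("img_001.jpg", "a corgi wearing sunglasses on the beach"),
    ("img_002.jpg", "slur_example shouted at a crowd"),         # caught
    ("img_003.jpg", "a nurse, depicted as a woman of course"),  # missed
]

filtered = [(name, cap) for name, cap in pairs if keep(cap)]
print(filtered)  # the stereotyped third caption slips straight through
```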

The Google researchers acknowledge as much in their paper: "The large scale data requirements of text-to-image models … have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets … Dataset audits have revealed these datasets tend to reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups."

In other words, the computer scientists' old adage holds: garbage in, garbage out, and AI is no exception.

Google doesn't spell out exactly what disturbing content Imagen can generate, but it notes that the model "encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes."

DALL-E has the same problem. Ask it to generate images of a "flight attendant" and the subjects are almost exclusively women; ask for a "CEO" and you mostly get older white men.
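A simple way to surface this kind of skew is to sample many generations for a profession prompt and tally the apparent demographics; a lopsided tally makes the stereotype measurable. The sketch below uses invented stand-ins (generate, apparent_gender) and illustrates the auditing idea, not either lab's actual tooling.

```python
# Hypothetical sketch of an occupation-bias audit: generate many images
# for one prompt and count the apparent gender of the subjects.
from collections import Counter

def generate(prompt: str) -> str:
    return f"<image for: {prompt}>"  # placeholder for a real model call

def apparent_gender(image: str) -> str:
    """Stand-in for a human or automated annotation step; hard-coded
    here to mimic the skew described above."""
    return "female"

def audit(prompt: str, n: int = 100) -> Counter:
    return Counter(apparent_gender(generate(prompt)) for _ in range(n))

print(audit("a photo of a flight attendant"))
# A tally like Counter({'female': 100}) is the stereotype made visible.
```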

Faced with this problem, OpenAI likewise decided not to release DALL-E publicly, opening beta access only to a select group of users. It also filters certain text inputs in an attempt to stop the model from producing racist, violent, or pornographic images. These measures do limit the technology's harmful applications to a degree, but the history of AI suggests that text-to-image models will go public sooner or later, and their more disturbing implications will leak out with them.

Google itself concludes that Imagen is "not suitable for public use at this time," and says it plans to develop new ways of measuring "social and cultural bias in future work" in order to test subsequent iterations of the model. For now, Google's images really are high quality: the raccoon in a crown and the corgi in sunglasses are genuinely impressive. But that is only the side of the iceberg Google has chosen to show. To reveal the whole picture, Imagen will have to reckon with the unintended consequences of the research behind it.

Original link:

https://www.theverge.com/2022/5/24/23139297/google-imagen-text-to-image-ai-system-examples-paper?ref=refind


