China AI Painting Industry Survey Report – Technology, Users, Controversy and Future

Foreword

A number of institutions have recently released AIGC reports. This nascent track has attracted enormous attention, but some of its smaller, narrower sub-fields have already gone through dramatic changes. In this report we focus on one of those tracks, text-to-image generation (AI painting) within AIGC, and cover it from the angles of technology development, user research, business models and controversies, as well as some prospects for the future.

The report’s data and sources include public news media articles, third-party statistics, 6pen’s own data, the 2,398 questionnaire responses we collected, and in-depth interviews with several industry veterans. The questionnaires were distributed through 6pen’s own channels, Jike, Weibo, and industry KOLs, and in the end 2,398 valid submissions were received. This may be the first direct survey of users in this industry focused on the domestic market, and we believe it is quite informative.

Disclosure of interest: this report is released by 6pen. As an industry practitioner, we approach it mainly from the perspective of technology, users, and the industry as a whole.

A brief outline of the development of AI painting technology

GAN era

Before diffusion models were widely used in AI painting, the main implementation of this art form was the GAN (Generative Adversarial Network). Take NVlabs’ SPADE as an example: the user provides a sketch, and the GAN model converts it into a realistic landscape image. This sounds a bit like img2img in today’s AI painting, but in this type of model the user cannot influence the generated result through text.

It was not until the emergence of cross-modal text-image models such as CLIP that users could drive AI painting with text. Later approaches such as VQGAN+CLIP and StyleCLIP exist because CLIP bridges the text and image domains: the text entered by the user becomes genuinely “comparable” with the results generated by the GAN, which makes it possible to compute the error between the two and iterate toward a better result.
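
To make “comparable” concrete, here is a minimal sketch of the CLIP trick that VQGAN+CLIP-style methods build on. It assumes OpenAI’s open-source clip package and an illustrative candidate.png produced by some generator (the generator itself is omitted): the prompt and the candidate image are embedded into the same space, and their similarity becomes the error signal that drives iteration.

```python
# Minimal sketch of CLIP guidance (assumes: torch, Pillow, and OpenAI's
# open-source "clip" package are installed). A real VQGAN+CLIP loop would
# backpropagate this error into the generator's latent code; here we only
# score one hypothetical candidate image against the prompt.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

text = clip.tokenize(["a watercolor painting of a fox in a forest"]).to(device)
image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)  # hypothetical file

with torch.no_grad():
    text_feat = model.encode_text(text)
    img_feat = model.encode_image(image)

# Cosine similarity between prompt and candidate; (1 - similarity) can serve
# as the loss that guides the iterative generation process.
similarity = torch.nn.functional.cosine_similarity(img_feat, text_feat)
print(f"CLIP similarity: {similarity.item():.3f}")
```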

GANs have a natural advantage in modeling a distribution, so they shine in fields such as face attribute editing and style imitation: putting on sunglasses, adding beards, generating anime faces, imitating novel styles, and so on. For the same reason they are also prone to mode collapse, where the generator tends to produce the safest, least error-prone results that can fool the discriminator. When the user asks for out-of-domain results, such as “a face with eyes under the nose” or “glasses made of flowers”, GANs often fail.

The rise of Diffusion

Contrary to what many people assume, diffusion is not a new idea; it was proposed nearly as early as GANs. But because diffusion depends on large models, the entry threshold was high and few researchers worked on it until 2021, when the open-source project Disco Diffusion (often abbreviated DD), OpenAI’s commercial service Dalle, and others began to make diffusion known to more people.

In 2022, Disco Diffusion was greatly improved through the contributions of more developers. It runs on Google Colab (many people mistakenly think DD was released by Google), which also lowered the threshold for users who wanted to try it. Around April 2022, images generated by DD were widely shared on social networks, which further helped Disco Diffusion break out of its niche.

During the same period, a small number of teams and companies in China, including 6pen, began working on productizing DD, aiming to further lower the barrier to using it by:

  • Providing a user-friendly interface
  • Providing cloud computing services
  • Fine-tuning the DD model
  • Encapsulating CLIP keywords and similar techniques to lower the barrier for users and improve stylized results

On the research side, many institutions and companies have also entered the model layer, such as 6pen’s pumpkin model, Tsinghua University’s CogView, and Baidu’s Wenxin. Thanks to DD going viral on social networks, these products and models have gained more attention and use, but many professional Chinese users still find various ways to use foreign commercial products such as midjourney and Dalle2.

Stable Diffusion opens the open-source Pandora’s box

Before Stable Diffusion, the best open-source AI painting implementation was undoubtedly Disco Diffusion, but Disco Diffusion also had many problems, such as:

  • Slow generation (it iterates in pixel space), which in turn means high generation cost (expensive graphics card usage)
  • Weak picture logic, so the image structure is often disordered
  • Almost impossible to generate people and objects

Making up for the shortcomings of Disco Diffusion became the focus of many models and research teams. Latent diffusion, proposed by CompVis, moves the diffusion process from pixel space to latent space, which cuts inference time by an order of magnitude (roughly 10 min -> 30 s). Latent diffusion also embeds a small text model inside the model, so generation no longer has to rely on a large open model such as CLIP, but this weakens the model’s ability to understand the user’s text, and the generated results follow the prompt less faithfully.
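
To illustrate why iterating in latent space is so much cheaper, the sketch below encodes an image into latents and compares tensor sizes; the diffusion loop only ever touches the much smaller latent tensor. This is an illustration only, assuming the diffusers library and the publicly released Stable Diffusion VAE rather than the original CompVis code.

```python
# Minimal latent-space sketch (assumes: torch and diffusers are installed and
# the Hugging Face Hub is reachable). Shows the pixel-space vs latent-space
# size gap that latent diffusion exploits; it is not the CompVis training code.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # illustrative VAE

# A dummy 512x512 RGB image batch in roughly [-1, 1], as the VAE expects.
pixels = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()

print(pixels.shape)   # torch.Size([1, 3, 512, 512]) -> 786,432 values
print(latents.shape)  # torch.Size([1, 4, 64, 64])   -> 16,384 values, ~48x smaller

# The diffusion loop runs entirely on the 4x64x64 latents; only at the very
# end are they decoded back into a full-resolution image.
with torch.no_grad():
    decoded = vae.decode(latents).sample
print(decoded.shape)  # torch.Size([1, 3, 512, 512])
```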

The pumpkin model provided by 6pen makes a certain improvement on this: latent diffusion’s own small text encoder is replaced with the CLIP model, and the CLIP error between the generated result and the text is computed to optimize the generation process. As a result, the model understands the user’s text and generation quality improves significantly. The later Stable Diffusion likewise optimized the model structure and data around the latent diffusion + CLIP idea and carried out large-scale training, achieving remarkable results.

Released by Stability AI in late August 2022, Stable Diffusion has powerful features (a minimal usage sketch follows the list):

  • Extremely fast (iterates in latent space)
  • Picture logic far beyond that of DD
  • Handles people and objects better
  • Stronger stylization, such as anime styles
  • An easier-to-use training framework
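
To show how low the barrier has become, here is a minimal text-to-image sketch using the open-source diffusers library. The model id, prompt, and hardware assumption (a CUDA GPU with enough video memory) are illustrative, and this is only one of several ways to run Stable Diffusion.

```python
# Minimal Stable Diffusion text-to-image sketch (assumes: torch + diffusers
# installed, access to the Hugging Face Hub, and acceptance of the model
# license on the Hub if required).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # illustrative model id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a lighthouse on a cliff at sunset, oil painting"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("lighthouse.png")
```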

Stable Diffusion has greatly lowered the industry’s entry barriers, both the technical threshold and the cost of stockpiling graphics cards, bringing more competitors into the market. At the same time, innovative applications built on Stable Diffusion have begun to appear, such as text-to-video, infinite image outpainting, and combinations with 3D modeling tools. There is no doubt that Stable Diffusion is revolutionary for this era.

We are currently at this point in time.

User portrait of AI painting

User base portrait

According to the questionnaires we collected, domestic AI painting users skew young: more than 46% are undergraduate or graduate students, and 18% are even junior high or high school students.

In terms of urban distribution, the vast majority of users are still located in first- and second-tier cities, among which Beijing accounts for 8.7% and Shenzhen accounts for 7.8%, but Qingdao unexpectedly ranks fourth, accounting for 6.1%. Southern cities account for the vast majority, while northern cities are few.

In terms of the industry distribution of respondents, the survey results differ considerably from our expectations: art and design workers account for only 24.2% (second place), the top category is offline industries (26%), and third is the Internet industry (24%).

In terms of how users actually run AI painting, 38% of users only use online services, and 16% use their own graphics cards. In addition, 21% of users said that although they currently use online services, they hope to move to their own graphics cards in the future, while only 4% of users who currently use their own graphics cards say they will switch to online services.

User payment and revenue survey

According to our survey, 60% of users have never paid for AI painting products, i.e. they use them entirely for free. Among the remaining 40% who have paid, 16% paid less than 10 yuan, 14% paid less than 100 yuan, and fewer than 10% paid more than 100 yuan.

We also surveyed whether users earn income through AI painting. The results show that most users (92%) have earned no income through AI painting, i.e. it is “purely for entertainment”; 4% of users earned less than 100 yuan, 1.9% earned less than 1,000 yuan, and 2% earned more than 1,000 yuan.

User perception

The questionnaire shows that 42% of users think AI painting can only satisfy entertainment needs, 38% think it can be partially used in their work, 9.17% think it is disruptive, and 7% think it is not worth mentioning at all.

The majority of respondents only learned about AI painting within the last month (August-September); 27% first encountered it in the first half of this year, and only 23% knew about it last year.

Our questionnaire also asked which models people use. To ensure objectivity, we excluded 6pen itself (users reached through 6pen’s channels would skew the data; the model shares within 6pen are shown separately below) and counted the remaining models. Disco Diffusion currently holds a slight lead, followed by Stable Diffusion in second place and midjourney in third. To our surprise, the older AI drawing product Wombo Dream still has a wide audience, used by even more people than the famous Dalle series.

Meanwhile, data from nearly 10 million model uses within 6pen shows that Stable Diffusion is used the most, accounting for 77%, the Disco Diffusion model accounts for 10%, and pumpkin accounts for 13%.

User usage scenarios

The vast majority of users said they generate AI paintings purely for their own enjoyment. Meanwhile, 56% said they would share them with friends, 6% said they would use them at work after further processing, less than 2% said they would use the generated images directly for commercial purposes, and 23% said they would post them on social media to grow their following.

Business Models for Platforms and Tools

Billing users directly

At present, the vast majority of commercial AI painting services earn revenue almost entirely through per-generation fees, as follows (a rough cost comparison sketch appears after the list):

  • Stable Diffusion
    • The model itself is open source and free
    • DreamStudio and API: EUR 0.01 per standard call
  • midjourney
    • $10/month: 200 fast generations + unlimited queued generations
    • $30/month: 900 fast generations + unlimited queued generations
    • $4 per GPU hour
    • $600/year enterprise package
  • Dalle
    • $0.13 per generation
  • 6pen
    • Free queued generation
    • Paid fast generation: from 0.1 RMB
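
For a rough sense of how these pay-per-use prices compare, here is a back-of-envelope sketch; the prices come from the list above, the exchange rates are assumptions, and real plans have quotas and tiers that this ignores.

```python
# Back-of-envelope cost comparison for N generations per month (illustrative
# only). Prices come from the list above; the exchange rates are assumptions.
EUR_TO_RMB = 7.0   # assumed exchange rate
USD_TO_RMB = 7.0   # assumed exchange rate

def monthly_cost_rmb(generations: int) -> dict:
    """Rough monthly cost in RMB for a given number of generations."""
    return {
        "DreamStudio / SD API":  generations * 0.01 * EUR_TO_RMB,
        "Dalle":                 generations * 0.13 * USD_TO_RMB,
        "midjourney ($30 plan)": 30 * USD_TO_RMB,   # flat fee, 900 fast generations
        "6pen fast generation":  generations * 0.1,  # from 0.1 RMB per image
    }

for name, cost in monthly_cost_rmb(500).items():
    print(f"{name:>24}: ~{cost:.0f} RMB / month")
```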

It can be seen that commercial AI painting services currently make little distinction between ToB and ToC; most provide pay-per-use or pay-as-you-go services that can be used by both enterprises and individual users. This charging model exists for several reasons:

  • AI generation uses graphics card servers, and maintaining free use requires huge costs
  • There is no closed loop after an image is generated, so free users bring no other revenue
  • Limited by copyright and other ethical factors that are still in dispute, other commercialization methods have yet to be explored

Possibility of ToB

AI painting naturally has more possibilities in the ToB space, but constrained by model quality, copyright disputes, and the early stage of the technology, there are few publicly known deployments. We believe, however, that more ToB success stories may emerge in the following directions:

  • Advertising industry
  • ToB material library
  • Designer/Artist Aids
  • Marketing customization service
  • Integration services with offline, physical businesses
  • The metaverse and other virtual spaces

Controversy, Issues, Potential and Future

The controversy

At present, the biggest controversy in AI painting lies in copyright attribution, and in whether the model side has the right to determine that attribution. As is widely known, the material used to train AI models may include large amounts of unauthorized image data with clear copyright owners, so models carry the stigma of “unauthorized” sources. Supporters argue that after training, iteration and distillation, the AI model ends up as a simple, brand-new computational method, and that the copyright attribution of images produced by this method should be determined by the model.

Even so, the most widely accepted view is that if a living artist is named in the text description (prompt) of an AI-generated image, copyright over that image should never be claimed.

We advocate that if a living artist is named in the prompt, the AI-generated images should at least be released under the CC0 license, the artist’s relevant information should be retained, and the images should not be used for commercial purposes before the artist’s authorization is obtained. Even so, this may still cause trouble for artists; the issue is still being debated around the world, and better rules for cooperation between AI and human artists need to be established as soon as possible.

Exploration of Copyright Issues

6pen distributed questionnaires to original artists and collected 368 responses. Of these, 7.1% of the artists stated clearly that their work had been learned by AI models, while 67% were not sure.

27% of the artists do not want AI models to use their style under any circumstances; 27% are willing provided that anyone using a picture generated in their style also displays the artist’s information; 37% want to be paid if their style is used; and only 6.9% do not mind at all.

Toward current AI painting (both models and products), the vast majority of artists hold a negative attitude, with an NPS score as low as -89. The artist gains nothing from model generation, while the generated pictures may bring benefits to whoever generated them; this is not only unfair but also damages the production relationship to a certain extent, and will further erode human creativity: in order not to become fodder for model training, artists will do less original exploration, and new styles, paradigms and genres may cease to emerge.
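
For readers unfamiliar with the metric, NPS (Net Promoter Score) on a 0-10 recommendation scale is simply the percentage of promoters (scores of 9-10) minus the percentage of detractors (scores of 0-6). The sketch below uses made-up example scores, not our survey data, to show how lopsided responses produce a strongly negative score like the -89 above.

```python
# Net Promoter Score: % promoters (9-10) minus % detractors (0-6),
# on a 0-10 "how likely are you to recommend this?" scale.
def nps(scores: list[int]) -> float:
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Made-up example: overwhelmingly low scores yield a strongly negative NPS,
# in the same spirit as the -89 reported above (not the actual survey data).
example_scores = [0, 1, 2, 2, 3, 5, 6, 8, 9, 10]
print(nps(example_scores))  # 7 detractors, 2 promoters -> -50.0
```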

However, if AI painting can be more standardized in the future, for example:

  • Train the model with copyright-clean assets
  • Use artist styles and pay them a share
  • Explore new stylistic boundaries with artists
  • Provide auxiliary functions for human creation
  • Respect artists’ wish not to have their work learned by AI models

According to our survey, if these issues are addressed, artists’ NPS rating of AI image-generation technology improves roughly fourfold, and the vast majority of artists would consider such AI-generated images acceptable.

However, it is not easy to establish such a model; it requires a great deal of innovation and experimentation in technology, products and rules. 6pen will start taking action after collecting more feedback, and we will share the process of this exploration as it unfolds.

Other problems

Technical problems

Although AI painting technology has developed rapidly in the past few months, there are still some problems, including:

  • High requirements for graphics cards, mainly video memory, and therefore high cost
  • Inability to precisely specify the number of objects in a picture, e.g. “three pigs and four tigers”
  • Human limbs (mainly fingers) and eyes render poorly
  • Scenes with multiple subjects generate poorly (often only one or two subjects remain)
  • Story generation with logical continuity is not yet possible

However, these issues are expected to improve considerably in the next 6-12 months.

Technical ethics issues

Unlike other tracks, AI painting relies not only on advanced AI technology but also, in many cases, on the capabilities of the open-source world, so ethical issues may arise that do not exist in other industries. Many thinly wrapped products have emerged in China, and quite a few of them have such issues, including:

  • Not complying with the open-source model license while marketing the product as self-developed or domestic AI*
  • Directly using generated images that may contain artists’ styles for copyright transactions or NFTs
  • Showing images to users without safety filtering, which may expose them to discriminatory, violent, pornographic or other harmful content
  • Over-wrapping the model and hiding information about the artists involved
  • Using non-open-source services: obtaining the generated results of other services with crawlers or similar methods and packaging them as independent products

* According to our statistics, 95% of the AI painting products that have emerged in China since September 2022 use Stable Diffusion, but fewer than 10% display the required information in accordance with the Stability license.

These problems may hinder the healthy development of the industry and even exacerbate conflicts between user groups with different positions. Some of them are unavoidable, while others are entirely deliberate. We appeal once again to practitioners to explore and innovate in a rational and dignified way.

Future market forecast

AI painting is still a fast-growing market, and the technology is developing rapidly. According to our forecast, within the next five years 10-30% of the world’s image content may be generated by AI or with AI assistance, and on that basis we estimate the market size may exceed 60 billion.

However, for such a figure to be reached, the aforementioned problems and disputes need to be properly resolved. In general, the technology of AI painting is indeed developing very fast, so fast that laws and regulations are far from keeping up, and that gap has already appeared. This is not necessarily a good thing, but it also contains many new possibilities. We should be more cautious, explore carefully, and protect rather than harm the interests of as many people as possible.

If the development of AI technology comes at the cost of exhausting human creativity, then AIGC will be humanity’s greatest misfortune; but if we can find a good way to stimulate humanity’s boundless creativity and benefit everyone, it will be a key to a better world. We stand between these two paths. We cannot stop moving forward, and we will keep watching to see where humanity goes.

We hope that we can go in the right direction.
