Stable Diffusion Advanced Tutorial – AIGC Development History and Product Comparison

Original link: https://www.dongwm.com/post/stable-diffusion-history/

Foreword

AIGC-related content has been very hot recently, so I decided to look into it as well. This is the first article on the topic; it introduces AIGC and the related creation tools.

AIGC

AIGC stands for “AI-Generated Content”, a new way of creating content that follows professionally generated content (PGC, Professionally-generated Content) and user-generated content (UGC, User-generated Content).

One of the milestones that brought AIGC into the public eye came in early September last year, with the following digital painting “Théâtre D’opéra Spatial” (“Space Opera Theater”), generated using Midjourney:

After it won first place in the art competition at the Colorado State Fair, the judges stood by their decision, saying that even though it was an AI-generated work it still deserved the award. Once the news was reported, it triggered extensive discussion both inside and outside the art community.

Afterwards, a creator on Bilibili used Midjourney to produce a video for “Kill That Shijiazhuang Man” (a song by Omnipotent Youth Society) from images generated according to the meaning of the lyrics, and more “painted by AI” works followed, such as “Young And Beautiful”, “The Lonely Brave” and “Qilixiang”. This is also when I started paying attention to this field. Of course, my understanding at the time was still limited to the narrow niche of AI painting (txt2img, i.e. you input text and the computer translates it into an image); looking at it now, the field of generated content is actually much broader.

Through AIGC, a non-professional user like me, with no drawing background at all, can also create very satisfying works.

AIGC’s main creation tools

Below I list, in chronological order, the creation tools I consider most important.

DALL-E

In January 2021, OpenAI launched the DALL-E model, which understands natural-language input and generates corresponding images using a 12-billion-parameter version of the GPT-3 Transformer model. It was launched primarily for research, so access was limited to a small number of beta users. The model is unstable, its grasp of details is imperfect, and it can make serious logical or factual errors, but as a pioneer it still deserves a special mention.

CLIP (Contrastive Language-Image Pre-training) was released alongside DALL-E. CLIP is a neural network that, given an input image, returns the caption that best matches it. It does the opposite of DALL-E: CLIP maps images to text, while DALL-E maps text to images. The purpose of introducing CLIP is to learn the connection between the visual and textual representations of objects.
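
To make the “image to text” direction concrete, here is a minimal sketch (mine, not from the original article) that uses the publicly released CLIP weights through the Hugging Face transformers library to pick the caption that best matches an image; the file name and candidate captions are made-up examples.

```python
# Minimal sketch: scoring candidate captions against an image with CLIP.
# "cat.jpg" and the caption list are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")
captions = ["a photo of a cat", "a photo of a dog", "an oil painting of a castle"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity of the image to each caption

best = captions[logits.softmax(dim=-1).argmax().item()]
print(best)  # the caption CLIP considers the closest match
```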

Disco Diffusion

Disco Diffusion is a deep-learning model based on diffusion plus CLIP that was open-sourced in October 2021 and can generate images from input text. The tool usually runs on the Google Colab platform without any local setup, so there are no hardware requirements on your own computer; you just run it in the browser.

The following is the rendering released by artist and designer Somnai when the project was open-sourced:

In actual use it performs well on landscapes, subject matter and painting styles, but relatively poorly on human figures. After Somnai joined Midjourney, the project stopped being updated.

DALL-E 2

In April 2022, OpenAI released DALL-E 2, an upgraded version of DALL-E. In addition to generating images, it can also re-edit the images it has generated. Nowadays even new users have to pay to generate new images. I have no hands-on experience with it and only follow it through posts on the official Instagram account, but you can still try it through Bing: https://www.bing.com/create/

My impression is that the images it generates are relatively plain and simple compared with the two tools discussed below.

Midjourney

Midjourney’s v1 was released in February 2022, and it first became popular with the v3 release in July 2022.

Its strengths are fairly comprehensive overall capability and strong artistry; its output closely resembles work produced by human artists, and image generation is fast. In the early days many artists mainly used Midjourney for creative inspiration. In addition, because Midjourney runs inside Discord, it has a very good community discussion environment and user base.

Its second wave of popularity came with the release of V5 in March this year. According to the official announcement, this version significantly improves the realism of generated images and finger details, and also makes progress on prompt understanding accuracy, aesthetic diversity and language understanding.

New users can no longer generate images for free; a subscription is required. I will not demonstrate it here, but I do have two pieces of advice:

  1. If you do not know how to write correct and valuable prompts, you can generate a prompt from a website like the one in extended reading link 5; there are many similar sites
  2. If you want to become a Midjourney master, you need to learn a lot of skills. You can search for related articles and videos online, such as extended reading links 9 and 10 (and of course you also need to read the official documentation)

Stable Diffusion

In August 2022, Stable Diffusion was open-sourced.

The Stable Diffusion algorithm is based on the latent diffusion model (LDM, Latent Diffusion Model) proposed in December 2021, which in turn builds on the diffusion model (DM, Diffusion Model) idea proposed in 2015, hence the “Diffusion” in the name; my guess is that “Stable” means the algorithm has now stabilized.

First it is necessary to clear up a confusing point about this project. It is open source, and if you search for it yourself you will find three projects with the same name on GitHub:

  1. https://github.com/CompVis/stable-diffusion
  2. https://github.com/runwayml/stable-diffusion
  3. https://github.com/Stability-AI/stablediffusion

First, CompVis, the machine vision and learning group at the University of Munich, wrote the paper. Runway, an AI video-editing startup, provided expertise to help realize the first version (the third repository, under the Stability-AI organization, is Version 2). So nowadays you only need to pay attention to the third one.

At runtime, SD separates image generation into a “diffusion” process: it starts from noise, uses CLIP to score how well the image matches the text, and gradually refines the image until it is completely noise-free and corresponds to the text description provided. The specific principle can be found in extended reading link 8.

SD can generate images with high definition, good fidelity and a wide range of styles in just a few seconds. Its biggest breakthrough is that anyone can download its open-source code and use it for free, without paying for a cloud service the way you do with Midjourney and DALL-E.
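
As a concrete illustration of “download the open-source code and use it for free”, here is a minimal text-to-image sketch (mine, not from the original article) using the Hugging Face diffusers library; the checkpoint id, the prompt and the assumption that a CUDA GPU is available are all illustrative.

```python
# Minimal sketch: Stable Diffusion text-to-image with the `diffusers` library.
# The checkpoint id, prompt and "cuda" device are assumptions for illustration.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public SD 1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # assumes a local GPU

image = pipe(
    "a castle floating above the clouds, oil painting",  # the text prompt
    num_inference_steps=30,  # number of denoising steps (more steps: finer, slower)
    guidance_scale=7.5,      # how strongly the image should follow the prompt
).images[0]
image.save("result.png")
```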

Stable Diffusion XL

At present, SD has two shortcomings that bother users the most:

  1. You need to enter very long prompts
  2. Handling of human anatomy is flawed; abnormal poses and body structures often appear

In April 2023, Stability AI released a beta version of Stable Diffusion XL and said that it will be open-sourced once the parameters stabilize after training, addressing the two shortcomings above.

Comparison of Midjourney and Stable Diffusion

First, a caveat: AI drawing is highly random and stylized. Even with a fairly precise prompt, simply changing the seed can flip the result, so a direct comparison is not easy. I am only making a rough side-by-side comparison here:

  1. Price. Midjourney is, after all, a for-profit service, and it is nowhere near as cheap as deploying SD on your own server. SD wins
  2. Friendliness. Midjourney is novice-friendly and can be used right after registration, whereas SD requires a certain technical background; you could even say that designers or artists without that background cannot deploy it at all. SD loses slightly here
  3. Functionality. In addition to supporting everything Midjourney can do, SD also supports inpainting (filling in and restoring parts of an image) and custom models; a small inpainting sketch follows this list. SD wins slightly here
  4. Control over details. It is similar to the difference between Apple (Midjourney) and Android (SD): Midjourney is a commercial product, and you cannot see the principles and code logic behind it, so controllability is poor and details are hard to optimize (sometimes the more you tune, the worse it gets), while SD, being open source with a strong community and related…
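
As an example of the inpainting mentioned in point 3, here is a minimal sketch using the diffusers inpainting pipeline; the checkpoint id, file names and prompt are illustrative assumptions, not the author’s setup.

```python
# Minimal sketch: Stable Diffusion inpainting (repaint only the masked area).
# Checkpoint id, file names and prompt are assumptions for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed public inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("photo.png").convert("RGB")  # the original picture
mask_image = Image.open("mask.png").convert("RGB")   # white pixels mark the area to repaint

result = pipe(
    prompt="a wooden bench in a park",  # what to paint into the masked area
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")
```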
