The previous blog post introduced Wav2Lip, a lip-synchronization model. This article covers wav2lip-hq, a high-definition version of it. On top of the original, it uses image super-resolution and face segmentation to improve the overall result.


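First, pull the source code and set up the environment:

    git clone wav2lip-hq.git
    cd wav2lip-hq

    # Create a new virtual environment
    conda create -n wav2liphq python=3.8
    conda activate wav2liphq

    # Install torch
    pip3 install torch torchvision torchaudio --extra-index-url

    # Install the other dependencies. Comment out torch and torchvision in
    # requirements.txt first, because the GPU build was already installed above.
    pip install -r requirements.txt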
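After the installation, it can be worth confirming that the environment really picked up a GPU-enabled torch build. The following is a minimal sketch, not part of wav2lip-hq: `torch_report` is a hypothetical helper name, and it relies only on the standard `torch.__version__` and `torch.cuda.is_available()` calls.

```python
import importlib


def torch_report():
    """Describe the torch build in the active environment.

    Hypothetical helper for illustration; not part of wav2lip-hq.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in the active environment
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # True only if this build can see a usable CUDA device
        "cuda": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(torch_report())
```

If `cuda` comes back as `False`, the CPU-only wheel was probably installed, for example because `requirements.txt` reinstalled torch over the GPU build.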

Next, download the models. Three are needed. The first, download address: , goes into the checkpoints directory after downloading. The second is the face detection model, download address: . After downloading, copy it to the face_detection/detection/sfd directory and rename it to s3fd.pth. The third is the face segmentation model, download address: . Copy it to the checkpoints directory and rename it to face_segmentation.pth.
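A misplaced or misnamed model file is an easy mistake at this step, so a quick check before running inference can save time. The sketch below is an illustration only: `missing_models` is a hypothetical helper, and the file name `wav2lip_gan.pth` for the first model is an assumption taken from the `--checkpoint_path` argument of the command at the end of this post.

```python
from pathlib import Path

# Locations described in the text; the first file name is assumed from the
# --checkpoint_path argument used in the final command of this post.
EXPECTED_MODELS = [
    Path("checkpoints") / "wav2lip_gan.pth",
    Path("face_detection") / "detection" / "sfd" / "s3fd.pth",
    Path("checkpoints") / "face_segmentation.pth",
]


def missing_models(repo_root=".", expected=EXPECTED_MODELS):
    """Return the expected model files that are absent under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from the root of the cloned repository
    for path in missing_models():
        print(f"missing: {path}")
```

Run from the repository root, it prints nothing when all three files are in place. The final command also passes a file through `--sr_path`; that file is not among the three downloads described here, so it is left out of the list.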

Finally, prepare an audio file and a video file for testing, then run the command:

 python.exe --checkpoint_path checkpoints\wav2lip_gan.pth --segmentation_path checkpoints\face_segmentation.pth --sr_path checkpoints\esrgan_yunying.pth --face test.mp4 --audio test.mp3 --outfile output.mp4
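For repeated runs, the same invocation could be assembled from Python instead of being typed by hand. This is only a sketch under stated assumptions: `build_command` and `run` are hypothetical helpers, the flags and checkpoint names are copied from the command above, and the script path is a parameter the caller must supply because the command as shown does not include it.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, face, audio, outfile, checkpoints="checkpoints"):
    """Assemble the inference command as an argument list.

    `script` is the repository's inference entry point, supplied by the
    caller. Hypothetical helper for illustration.
    """
    ckpt = Path(checkpoints)
    return [
        sys.executable, str(script),
        "--checkpoint_path", str(ckpt / "wav2lip_gan.pth"),
        "--segmentation_path", str(ckpt / "face_segmentation.pth"),
        "--sr_path", str(ckpt / "esrgan_yunying.pth"),
        "--face", str(face),
        "--audio", str(audio),
        "--outfile", str(outfile),
    ]


def run(script, face, audio, outfile):
    # check=True raises CalledProcessError if the script exits non-zero
    subprocess.run(build_command(script, face, audio, outfile), check=True)
```

Passing the arguments as a list avoids shell quoting problems, and building the paths with `pathlib` keeps the separators correct on both Windows and Linux.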


