Using Convolutional Neural Networks to Train a Model to Automatically Recognize Captchas – Practice at YYeTs

BennyThink has built a very cool website called "Renren Video Sharing Station", but I found that logging in requires a captcha. If I forget my password, how am I supposed to log in 100 times in an instant to try all my usual password combinations?

The captcha is served as a 160px × 60px image. To understand how it is generated, we can look directly at the site's source at https://github.com/tgbot-collection/YYeTsBot/blob/master/yyetsweb/database.py, which contains the following code:

import base64
import random
import re
import string

from captcha.image import ImageCaptcha

captcha_ex = 60 * 10  # captchas expire after 10 minutes
predefined_str = re.sub(r"[1l0oOI]", "", string.ascii_letters + string.digits)

class CaptchaResource:
    # Redis here is YYeTsBot's own wrapper around redis-py
    redis = Redis()

    def get_captcha(self, captcha_id):
        # Pick 4 random characters, render them, and cache the answer in Redis
        chars = "".join([random.choice(predefined_str) for _ in range(4)])
        image = ImageCaptcha()
        data = image.generate(chars)
        self.redis.r.set(captcha_id, chars, ex=captcha_ex)
        return f"data:image/png;base64,{base64.b64encode(data.getvalue()).decode('ascii')}"

Here, predefined_str is the character set for the captcha, with easily confused characters (1, l, 0, o, O, I) removed. The resulting set is as follows, 56 characters in total:

 abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ23456789
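As a quick sanity check (a trivial snippet of mine, not part of the original code), we can confirm the count:

import re
import string

predefined_str = re.sub(r"[1l0oOI]", "", string.ascii_letters + string.digits)
print(len(predefined_str))  # prints 56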

When a captcha is needed, this function generates the image (as a base64 data URI) together with an ID and sends them to the web page, while storing captcha_id and chars (the actual captcha characters) in Redis. On login, the expected characters are read straight back from Redis, and Redis's key expiry invalidates stale captchas automatically. Very elegant. (When I write PHP myself, I have to fiddle with https://github.com/mewebstudio/captcha, wrestle with Composer for ages, and override functions just to change the API endpoint. I can't compete with this level of sophistication.)
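The login-side check is then just a Redis lookup and compare. A minimal sketch of what it might look like (verify_captcha is a hypothetical name of mine, not the actual method in YYeTsBot; see database.py for the real implementation):

def verify_captcha(self, captcha_id, user_input):
    # Hypothetical sketch: fetch the stored answer; None means expired or never issued
    correct = self.redis.r.get(captcha_id)
    # Delete the key so each captcha can only be used once
    self.redis.r.delete(captcha_id)
    # Case-insensitive comparison is an assumption of this sketch
    return correct is not None and correct.decode("utf-8").lower() == user_input.lower()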

Training the Network

Now that we know how the captcha is generated, the next step toward the dream of logging in 100 times in an instant is figuring out how to recognize captchas automatically. Broadly speaking, there are a few options:

  1. Ask BennyThink to turn off the captcha for my IP. Obviously not.
  2. Use a paid captcha-solving service. It costs money and isn't fast enough, so definitely not.
  3. Train a model myself to recognize the captcha. This seems feasible, even though I have zero background.

In the end, I chose option 3: using my spare time to go from zero basics all the way to giving up on AI/ML entirely.

After some research, I found that the mainstream approaches use a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network).

Since I am completely unfamiliar with both kinds of networks, I won't expand on them here.

In general, to train a model, there are the following typical steps:

  1. Collect samples
  2. Clean the samples and split them into training and test sets
  3. Train the model
  4. Test whether the model can actually be used for recognition

Let's go through these step by step.

Collect samples

Since we already know how the captcha is generated, we don't need to hammer the captcha endpoint of the Renren Video Sharing Station to collect samples (and we couldn't know the ground-truth characters of those captchas anyway). That leaves two paths: pre-generate a pile of samples for training, or generate them on the fly with a generator.

The advantage of the first approach is high GPU utilization during training; if you tune parameters frequently, you generate the data once and reuse it many times. The advantage of the second is that you don't need to pre-generate a large dataset: the CPU produces data while training runs, and as a bonus you can generate data indefinitely.

Generate samples in batches

For example, if the captcha text is yTse, we generate a yTse.png and put it in a directory. The PoC code is as follows:

import os
import random
import re
import string

from captcha.image import ImageCaptcha

predefined_str = re.sub(r"[1l0oOI]", "", string.ascii_letters + string.digits)

os.makedirs("./generated", exist_ok=True)
for i in range(10000):
    # Pick 4 random characters and write them out as <chars>.png
    chars = "".join([random.choice(predefined_str) for _ in range(4)])
    image = ImageCaptcha()
    img_path = "./generated/" + chars + ".png"
    image.write(chars, img_path)

Use a generator

The code here is taken directly from ypwhs/captcha_break, with some adjustments for the dimensions of our captcha:

import random
import re
import string

import numpy as np
from captcha.image import ImageCaptcha
from tensorflow.keras.utils import Sequence

characters = re.sub(r"[1l0oOI]", "", string.ascii_letters + string.digits)
width, height, n_len, n_class = 160, 60, 4, len(characters)

class CaptchaSequence(Sequence):
    def __init__(self, characters, batch_size, steps, n_len=4, width=160, height=60):
        self.characters = characters
        self.batch_size = batch_size
        self.steps = steps
        self.n_len = n_len
        self.width = width
        self.height = height
        self.n_class = len(characters)
        self.generator = ImageCaptcha(width=width, height=height)

    def __len__(self):
        return self.steps

    def __getitem__(self, idx):
        # One batch: images X plus one one-hot label array per character position
        X = np.zeros((self.batch_size, self.height, self.width, 3), dtype=np.float32)
        y = [np.zeros((self.batch_size, self.n_class), dtype=np.uint8) for i in range(self.n_len)]
        for i in range(self.batch_size):
            random_str = ''.join([random.choice(self.characters) for j in range(self.n_len)])
            X[i] = np.array(self.generator.generate_image(random_str)) / 255.0
            for j, ch in enumerate(random_str):
                y[j][i, :] = 0
                y[j][i, self.characters.find(ch)] = 1
        return X, y

Let’s test if this generator works:
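For instance, a minimal check along these lines (a sketch of mine, assuming matplotlib is available and characters is the 56-character set defined above) draws one generated sample with its label:

import matplotlib.pyplot as plt

data = CaptchaSequence(characters, batch_size=1, steps=1)
X, y = data[0]
# Decode the one-hot labels back into the 4-character string for the title
plt.title(''.join([characters[np.argmax(label[0])] for label in y]))
plt.imshow(X[0])
plt.show()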

It works nicely. Since the generator can produce captchas in batches on the fly, the pre-generation approach is hereby abandoned.

Clean samples and separate train and test sets

Thanks to the generator, the training and validation sets are naturally separate, so the traditional train_test_split is not needed. All it takes is:

train_data = CaptchaSequence(characters, batch_size=160, steps=1000)
valid_data = CaptchaSequence(characters, batch_size=160, steps=100)

That’s it.

Train the model

Since I am completely unfamiliar with neural networks, I continue to borrow the code and description from ypwhs/captcha_break, with some adjustments because our character set has 56 characters:

The model structure is very simple. The feature-extraction part uses blocks of two convolutions followed by one pooling layer, a structure borrowed from VGG16. We repeat this block five times, flatten the output, and attach four classifiers, each with 56 neurons outputting the probability of each of the 56 characters.

from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import EarlyStopping, CSVLogger, ModelCheckpoint
from tensorflow.keras.optimizers import *

input_tensor = Input((height, width, 3))
x = input_tensor
# Five VGG-style blocks: two 3x3 convolutions followed by 2x2 max pooling
for i, n_cnn in enumerate([2, 2, 2, 2, 2]):
    for j in range(n_cnn):
        x = Conv2D(32 * 2 ** min(i, 3), kernel_size=3, padding='same', kernel_initializer='he_uniform')(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
    x = MaxPooling2D(2)(x)

x = Flatten()(x)
# Four softmax heads, one per captcha character
x = [Dense(n_class, activation='softmax', name='c%d' % (i + 1))(x) for i in range(n_len)]
model = Model(inputs=input_tensor, outputs=x)

callbacks = [EarlyStopping(patience=3), CSVLogger('cnn.csv'),
             ModelCheckpoint('cnn_best.h5', save_best_only=True)]
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(1e-3, amsgrad=True),
              metrics=['accuracy'])
model.fit_generator(train_data, epochs=100, validation_data=valid_data,
                    workers=4, use_multiprocessing=True, callbacks=callbacks)

After model.fit_generator is called, the machine starts fitting parameters on its own. Since EarlyStopping(patience=3) is set, training won't actually run all 100 epochs; it stops automatically once the validation loss fails to improve for 3 consecutive epochs. A GPU can speed this up, but...

I have no money; the only GPU I own is a barely-lights-up display card fit for browser games.

This once again confirms the statement from the article "Comparison of the performance differences between different versions of ClickHouse on different CPU architectures": "Without money, you can't do research."

After some thought, I finally decided to get myself a Tesla:

Not this Tesla, of course, but…

Sun May 21 03:32:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    28W /  70W |   6036MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Now that everything is ready, it's time to let the machine do the learning.

About 10 minutes per epoch (compared to roughly 50 minutes per epoch on an AMD Ryzen R5 3×00).

At this point, all you can do is sit and wait, just like alchemy.

Soon enough, we have a reasonably good model, cnn_best.h5:

1000/1000 [==============================] - 536s 534ms/step - loss: 0.1164 - c1_loss: 0.0238 - c2_loss: 0.0320 - c3_loss: 0.0349 - c4_loss: 0.0256 - c1_accuracy: 0.9913 - c2_accuracy: 0.9887 - c3_accuracy: 0.9879 - c4_accuracy: 0.9907 - val_loss: 0.2460 - val_c1_loss: 0.0325 - val_c2_loss: 0.0650 - val_c3_loss: 0.0963 - val_c4_loss: 0.0521 - val_c1_accuracy: 0.9895 - val_c2_accuracy: 0.9793 - val_c3_accuracy: 0.9711 - val_c4_accuracy: 0.9843
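Note that these are per-character accuracies. Assuming the four positions are roughly independent, the chance of recognizing a whole captcha correctly is about the product of the validation accuracies: 0.9895 × 0.9793 × 0.9711 × 0.9843 ≈ 0.93, i.e. roughly nine captchas in ten.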

Validate the model

Once we have the model, we download it and verify it, this time against a real captcha. For example, we can grab one from BennyThink's "Renren Video Sharing Station", then load the model locally and run it:

import re
import string

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from tensorflow.keras.models import *
from tensorflow.keras.layers import *

characters = re.sub(r"[1l0oOI]", "", string.ascii_letters + string.digits)
width, height, n_len, n_class = 160, 60, 4, len(characters)

# Rebuild the same architecture that was used for training
input_tensor = Input((60, 160, 3))
x = input_tensor
for i, n_cnn in enumerate([2, 2, 2, 2, 2]):
    for j in range(n_cnn):
        x = Conv2D(32 * 2 ** min(i, 3), kernel_size=3, padding='same', kernel_initializer='he_uniform')(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
    x = MaxPooling2D(2)(x)

x = Flatten()(x)
x = [Dense(n_class, activation='softmax', name='c%d' % (i + 1))(x) for i in range(n_len)]
model = Model(inputs=input_tensor, outputs=x)
model.load_weights('cnn_best.h5')

# Read index.png into local_data
local_data = np.array(Image.open('index.png')) / 255.0
plt.imshow(local_data)

def decode(y):
    # Take the argmax over each head's class axis and map back to characters
    y = np.argmax(np.array(y), axis=2)[:, 0]
    return ''.join([characters[x] for x in y])

y_pred = model.predict(local_data.reshape(1, *local_data.shape))
print("Predicted: " + decode(y_pred))

Let's try the example from the beginning of the article and see how it does:

Now, with a little modification of the script, the dream from the beginning of the article of logging in 100 times in an instant can be realized; a rough sketch follows.
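For illustration only, here is what such a loop might look like. Everything site-specific is a placeholder: the example.com endpoints, field names, and password_candidates are hypothetical, and model and decode() come from the validation script above.

import base64
from io import BytesIO

import numpy as np
import requests
from PIL import Image

for password in password_candidates:  # hypothetical list of guesses
    captcha_id = "some-unique-id"  # however the site expects IDs to be minted
    # Hypothetical endpoint returning the base64 data URI produced by get_captcha
    data_uri = requests.get(f"https://example.com/api/captcha?id={captcha_id}").text
    raw = base64.b64decode(data_uri.split(",", 1)[1])
    img = np.array(Image.open(BytesIO(raw)).convert("RGB")) / 255.0
    code = decode(model.predict(img.reshape(1, *img.shape)))
    resp = requests.post("https://example.com/api/login",
                         data={"username": "me", "password": password,
                               "captcha_id": captcha_id, "captcha": code})
    if resp.ok:  # hypothetical success check
        break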

Postscript

Since I'm not much of a coder and knew nothing about the neural-network side, I ran into plenty of pitfalls along the way.

For example, at the beginning I tried an MNIST-style approach: I hacked the captcha library so that it generated images of a single character (with no interference lines) and trained on those individually. The recognition rate was very, very low.

Then I tried k-nearest-neighbor denoising plus OpenCV binarization to clean up the full captcha, and found the results mediocre as well (though maybe that's just my skill level).
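For reference, a minimal sketch of the kind of preprocessing I mean (my own illustration, assuming OpenCV is installed; 'captcha.png' is a placeholder filename, and the median blur stands in for the neighborhood-based denoising step):

import cv2

# Load in grayscale and binarize with Otsu's automatic threshold
img = cv2.imread('captcha.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# A median blur is a simple neighborhood-based way to drop speckle noise
denoised = cv2.medianBlur(binary, 3)
cv2.imwrite('captcha_denoised.png', denoised)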


By borrowing and copy-pasting code, I went from zero basics all the way to giving up on AI/ML, and picked up some understanding of the field along the way. Now I can go and study the subject properly, with concrete questions in mind.

Happy Hacking!

References

  1. lepture/captcha
  2. ypwhs/captcha_break
  3. How to break a CAPTCHA system in 15 minutes with Machine Learning
  4. Image classification from scratch
  5. Image Thresholding

This article is reprinted from https://nova.moe/automated-captcha-recognize-with-cnn/
