Continuing to learn neural networks

Original link: https://blog.codingnow.com/2023/03/cnn.html

Over this stretch of time, apart from being busy with game projects, I spent a few more days continuing to learn about neural networks.

After the last blog post was published, I received an email from a friend at Turing (the publisher), and then a stack of nine of Turing's Chinese books on artificial intelligence. Friends at the publishing house usually mail me new books, more or less in the hope that I will write something about them. I don't want to deliberately promote new books for anyone, but I don't mind recommending things I have actually spent time reading.

I received too many books this time, and of course I did not read them all, but their content is closely related (almost a single theme). Last year I read "How to Read a Book", which describes four levels of reading; the fourth level, reading several books on the same topic together, is very worthwhile. Doing exactly that this time, I gained a lot.

Among this pile of books are introductions to neural networks. Besides the free English book mentioned in the previous post (a Chinese edition of "Neural Networks and Deep Learning" is now available), I also recommend "Introduction to Deep Learning", written by a Japanese author. Japanese popular-science introductions tend to be very thorough, with plenty of illustrations, and sure enough this book filled in many details I had overlooked in the earlier one. The error backpropagation algorithm in particular felt much clearer after reading it. By the way, when I recommended this book on Twitter (without writing the title, just posting a photo), I even got a reply from the author, Saito Yasuhiro. What a small world.

For the implementation part, the small goal I set for myself was to get a convolutional network working. But the example code in these books no longer has the kind of very simple implementation that the earlier plain fully connected network had. I still plan to write everything from scratch (without any library, not even for vector math). There is not much code to reference this time, so it is harder: not because the code is particularly complicated, but because if any detail has a bug, it is hard to trace without a reference implementation. That makes it all the more necessary to get the principles straight, so that I make fewer mistakes while writing the code, and so that intuition kicks in quickly to locate a bug when I do make one.

The first thing I discovered was that I had misunderstood some basic concepts before, which left the older version with quite a few wrong abstractions. It was difficult to extend those existing abstractions to a convolutional network.

For example, between fully connected layers, each connection has a weight that affects signal propagation, and there is also a bias. Computationally, this applies an affine transformation to the signal. I used to think of the weights and the bias as a single whole, but they should really be two separate things.

If you look at a single neuron in a fully connected layer, the weights are attached to the connections coming from its upstream neurons, while the bias sits on the neuron itself. The bias is more like an additional, independent constant signal source. Decomposed this way, error backpropagation becomes easier to understand: the influence of the weights and of the bias are propagated separately.
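
To make this concrete, here is a minimal sketch of a single fully connected layer in C. The names and layout are illustrative, not taken from my actual code; the point is that the weight gradients and the bias gradient are accumulated separately, just as their influence propagates separately.

```c
/* One fully connected layer: weights live on the connections,
 * the bias lives on the neuron itself. Illustrative sketch only. */
struct dense_layer {
    int in, out;
    float *w;   /* in*out weights, one per connection */
    float *b;   /* out biases, one per output neuron  */
    float *dw;  /* accumulated weight gradients       */
    float *db;  /* accumulated bias gradients         */
};

/* forward: y = W x + b, the affine transform mentioned above */
void
dense_forward(const struct dense_layer *L, const float *x, float *y) {
    for (int j = 0; j < L->out; j++) {
        float s = L->b[j];                      /* bias: a constant signal source */
        for (int i = 0; i < L->in; i++)
            s += L->w[j * L->in + i] * x[i];    /* weight on the connection */
        y[j] = s;
    }
}

/* backward: given dE/dy, accumulate dE/dw and dE/db, and produce dE/dx */
void
dense_backward(struct dense_layer *L, const float *x, const float *dy, float *dx) {
    for (int i = 0; i < L->in; i++)
        dx[i] = 0;
    for (int j = 0; j < L->out; j++) {
        L->db[j] += dy[j];                          /* bias gradient: just the error */
        for (int i = 0; i < L->in; i++) {
            L->dw[j * L->in + i] += dy[j] * x[i];   /* weight gradient */
            dx[i] += L->w[j * L->in + i] * dy[j];   /* error sent upstream */
        }
    }
}
```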

The same goes for activation functions: different occasions call for different activation functions, and sigmoid, ReLU, and tanh each have their own uses. They propagate the error backwards, but have no parameters of their own to tune.
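
A sketch of a few common activation functions and their derivatives, in the same illustrative style: they transform the signal on the way forward and scale the error on the way back, but there is nothing inside them to train.

```c
#include <math.h>

float sigmoid(float x)      { return 1.0f / (1.0f + expf(-x)); }
float sigmoid_grad(float y) { return y * (1.0f - y); }  /* y = sigmoid(x) */

float relu(float x)      { return x > 0 ? x : 0; }
float relu_grad(float x) { return x > 0 ? 1.0f : 0.0f; }

float tanh_act(float x)  { return tanhf(x); }
float tanh_grad(float y) { return 1.0f - y * y; }       /* y = tanhf(x) */

/* backward pass through a ReLU layer: the error is reshaped,
 * but nothing is accumulated for a later parameter update */
void
relu_backward(const float *x, const float *dy, float *dx, int n) {
    for (int i = 0; i < n; i++)
        dx[i] = dy[i] * relu_grad(x[i]);
}
```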

Before implementing the convolutional network, I took some time to rewrite the old code. The rewritten version does not attempt a careful abstraction; it just works. As the study progresses it may change at any time, so there is no need to introduce too much complexity too early.

About convolutional networks: they actually have little to do with the structure of biological neural networks. They simply borrow a mature technique from signal processing, using image convolution to extract features from a 2D image. This makes the input to the neural network more reasonable (features rather than isolated pixels). You can think of it as extracting image information at a higher level and handing it to the neural network for classification.

However, the way signals propagate through a convolutional layer, and the way its parameters are adjusted by error backpropagation, work on the same principles as the rest of the network, so it is still treated as part of the neural network. For the implementation, forward inference is the ordinary image convolution familiar from graphics; what I needed to work out this time was the correct error backpropagation algorithm.
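
For a single-channel image, that forward pass is just this kind of loop (an illustrative sketch: a "valid" convolution with no stride or padding, which, like most deep-learning code, is really a cross-correlation):

```c
/* "Valid" convolution of an h*w single-channel image with a k*k kernel.
 * The output is (h-k+1) x (w-k+1). Illustrative sketch only. */
void
conv2d_forward(const float *img, int h, int w,
               const float *kernel, int k,
               float *out) {
    int oh = h - k + 1, ow = w - k + 1;
    for (int y = 0; y < oh; y++) {
        for (int x = 0; x < ow; x++) {
            float s = 0;
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    s += img[(y + i) * w + (x + j)] * kernel[i * k + j];
            out[y * ow + x] = s;
        }
    }
}
```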

If you understand the chain rule, it is not hard to derive yourself. After deriving it on my own, I googled a number of related articles to make sure I had not made a mistake. The one I found best is this piece from Microsoft Research: "The Backpropagation Principle of Convolution". For the simpler error backpropagation of the pooling layer, see: "Forward and Backward propagation of Max Pooling Layer in Convolutional Neural Networks".
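
The result of that derivation, sketched in the same style (single channel, no stride or padding, illustrative names): the kernel gradient is another cross-correlation of the input image with the output error, and the input gradient spreads each error value back over the pixels that produced it, weighted by the kernel. Max pooling is simpler still, since the whole error flows back to whichever input won.

```c
/* Backpropagation through the convolution above, derived with the chain rule.
 * Given dE/dout, accumulate dE/dkernel and compute dE/dimg. */
void
conv2d_backward(const float *img, int h, int w,
                const float *kernel, int k,
                const float *dout,   /* (h-k+1) x (w-k+1) */
                float *dkernel,      /* k x k, accumulated */
                float *dimg) {       /* h x w, zeroed here */
    int oh = h - k + 1, ow = w - k + 1;
    for (int i = 0; i < h * w; i++)
        dimg[i] = 0;
    for (int y = 0; y < oh; y++) {
        for (int x = 0; x < ow; x++) {
            float e = dout[y * ow + x];
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k; j++) {
                    dkernel[i * k + j] += e * img[(y + i) * w + (x + j)];
                    dimg[(y + i) * w + (x + j)] += e * kernel[i * k + j];
                }
            }
        }
    }
}

/* 2x2 max pooling: forward remembers which input won; backward routes the
 * whole error to that position and sends zero everywhere else. */
void
maxpool2x2_forward(const float *in, int h, int w, float *out, int *argmax) {
    int oh = h / 2, ow = w / 2;
    for (int y = 0; y < oh; y++) {
        for (int x = 0; x < ow; x++) {
            int best = (2 * y) * w + 2 * x;
            for (int i = 0; i < 2; i++)
                for (int j = 0; j < 2; j++) {
                    int idx = (2 * y + i) * w + (2 * x + j);
                    if (in[idx] > in[best]) best = idx;
                }
            out[y * ow + x] = in[best];
            argmax[y * ow + x] = best;
        }
    }
}

void
maxpool2x2_backward(const float *dout, const int *argmax, int oh, int ow,
                    float *din, int h, int w) {
    for (int i = 0; i < h * w; i++)
        din[i] = 0;
    for (int i = 0; i < oh * ow; i++)
        din[argmax[i]] += dout[i];
}
```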

Simply put, for MNIST handwritten digit recognition, we want the convolutional layers to find different features in the images: horizontal and vertical strokes, intersections, bends, and so on. Convolution has good translation invariance: as long as a feature is found, it does not matter where in the image it appears. Finally, these features serve as the signal source for deciding which digit it is, which works far better than deciding from raw pixels.

But we do not know in advance which filters (convolution kernels) will find which image features. The job of training the convolutional layer is to start from random numbers and discover which convolution kernels are the most appropriate.

P.S. A word on why learning starts from random numbers (rather than zeros). I found the answer in the books these days: if the parameters we want to train all start out identical, they are hard to differentiate from one another, because the neurons within the same layer of a network are essentially equivalent. At the start, no neuron is assigned a particular job; it is training that pushes them toward different responsibilities. If they all begin with the same initial values, their training paths are hard to pull apart. For a multi-layer network, we also need to tune the distribution of the random initial values at different layers to get better results.
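
A sketch of that kind of initialization (rand01 and the exact scale are illustrative; the usual rules, such as Xavier initialization and, for ReLU layers, He initialization, pick the per-layer scale from the number of inputs to the layer):

```c
#include <stdlib.h>
#include <math.h>

/* uniform random number in [0, 1]; illustrative helper */
float
rand01(void) {
    return (float)rand() / (float)RAND_MAX;
}

/* draw weights around zero with a scale that depends on the layer size,
 * so neurons in the same layer start out different from each other */
void
init_weights(float *w, int fan_in, int fan_out) {
    float scale = 1.0f / sqrtf((float)fan_in);
    for (int i = 0; i < fan_in * fan_out; i++)
        w[i] = (2.0f * rand01() - 1.0f) * scale;  /* uniform in [-scale, scale] */
}
```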

Writing a convolutional layer from scratch is easy to get wrong, and I did not expect to get it right on the first try. So I wrote separate test code just for the error backpropagation of the convolution operation. The test solves this problem: given an image and the target pattern produced by convolving it with some filter, recover what that filter is. This test caught several small bugs in the code. Without this step, I am afraid the newly written convolutional layer would have gone straight into the neural network, with no way to track down the errors.
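
The test looks roughly like this (a sketch that assumes the conv2d_forward and conv2d_backward functions sketched above are in the same file; the image size, filter, learning rate, and step count are all arbitrary): convolve a random image with a known filter to get the target, then run gradient descent on the squared error and check that the recovered kernel comes out close to the original.

```c
#include <stdio.h>
#include <stdlib.h>

#define H 8
#define W 8
#define K 3
#define OH (H - K + 1)
#define OW (W - K + 1)

int
main(void) {
    float img[H * W], target[OH * OW], out[OH * OW];
    float truth[K * K] = { 0, 1, 0, 1, -4, 1, 0, 1, 0 };  /* a known filter */
    float kernel[K * K] = { 0 };    /* start from zeros: fine here, only one kernel */
    float dkernel[K * K], dimg[H * W], dout[OH * OW];

    for (int i = 0; i < H * W; i++)
        img[i] = (float)rand() / (float)RAND_MAX;
    conv2d_forward(img, H, W, truth, K, target);

    for (int step = 0; step < 2000; step++) {
        conv2d_forward(img, H, W, kernel, K, out);
        for (int i = 0; i < OH * OW; i++)
            dout[i] = (out[i] - target[i]) / (OH * OW);   /* dE/dout, mean squared error */
        for (int i = 0; i < K * K; i++)
            dkernel[i] = 0;
        conv2d_backward(img, H, W, kernel, K, dout, dkernel, dimg);
        for (int i = 0; i < K * K; i++)
            kernel[i] -= 0.2f * dkernel[i];               /* plain gradient descent */
    }
    for (int i = 0; i < K * K; i++)
        printf("% .3f (expect % .1f)\n", kernel[i], truth[i]);
    return 0;
}
```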

The final convolutional neural network works well. The previous accuracy of about 95% has risen to nearly 99%. Based on what I have learned from the books, the next step is to improve some details of the training process, which should push it above 99%.


However, after adding the convolutional layer, the training cost has also increased significantly. The earlier fully connected feed-forward network took only a minute or two to train; now a run takes about twenty minutes. Learning naturally goes best with fast feedback, and half an hour is probably about the limit of my patience.

The mini-batch method my implementation currently uses trains a small batch of data in each cycle, and the training of each sample within the batch is completely independent. After a batch finishes, the accumulated results are applied to the model being trained. This can naturally be split across threads; with a little effort to implement a parallel version under skynet or ltask, I think the speed could increase by an order of magnitude. But that seems to have little to do with learning about neural networks themselves; after all, moving the computation to the GPU would pay off far more.
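
Roughly, one training cycle looks like this (a sketch with hypothetical names; train_one_sample stands in for the forward and backward pass over a single sample): only the loop over the batch needs to be parallelized, and only the final fold into the model has to stay sequential.

```c
#include <string.h>

#define BATCH  32
#define NPARAM 1024   /* illustrative parameter count */

/* assumed to exist elsewhere: run forward + backward for one sample and
 * write that sample's gradient into grad[] (hypothetical helper) */
void train_one_sample(const float *model, int sample_index, float *grad);

void
train_minibatch(float *model, int first_sample, float lr) {
    static float grad[BATCH][NPARAM];       /* one private buffer per sample */

    /* independent per-sample passes: this loop is the parallelizable part,
     * e.g. one skynet/ltask worker per slice of the batch */
    for (int b = 0; b < BATCH; b++) {
        memset(grad[b], 0, sizeof grad[b]);
        train_one_sample(model, first_sample + b, grad[b]);
    }

    /* sequential part: fold the accumulated gradients into the model */
    for (int i = 0; i < NPARAM; i++) {
        float g = 0;
        for (int b = 0; b < BATCH; b++)
            g += grad[b][i];
        model[i] -= lr * g / BATCH;         /* average gradient step */
    }
}
```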
