A record of a T0 incident

Original link: https://grimoire.cn/intro/t0-925.html

Beginning

As you can see, my server was down for two days, September 25 and 26, 2022. The cause goes back to September 23, when I noticed that my blog's response speed had dropped sharply: every page took more than 200 ms to load fully. After checking the server's backend, I found that the blog had accumulated 500 MB of logs over about a year of use, and the logs were still growing at a rate visible to the naked eye. I had developed the backend of this blog myself early last year and then ported a theme onto it. Since then, the theme's author has updated the original theme several times, and typecho released version 1.2 this year. All of this tempted me to switch back to typecho. So, at 10 am on September 25, 2022, I officially started maintenance on the server.

Technical selection

As the Pagoda (BaoTa) panel has become more and more commercialized in recent years, it has gone from "multi-functional" to "multi-advertising". It now sorts users into tiers and reserves some features for business users, and so far this cannot be hidden. All of this bothered me a lot. The panel also had a 0day vulnerability that affected many webmasters, so I decided to abandon the Pagoda panel completely.

But after giving up the panel, I didn't want my maintenance cost to go up, so I chose Docker, which has been very popular recently. All services are deployed as containers, so I no longer need the Pagoda panel's "compile and install" process or its "ancient software versions". With Docker I can update a program quickly and worry less about its security.

For the reverse proxy, I did not choose nginx again. An ordinary personal blog does not see high QPS, so it needs neither the extra performance nor the more complicated configuration. I chose the lighter and more user-friendly caddy as the proxy server for this website instead. It has a very "good" feature: it deploys HTTPS certificates automatically, without my having to do anything, which is very worry-free. This feature also turned out to be the main cause of the crash.

For the blog program, I used the official Docker image provided by typecho. The image ships with its own pseudo-static (URL rewrite) configuration, which is very convenient.

Program failure

After taking a snapshot backup of the original lightweight cloud server, I started configuring the new version of the blog.

I first installed and deployed Docker on the new system image, then switched to the root user and started installing caddy. Following the example on caddy's official website, I soon finished my first example site, but at first I couldn't access it normally: every request came back with http2_protoclxxx. It looked like a protocol header problem, but I don't know much about the HTTP/2 protocol, so I had no idea where to start, and I became very anxious. The same server clearly ran fine on my local machine, though, so I had an idea: I copied my local configuration file to the cloud server and reloaded the server configuration. Great, port 80 was accessible! Excited, I tried port 443 (HTTPS), but port 443 still didn't work.

The site can now be accessed over HTTP

The problem dragged on into the night. I still couldn't reach the website through HTTPS with the domain name, or even by going directly to the IP.

Sometime between 8:00 and 10:00 that night, I found that HTTPS had become accessible. I was ecstatic, but when I tried to put Alibaba Cloud's full-site acceleration (DCDN) in front of the site, I was dumbfounded: DCDN returned a 502 error and gave me no further error details. By then the service had been offline for 12 hours. Several times I removed the DCDN resolution and set it up again, but the site was still unavailable. "Let's roll back to the old version with the snapshot": the thought kept flashing through my mind, and by this point I was completely numb to the crash.

Finally, at 11:00 pm, I found that once I set the SNI for back-to-origin requests, DCDN successfully reached my website, and my pages were temporarily accessible again!
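Why the back-to-origin SNI setting can matter is easy to probe by hand. The following is a minimal Python sketch, not the procedure used during the incident: it attempts a TLS handshake against an origin server, optionally sending a server name. The function name `probe_tls`, the address 203.0.113.10 and the name blog.example.com are all placeholders. It assumes the origin picks its certificate from the SNI value, in which case a handshake that omits the name may be rejected while one that includes it succeeds.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int = 443, sni: Optional[str] = None,
              timeout: float = 5.0) -> str:
    """Attempt a TLS handshake with host:port, optionally sending `sni`.

    Returns "ok: <TLS version>" on success, or "failed: <error class>".
    """
    ctx = ssl.create_default_context()
    # Diagnostic only: skip certificate verification so the result reflects
    # the handshake itself, not whether the certificate is trusted.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname=None means no SNI extension is sent.
            with ctx.wrap_socket(sock, server_hostname=sni) as tls:
                return f"ok: {tls.version()}"
    except OSError as exc:  # covers refused connections, timeouts, TLS alerts
        return f"failed: {exc.__class__.__name__}"


# Hypothetical usage (placeholder address and domain):
#   probe_tls("203.0.113.10")                          # no SNI sent
#   probe_tls("203.0.113.10", sni="blog.example.com")  # SNI sent
```

If the first call fails and the second succeeds, the origin is rejecting clients that do not send a server name, which would match a CDN returning 502 until its back-to-origin SNI is configured.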

But this was a flash in the pan. When I redeployed the container to bring back all of my website data, DCDN failed to reach the site again. More than 14 hours had passed by then, so I decided to put up a "service maintenance" sign and try again after I got home from work the next day.

The 404 template I used at the time

Recovery

On the evening of September 26, after I got home, I tried pointing Alibaba Cloud's DCDN at my server again. Fortunately, it worked on the first try! I was very happy. I switched between several devices and networks and tested without any problems. Once I could access the server normally, I finally breathed a sigh of relief.

The next step was to remove the 404 page and then configure the blog. At last, 48 hours after the failure began, my blog returned to normal.
