One skill a day: Bug analysis: how soft deletion lets an article publish "successfully" but fail to open

Original link: https://www.kingname.info/2022/06/20/fake-delete/

The company has an internal blog where you can create an account and write articles to share across the company. Yesterday the blog opened up its API, so I decided to write a Python program to upload all the articles from my WeChat official account.

Along the way I found a bug in this API. From its symptoms alone, I was able to guess where the problem was.

First, a brief description of the symptoms.

Say I have 50 Markdown files on my hard drive and want to publish them all to the website. The simplified code looks like this:

```python
import glob
import requests

for path in glob.glob('blog/*.md'):
    with open(path, encoding='utf-8') as f:
        article = f.read()
    # Publish the body of each Markdown file as a new article
    requests.post('https://xxx.yyy.com/post?token=abcasdf', json={'content': article})
```

Once publishing finished, the articles did appear on the site, and each one displayed normally. But a quick glance showed that some of them ended with the QR code of my WeChat official account. I didn't want people at the company to know about my official account, so I decided to edit those articles.

Some articles had QR codes and some didn't, and fixing them one by one would have been tedious, so I took a two-step approach. First, I wrote a program that scanned every Markdown file and deleted the QR code wherever it found one. Then I deleted all the articles I had just published on the website. I was too lazy to check which ones had a QR code, so I simply deleted them all and reposted everything.
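The QR-code cleanup program isn't shown in the post. Here is a minimal sketch under stated assumptions. It assumes the QR code is embedded as a Markdown image line whose URL contains the substring `qrcode`. That marker, and the names `strip_qr_code` and `clean_all`, are hypothetical and exist only for illustration.

```python
import glob

QR_MARKER = 'qrcode'  # hypothetical substring identifying the QR-code image URL


def strip_qr_code(markdown):
    """Drop Markdown image lines (![alt](url)) that contain QR_MARKER."""
    kept = []
    for line in markdown.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith('![') and QR_MARKER in stripped:
            continue  # skip the QR-code image line
        kept.append(line)
    return ''.join(kept)


def clean_all(pattern='blog/*.md'):
    # Rewrite a Markdown file in place only if something was actually removed
    for path in glob.glob(pattern):
        with open(path, encoding='utf-8') as f:
            original = f.read()
        cleaned = strip_qr_code(original)
        if cleaned != original:
            with open(path, 'w', encoding='utf-8') as f:
                f.write(cleaned)
```

Files without a QR code are never rewritten, so their content stays byte-for-byte identical. That detail matters for what happens next.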

Next, I ran the program again to republish all the articles in bulk. Two seconds later, publishing was complete.

At first everything looked normal. But when I checked the website, clicking on many of the articles brought up the message "This article has been deleted".

My first thought was that my program had a bug and had skipped these articles. So I republished them one by one. The API reported success every time, but the web page still said each article had been deleted.

I then went through the failing articles one by one and found they had one thing in common: none of them had contained a QR code in the first place. In other words, their content was unchanged. I had deleted them from the website and then reposted exactly the same text.

That gave me a preliminary guess at the cause:

  1. Each article has a docid. When an article is first published, its docid is the MD5 hash of the article body. As long as the body is identical, the docid is the same no matter how many times it is posted, which prevents duplicate articles. (When updating an article, the user must supply the docid explicitly so that a new one is not generated.)
  2. The website's delete function is almost certainly a soft (fake) delete. When I click the delete button, the article stays in the database and only gets a field removed=True. When the page lists articles, the query condition must be something like col.find({'removed': {'$ne': True}}), so soft-deleted articles are hidden.
  3. When the API publishes a new article, it must use an update operation with upsert=True.
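The first two guesses can be modelled in plain Python. This is a hypothetical sketch, not the blog's actual code. `make_docid` assumes the body is UTF-8 encoded before hashing. `visible` mimics the guessed query `{'removed': {'$ne': True}}`, but over an in-memory list of dicts rather than a real MongoDB collection.

```python
import hashlib


def make_docid(content):
    # Guessed scheme: docid = MD5 hex digest of the article body (UTF-8 assumed)
    return hashlib.md5(content.encode('utf-8')).hexdigest()


def visible(articles):
    # Mimics {'removed': {'$ne': True}}: keep documents whose 'removed'
    # field is missing or holds anything other than True
    return [a for a in articles if a.get('removed') is not True]


body = 'Hello, world'
assert make_docid(body) == make_docid(body)         # same body, same docid
assert make_docid(body) != make_docid(body + '\n')  # any edit, new docid

docs = [{'_id': 'a'}, {'_id': 'b', 'removed': True}, {'_id': 'c', 'removed': False}]
print([d['_id'] for d in visible(docs)])  # ['a', 'c']
```

Under this model, removing the QR code changes the body and therefore the MD5, so the cleaned articles get fresh docids. Only the untouched articles collide with their soft-deleted originals, which matches the pattern observed above.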

Taking MongoDB as an example, the logic behind this API is probably something like this:

```python
def post_article(docid, article_info):
    mongo.update_one({'_id': docid}, {'$set': article_info}, upsert=True)
```

With upsert=True, MongoDB first looks for a document matching the filter. If one exists, it is updated; if not, a new one is inserted.

On first publication the article doesn't exist yet, so it is simply inserted, and everything works. If the user edits the text through the proper modification interface, the user supplies the docid, so the update also works.

But suppose the user deletes the article first. The database now has removed=True on that document. The user then reposts the article unchanged, so its docid is identical to the original one and the document already exists in the database. The upsert therefore takes the update path, and $set overwrites only the fields it is given. The new payload contains no removed field, so the existing removed=True is left untouched. The result is that the API reports a successful publish, but opening the article shows that it has been deleted.
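The sequence can be reproduced without a database. This sketch uses a plain dict as a stand-in for the MongoDB collection. `upsert_set` is a hypothetical helper that imitates `update_one({'_id': docid}, {'$set': fields}, upsert=True)` by merging only the supplied fields. `soft_delete` imitates the guessed behaviour of the delete button.

```python
store = {}  # stand-in for the MongoDB collection, keyed by _id


def upsert_set(docid, fields):
    # Like update_one(..., {'$set': fields}, upsert=True):
    # insert if missing, otherwise overwrite only the given fields
    doc = store.setdefault(docid, {'_id': docid})
    doc.update(fields)


def soft_delete(docid):
    # Guessed delete button: flag the document but keep it
    store[docid]['removed'] = True


def is_visible(docid):
    # Same rule as {'removed': {'$ne': True}}
    return store[docid].get('removed') is not True


docid = 'same-md5-docid'  # hypothetical: unchanged body, so unchanged docid
upsert_set(docid, {'content': 'article body'})  # first publish: inserted
soft_delete(docid)                              # delete on the website
upsert_set(docid, {'content': 'article body'})  # republish: API says "success"
print(store[docid])
# {'_id': 'same-md5-docid', 'content': 'article body', 'removed': True}
print(is_visible(docid))  # False, so the page says the article was deleted
```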

I asked the colleagues who built this API, and sure enough, the cause of the bug was exactly what I had guessed.

The fix is simple. When publishing a new article, use replace_one instead of update_one. The stored document is then replaced wholesale, so the stale removed field disappears:

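```python
def post_article(docid, article_info):
    # replace_one expects a plain replacement document, not a $set update;
    # the whole stored document (including any 'removed' flag) is replaced
    mongo.replace_one({'_id': docid}, article_info, upsert=True)
```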
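The same in-memory model shows why a full replacement fixes the problem. `upsert_replace` is a hypothetical helper imitating `replace_one({'_id': docid}, fields, upsert=True)`. After each call, the stored document holds exactly the new fields plus `_id`, so a leftover `removed` flag cannot survive a republish.

```python
store = {}  # stand-in for the MongoDB collection, keyed by _id


def upsert_replace(docid, fields):
    # Like replace_one(..., fields, upsert=True): the old document,
    # including any 'removed' flag, is discarded; _id is kept
    store[docid] = {'_id': docid, **fields}


def is_visible(docid):
    # Same rule as {'removed': {'$ne': True}}
    return store[docid].get('removed') is not True


docid = 'same-md5-docid'
upsert_replace(docid, {'content': 'article body'})  # first publish
store[docid]['removed'] = True                      # soft delete on the website
upsert_replace(docid, {'content': 'article body'})  # republish
print(store[docid])       # {'_id': 'same-md5-docid', 'content': 'article body'}
print(is_visible(docid))  # True
```

One trade-off is that a replace also drops any other server-side fields stored on the document. If the real backend keeps such fields (an assumption), an alternative is to keep `update_one` and clear the flag explicitly with MongoDB's `$unset` operator alongside `$set`.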

This article is reprinted from: https://www.kingname.info/2022/06/20/fake-delete/
This site is for inclusion only, and the copyright belongs to the original author.
