Optimizing Python + Flask page speed: Elasticsearch + Redis caching strategy tuning

Original link: https://tstrs.me/result/f9lK7oYB_oK7ec3FjaMv

Foreword

When building this site, I used Elasticsearch as the database and Redis as the cache layer. This worked fine at first, but as versions iterated, modules were added, and pages kept growing, more and more places had to query Redis to fill in data; some pages needed dozens or even hundreds of queries, so loading slowed down and system load rose. This article is about fixing those problems: reducing system load and tuning the caching strategy.

System module introduction

Text content

Because I didn't plan to import old blog articles into the new system, I designed a new page URL format: /result/<article ID>. Since this ID is unique and its length is fixed at 20 characters, the key-value pair (KV) I designed in the cache is article ID : article content, for example:

    hXia44YBlyC2E8nCuWW5 : [article body content]

The advantage of this design is that it is very simple and fast: to check whether an article is cached, you can look up its ID directly.
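As a minimal sketch of this KV design (a plain dict stands in for the Redis client here; the real site would use a redis-py connection, and `cache_article`/`get_article` are hypothetical helper names):

```python
import json

# Dict stand-in for the Redis page cache (the real site would use a
# redis-py client with set()/get() instead of dict access).
page_cache = {}

def cache_article(article_id, body):
    # Key: the unique 20-character article ID; value: the article content.
    page_cache[article_id] = json.dumps(body)

def get_article(article_id):
    # One key lookup answers both "is it cached?" and "what is it?".
    raw = page_cache.get(article_id)
    return json.loads(raw) if raw is not None else None
```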

Recommended reading

Later, a [Recommended Reading] section was added to the side of the page. At first I used an Elasticsearch popularity query, but the popularity list barely changed and felt pointless, so I switched to random selection, so that every refresh shows something different. Querying Elasticsearch every time was relatively slow, though, so I used Redis's RANDOMKEY command, which returns a random key; from that key I can then fetch the article's details.

There are two constraints. First, the recommendations must not repeat. Second, because my website is bilingual, I can't recommend English articles on a Chinese page, and vice versa.

So this module uses a while loop. Ten items are needed in total: each time a random key comes back, I immediately query the article to check whether its language matches the page; if it matches, it is added and the counter increases until it reaches 10. The module code is below, with the input l being the language:

    def get_randomkey_redis(l):  # fetch random articles for language l
        id_list = []        # IDs already picked
        raw_info_list = []  # collected article details
        while len(raw_info_list) < 10:
            # RANDOMKEY returns a random key as bytes; decode to str
            one_pages_id = str(page_cache.randomkey(), 'utf-8')
            if one_pages_id not in id_list:       # not already in the list
                pcs = get_pc(one_pages_id)        # fetch the article details
                if pcs['language'] == l:          # language matches the page
                    a = {...}                     # build the record (fields elided in the original)
                    raw_info_list.append(a)       # add the record to the batch
                    id_list.append(one_pages_id)  # remember the ID
        return raw_info_list

The reason for the language check is that every key needs a follow-up query before you know whether it is usable; it usually takes 30-40 iterations to build the full batch of 10. Even so, this is still faster than querying Elasticsearch directly.
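The 30-40 figure is consistent with rejection sampling: each draw mimics RANDOMKEY plus the follow-up language check, and wrong-language or duplicate keys are thrown away. A quick pure-Python simulation (all names hypothetical) sketches the cost under the assumption that roughly half the keys match the page language:

```python
import random

def draws_needed(keys, language, want=10, rng=random):
    """Simulate the recommendation loop: draw random (id, lang) keys until
    `want` distinct articles in `language` have been collected, and return
    how many draws (i.e. Redis round-trips) that took."""
    picked, draws = set(), 0
    while len(picked) < want:
        draws += 1
        article_id, lang = rng.choice(keys)  # stands in for RANDOMKEY
        if article_id not in picked and lang == language:  # the extra lookup
            picked.add(article_id)
    return draws
```

With a 50/50 language split the expected draw count is already around double `want`, and it grows as the share of matching keys shrinks.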

Look at others

Later I added a [Look at others] section at the bottom of the article page, styled as cards rather than a sidebar list. For the sake of speed, the sidebar's return value had only four fields: name, link, time, popularity; the card version needs two more: preview image, summary. I made a separate module for this, but the code is basically the same with only a few changes.

Although the interface looked good and the content was rich, the sheer number of queries meant each page took roughly 80-100 ms to render, which was unacceptable to me, and the frequent I/O reads and writes made the system sluggish, turning already slow access into something even slower.

Those are the pits I dug for myself; I have recently fixed all of them. The solution ideas and actual code are shared below.

Bug fixes and optimization

At root, the problem is that the original architecture could not keep up with the needs of later versions, and some core modules needed refactoring. So I flipped the table and started over: rather than patch the old code, it was faster to rebuild the mess.

My article detail page shows 10 items in the sidebar and 6 at the bottom, 16 records in total. So I now fetch 16 random records of the matching language directly from Elasticsearch, and each record returns only 6 fields: name, link, time, popularity, preview image, summary, which also saves memory. The query code is as follows:

    es.search(index="why", body={
        "query": {"bool": {
            "must": {"multi_match": {"query": "tttt", "fields": ["so"]}},
            "filter": {"match": {"language": l}},
            "must_not": {"match": {"edit": "编辑中"}}  # exclude articles still being edited
        }},
        "from": 0, "size": 16,
        "sort": {"_script": {"script": "Math.random()", "type": "number"}}
    })

In plain language: [from the database, give me 16 random articles whose language is l and that are not being edited]. The tttt in query is a catch-all query parameter I set. To solve the language problem at the root, I split the Chinese and English content into separate Redis databases, so no query time is wasted on filtering.
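A toy illustration of why the split helps, with plain dicts standing in for the two Redis databases (the real site presumably uses two redis-py clients; `cache_batch` and `random_batch` are hypothetical names): any key drawn from the zh store is guaranteed to be Chinese, so the filtering loop disappears.

```python
import json
import random
import time

# Plain dicts standing in for the two separated Redis databases.
zh_rdm_cache = {}
en_rdm_cache = {}

def cache_batch(cache, batch):
    # Current timestamp as the key: it is never looked up by name,
    # it only has to be reasonably unique.
    cache[str(round(time.time()))] = json.dumps(batch)

def random_batch(cache):
    # RANDOMKEY equivalent: any key drawn here is already in the
    # right language, so no per-key language check is needed.
    return json.loads(cache[random.choice(list(cache))])
```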

The query results are then written to Redis. I use the current timestamp as the key; the key itself is never needed by name, it just has to be unique.

    def set_rdm(l):  # write one cached batch to Redis for the given language
        if l == 'zh':
            zh_rdm_cache.set(round(time.time()), json.dumps(es_act._random(l)), ex=3600)
        elif l == 'en':
            en_rdm_cache.set(round(time.time()), json.dumps(es_act._random(l)), ex=3600)

Reading is just as simple; the following code fetches one random cached batch in the matching language.

    def get_rdm(l):  # read one random cached batch from Redis for the given language
        if l == 'zh':
            return json.loads(zh_rdm_cache.get(zh_rdm_cache.randomkey()))
        elif l == 'en':
            return json.loads(en_rdm_cache.get(en_rdm_cache.randomkey()))

After this change, each page only needs one query for the article body and one for the random recommendations, and the total time is basically within 10 ms.
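The whole optimization boils down to the number of cache round-trips per render. A toy sketch of the final flow, again with dicts standing in for the caches (all names and data here are made up):

```python
import json
import random

# Toy stand-ins for the two caches a page render now touches.
article_cache = {'hXia44YBlyC2E8nCuWW5': json.dumps({'title': 'demo article'})}
rdm_cache = {'1700000000': json.dumps([{'name': 'rec %d' % i} for i in range(16)])}

def render_page(article_id):
    body = json.loads(article_cache[article_id])  # lookup 1: the article body
    key = random.choice(list(rdm_cache))          # lookup 2: RANDOMKEY...
    recs = json.loads(rdm_cache[key])             # ...then fetch that batch
    return body, recs[:10], recs[10:]             # 10 sidebar + 6 bottom cards
```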

Postscript

In short, the simpler things are, the faster they are, and it is easy to fritter performance away bit by bit. On a server with strong specs you may not notice any change, but on a low-spec server a little optimization brings a huge improvement.
