Update the calibre-web Douban plug-in calibre-web-douban-api

Original link: https://fugary.com/?p=532

Douban recently tightened its anti-crawling policy again: even cover images can no longer be accessed directly. As a result, the previous version of the calibre-web-douban-api plug-in could not fetch covers, and another way of obtaining them had to be found.

The plug-in has been updated to the latest version. This article records how the plug-in now obtains covers. For usage instructions, see: https://fugary.com/?p=238

Source address: https://github.com/fugary/calibre-web-douban-api/blob/main/src/NewDouban.py

How it works

In fact, to get the cover you only need to send the request header Referer: https://book.douban.com.
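
As a minimal sketch of that idea, the snippet below builds a cover request that carries the Referer header, using only the Python standard library. The function names and the User-Agent value are illustrative assumptions; the plug-in itself uses requests with its own DEFAULT_HEADERS.

```python
import urllib.request

DOUBAN_BASE = "https://book.douban.com"  # Referer value named in the article


def build_cover_request(cover_url: str) -> urllib.request.Request:
    """Build a GET request for a cover image that carries the Douban Referer."""
    headers = {
        "Referer": DOUBAN_BASE,
        # Assumption: a browser-like User-Agent; the article only requires Referer.
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(cover_url, headers=headers)


def fetch_cover(cover_url: str) -> bytes:
    """Download the cover bytes (performs a real network request)."""
    with urllib.request.urlopen(build_cover_request(cover_url), timeout=10) as resp:
        return resp.read()
```

Only fetch_cover touches the network; build_cover_request can be inspected offline to confirm which headers would be sent.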

calibre-web is a third-party open-source service. Without modifying its source code, two problems have to be solved to support Douban covers:

  1. Cover preview
  2. Cover download

Cover preview

The cover preview is a request that the browser makes directly to Douban's cover URL. There is no way to add header information to that request (short of using a Chrome browser plug-in), so the Douban URL has to be accessed through a proxy on the local service.

Proxy address

calibre-web is built on Flask, so you can import one of its Flask blueprint objects, add your own route to it, and forward the request inside that route. Reference code:

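    import urllib.parse

    import requests
    from cps.search_metadata import meta
    from flask import request, Response

    # DEFAULT_HEADERS is defined elsewhere in the plug-in (see the source link above).
    @meta.route("/metadata/douban_cover", methods=["GET"])
    def proxy_douban_cover():
        # The original cover URL arrives in the "cover" query parameter
        cover_url = urllib.parse.unquote(request.args.get('cover'))
        # Fetch the image server-side, where the request headers can be controlled
        res = requests.get(cover_url, headers=DEFAULT_HEADERS)
        return Response(res.content, mimetype=res.headers['Content-Type'])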

With this route in place, the cover is displayed through the local address /metadata/douban_cover?cover=https://xxxx.
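
Because the cover URL travels inside a query parameter, it has to be percent-encoded when the proxy address is built and decoded again when it is read back. The following is a small sketch of that round trip; PROXY_PATH and both function names are hypothetical and only mirror the route shown above.

```python
from urllib.parse import parse_qs, quote, unquote, urlparse

# Hypothetical constant that mirrors the route registered above.
PROXY_PATH = "metadata/douban_cover?cover="


def to_proxy_url(host_url: str, cover_url: str) -> str:
    """Wrap a cover URL in the local proxy address (host_url ends with '/')."""
    return host_url + PROXY_PATH + quote(cover_url)


def from_proxy_url(proxy_url: str) -> str:
    """Recover the original cover URL from a proxy address."""
    query = parse_qs(urlparse(proxy_url).query)
    return unquote(query["cover"][0])
```

Encoding matters most when the cover URL has its own query string: without it, any "&" in the cover URL would be read as a separator in the proxy URL.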

Service address

With the local proxy in place, the original Douban cover address must be replaced with the proxy address when the cover data is returned.

This is a little awkward, because Flask's request object is implemented with thread-local variables, while the metadata service fetches metadata in a thread pool. The request object is not available inside the thread pool, and trying to access it there raises an error:

    RuntimeError: Working outside of request context.

    This typically means that you attempted to use functionality that needed
    an active HTTP request. Consult the documentation on testing for
    information about how to avoid this problem.
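
The same limitation can be reproduced without Flask. The sketch below is only an analogy: it uses threading.local as a stand-in for the request context (Flask's real mechanism differs in detail), and all names in it are hypothetical. It shows that a value bound in the handler's thread is not visible to a worker in a thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for Flask's request context: data bound to the thread that set it.
_context = threading.local()


def current_host_url():
    """Return the host URL bound to the calling thread, or None if there is none."""
    return getattr(_context, "host_url", None)


def handle_request(host_url):
    """Simulate a request handler that delegates work to a thread pool."""
    _context.host_url = host_url
    seen_in_handler = current_host_url()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The worker runs in a different thread, so it sees no bound value.
        seen_in_worker = pool.submit(current_host_url).result()
    return seen_in_handler, seen_in_worker
```

Here the worker simply gets None; Flask raises the RuntimeError shown above in the corresponding situation.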

Therefore, the cover address can only be modified after the metadata has been fetched and before it is exported. Here the cover URL is replaced by subclassing MetaRecord and overriding the special method __getattribute__. Reference code:

    # MetaRecord comes from calibre-web; request is flask.request.
    @dataclasses.dataclass
    class DoubanMetaRecord(MetaRecord):

        def __getattribute__(self, item):
            # Serve the cover through the local proxy
            if item == 'cover' and DOUBAN_PROXY_COVER:
                cover_url = super().__getattribute__(item)
                if cover_url:
                    try:
                        host_url = DOUBAN_PROXY_COVER_HOST_URL
                        if not host_url and request.host_url:
                            host_url = request.host_url
                        # Rewrite only once: skip URLs that already point at the proxy
                        if host_url and host_url not in cover_url:
                            self.cover = host_url + DOUBAN_PROXY_COVER_PATH + urllib.parse.quote(cover_url)
                    except Exception:
                        # request is unavailable outside a request context; keep the original URL
                        pass
            return super().__getattribute__(item)

This makes the Douban cover preview work.

If the detected server address is incorrect, you can set it manually: DOUBAN_PROXY_COVER_HOST_URL='http://nas_ip:8083/'
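
To see the effect of the override in isolation, here is a self-contained sketch with a manually configured host URL. Record is a hypothetical stand-in for calibre-web's MetaRecord, and the value of PROXY_PATH is an assumption; neither is taken from the plug-in source.

```python
import dataclasses
import urllib.parse

# Plays the role of DOUBAN_PROXY_COVER_HOST_URL (placeholder host from the article).
PROXY_HOST_URL = "http://nas_ip:8083/"
# Assumed value for DOUBAN_PROXY_COVER_PATH.
PROXY_PATH = "metadata/douban_cover?cover="


@dataclasses.dataclass
class Record:
    """Hypothetical stand-in for calibre-web's MetaRecord."""
    title: str
    cover: str = ""


@dataclasses.dataclass
class ProxiedRecord(Record):
    def __getattribute__(self, item):
        if item == "cover":
            cover_url = super().__getattribute__(item)
            # Rewrite only once: skip URLs that already point at the proxy.
            if cover_url and PROXY_HOST_URL not in cover_url:
                self.cover = PROXY_HOST_URL + PROXY_PATH + urllib.parse.quote(cover_url)
        return super().__getattribute__(item)
```

Because the rewrite happens on attribute access, it also applies when the record is serialized with something like dataclasses.asdict, which reads each field through normal attribute lookup. That is what allows the URL to be swapped "before it is exported" without touching calibre-web's own code.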

Cover download

The local proxy address cannot be used for cover download: if the Flask route that saves the cover makes a request to another Flask route, the request hangs. This appears to be because Flask's multi-threaded request handling is not enabled. To support cover download without modifying calibre-web, you can override calibre-web's helper.save_cover_from_url: replace it with your own function that downloads the cover data with requests when it detects a Douban cover address. The code is as follows:

    @staticmethod
    def hack_helper_cover():
        # Keep a reference to the original function for non-Douban URLs
        save_cover = helper.save_cover_from_url

        def new_save_cover(url, book_path):
            if DOUBAN_COVER_DOMAIN in url:
                cover_url = url
                if DOUBAN_PROXY_COVER:
                    # Extract the real cover URL from the proxy address
                    component = urllib.parse.urlparse(url)
                    query = urllib.parse.parse_qs(component.query)
                    cover_url = urllib.parse.unquote(query.get('cover')[0])
                res = requests.get(cover_url, headers=DEFAULT_HEADERS)
                return helper.save_cover(res, book_path)
            else:
                return save_cover(url, book_path)

        helper.save_cover_from_url = new_save_cover

With these changes, both cover preview and cover download work.
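
The override follows a general wrap-and-delegate pattern, sketched below in a form that runs without calibre-web. The helper namespace, download_with_referer, and the COVER_DOMAIN value are all hypothetical stand-ins, not calibre-web's or the plug-in's actual objects.

```python
import types

COVER_DOMAIN = "doubanio.com"  # assumption: substring that marks a Douban cover URL

# Hypothetical stand-in for calibre-web's helper module.
helper = types.SimpleNamespace(
    save_cover_from_url=lambda url, book_path: ("original", url, book_path)
)


def download_with_referer(url, book_path):
    """Placeholder for the step that downloads with custom headers and saves the cover."""
    return ("patched", url, book_path)


def patch_save_cover(module):
    """Replace module.save_cover_from_url, keeping the original as a fallback."""
    original = module.save_cover_from_url

    def new_save_cover(url, book_path):
        if COVER_DOMAIN in url:
            return download_with_referer(url, book_path)
        # Anything that is not a Douban cover goes through the original function.
        return original(url, book_path)

    module.save_cover_from_url = new_save_cover


patch_save_cover(helper)
```

Capturing the original function before replacing it keeps the behaviour for every other cover source unchanged, so the patch affects only Douban addresses.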

