Original link: https://fugary.com/?p=532
Douban recently tightened its anti-crawling policy: even cover images can no longer be accessed directly. As a result, the previous version of the calibre-web-douban-api plug-in could not fetch covers, so another way to obtain them was needed.
The plug-in has since been updated. This article records how the plug-in now obtains covers. For usage instructions, see: https://fugary.com/?p=238
Source address: https://github.com/fugary/calibre-web-douban-api/blob/main/src/NewDouban.py
How it works
In fact, the cover can be fetched as long as the request carries the header Referer: https://book.douban.com.
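As a rough illustration of the header requirement above, here is a minimal sketch that builds such a request with Python's standard library (the plug-in itself uses requests). The function names, the User-Agent value, and the cover URL are assumptions made for this example and are not taken from the plug-in.

```python
import urllib.request

DOUBAN_REFERER = "https://book.douban.com"


def build_cover_request(cover_url):
    """Build a GET request for a cover image that carries the Douban Referer."""
    return urllib.request.Request(
        cover_url,
        headers={
            "Referer": DOUBAN_REFERER,
            # Assumption: a browser-like User-Agent; the plug-in keeps its
            # real headers in DEFAULT_HEADERS.
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch_cover(cover_url, timeout=10):
    """Download the cover bytes (performs a real network request)."""
    request = build_cover_request(cover_url)
    with urllib.request.urlopen(request, timeout=timeout) as res:
        return res.read()


# Hypothetical cover URL, used only to show how the request is built.
req = build_cover_request("https://img1.doubanio.com/view/subject/l/public/s1.jpg")
```

Only build_cover_request runs at import time; fetch_cover touches the network and is left for the caller to invoke.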
calibre-web is a third-party open-source service. If its source code is not to be modified, two problems have to be solved to support Douban covers:
- cover preview
- cover download
Cover preview
The cover preview is a request that the browser sends directly to Douban's cover URL. The page has no way to add request headers (a Chrome browser plug-in could do it), so the Douban URL has to be accessed through a proxy in the local service.
Proxy address
calibre-web is built with Flask, so you can import its Flask blueprint object, add a route of your own, and forward the request inside that route. Reference code:
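    import urllib.parse

    import requests
    from flask import request, Response
    from cps.search_metadata import meta  # calibre-web's metadata blueprint

    # DEFAULT_HEADERS is defined elsewhere in the plug-in (NewDouban.py).
    @meta.route("/metadata/douban_cover", methods=["GET"])
    def proxy_douban_cover():
        # Fetch the Douban cover on the server side and relay it to the browser.
        cover_url = urllib.parse.unquote(request.args.get('cover'))
        res = requests.get(cover_url, headers=DEFAULT_HEADERS)
        return Response(res.content, mimetype=res.headers['Content-Type'])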
The cover is then displayed through the local address /metadata/douban_cover?cover=https://xxxx.
Service address
With the local proxy in place, the original Douban cover address has to be replaced with the proxy address when the cover data is returned.
This is a little tricky: Flask's request object is implemented with thread-local variables, while the metadata service fetches metadata in a thread pool. The request object is not available inside the pool's threads, and trying to access it there raises an error:
{RuntimeError}RuntimeError('Working outside of request context.\n\nThis typically means that you attempted to use functionality that needed\nan active HTTP request. Consult the documentation on testing for\ninformation about how to avoid this problem.')
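The effect can be reproduced without Flask. The sketch below is only an analogy built on threading.local, not Flask's actual implementation. The names _context, current_host_url and lookup_in_worker are hypothetical. A value bound in the thread that handles the "request" is simply not visible to a thread-pool worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Flask's request context (not Flask's real code).
_context = threading.local()


def current_host_url():
    """Return the host URL bound to the current thread, or fail like Flask does."""
    try:
        return _context.host_url
    except AttributeError:
        raise RuntimeError("Working outside of request context.") from None


def lookup_in_worker():
    """Call current_host_url() from a pool thread and report what happens."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(current_host_url)
        try:
            return future.result()
        except RuntimeError as exc:
            return f"error: {exc}"


# The value is bound in the main ("request") thread only.
_context.host_url = "http://nas_ip:8083/"
main_result = current_host_url()
worker_result = lookup_in_worker()
```

Here main_result holds the host URL, while worker_result holds the error text, because the worker thread has its own empty thread-local storage.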
Therefore, the cover address can only be rewritten after the metadata has been fetched and before it is returned, at a point where the request object is available again. The cover URL is replaced by subclassing MetaRecord and overriding the magic method __getattribute__. Reference code:
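    import dataclasses
    import urllib.parse

    from flask import request
    from cps.services.Metadata import MetaRecord

    # DOUBAN_PROXY_COVER, DOUBAN_PROXY_COVER_HOST_URL and DOUBAN_PROXY_COVER_PATH
    # are module-level settings of the plug-in.
    @dataclasses.dataclass
    class DoubanMetaRecord(MetaRecord):

        def __getattribute__(self, item):
            # Serve the cover through the local proxy service.
            if item == 'cover' and DOUBAN_PROXY_COVER:
                cover_url = super().__getattribute__(item)
                if cover_url:
                    try:
                        host_url = DOUBAN_PROXY_COVER_HOST_URL
                        if not host_url and request.host_url:
                            host_url = request.host_url
                        if host_url and host_url not in cover_url:
                            self.cover = host_url + DOUBAN_PROXY_COVER_PATH + urllib.parse.quote(cover_url)
                    except Exception:
                        # No request context available: keep the original URL.
                        pass
            return super().__getattribute__(item)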
This gives a working Douban cover preview.
If the detected server address is wrong, you can set it manually: DOUBAN_PROXY_COVER_HOST_URL='http://nas_ip:8083/'
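To make the attribute-override trick easier to try outside calibre-web, here is a self-contained sketch. Record and ProxiedRecord are hypothetical stand-ins for MetaRecord and DoubanMetaRecord. The proxy path and the cover URL are assumed values. Unlike the plug-in code, this variant does not modify the stored value and does not consult the Flask request.

```python
import dataclasses
import urllib.parse

PROXY_HOST_URL = "http://nas_ip:8083/"        # assumed, set manually
PROXY_PATH = "metadata/douban_cover?cover="   # assumed proxy path


@dataclasses.dataclass
class Record:
    """Hypothetical stand-in for calibre-web's MetaRecord."""
    title: str
    cover: str = ""


@dataclasses.dataclass
class ProxiedRecord(Record):
    def __getattribute__(self, item):
        value = super().__getattribute__(item)
        # Rewrite only non-empty cover URLs that are not proxied yet.
        if item == "cover" and value and PROXY_HOST_URL not in value:
            value = PROXY_HOST_URL + PROXY_PATH + urllib.parse.quote(value)
        return value


original_url = "https://img1.doubanio.com/view/subject/l/public/s1.jpg"  # hypothetical
record = ProxiedRecord(title="Example", cover=original_url)

# dataclasses.asdict() reads fields with getattr(), so it sees the proxied URL.
exported = dataclasses.asdict(record)

# The proxy route can recover the original URL from the query string.
query = urllib.parse.parse_qs(urllib.parse.urlparse(exported["cover"]).query)
recovered_url = query["cover"][0]
```

Because the rewrite happens on attribute access, any code that later serializes the record gets the proxied address without calibre-web itself being changed.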
Cover download
The local proxy address cannot be used for downloading the cover: if the Flask route that saves the cover requests another Flask route on the same server, the request hangs. This is probably because Flask's multi-threaded request handling is not enabled. To support cover download without modifying calibre-web, you can override calibre-web's helper.save_cover_from_url: replace it with your own function, which downloads the cover data with requests when it detects a Douban cover address. The code is as follows:
    # helper is calibre-web's cps.helper module.
    @staticmethod
    def hack_helper_cover():
        # Keep a reference to calibre-web's original implementation.
        save_cover = helper.save_cover_from_url

        def new_save_cover(url, book_path):
            if DOUBAN_COVER_DOMAIN in url:
                cover_url = url
                if DOUBAN_PROXY_COVER:
                    # The URL is the local proxy address; recover the real Douban URL.
                    component = urllib.parse.urlparse(url)
                    query = urllib.parse.parse_qs(component.query)
                    cover_url = urllib.parse.unquote(query.get('cover')[0])
                res = requests.get(cover_url, headers=DEFAULT_HEADERS)
                return helper.save_cover(res, book_path)
            else:
                # Anything that is not a Douban cover goes through the original function.
                return save_cover(url, book_path)

        helper.save_cover_from_url = new_save_cover
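The replace-and-delegate pattern used above can be tried in isolation. In this sketch, fake_helper is a stand-in for calibre-web's helper module. The COVER_DOMAIN value, the function names, and the URLs are assumptions for illustration. The downloader is passed in as a parameter so that nothing touches the network.

```python
import types
import urllib.parse

COVER_DOMAIN = "doubanio.com"  # assumed marker for Douban cover hosts


def extract_cover_url(url):
    """Return the real cover URL from a proxy URL, or the URL itself."""
    query = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    covers = query.get("cover")
    return covers[0] if covers else url


def patch_save_cover(helper, download):
    """Replace helper.save_cover_from_url; other URLs still use the original."""
    original = helper.save_cover_from_url

    def new_save_cover(url, book_path):
        if COVER_DOMAIN in url:
            return download(extract_cover_url(url), book_path)
        return original(url, book_path)

    helper.save_cover_from_url = new_save_cover


# Stand-in for calibre-web's helper module; it only records which path was taken.
fake_helper = types.SimpleNamespace(
    save_cover_from_url=lambda url, book_path: ("original", url, book_path)
)
patch_save_cover(fake_helper, lambda url, book_path: ("douban", url, book_path))

proxied = "http://nas_ip:8083/metadata/douban_cover?cover=https%3A//img1.doubanio.com/s1.jpg"
douban_result = fake_helper.save_cover_from_url(proxied, "/books/1")
other_result = fake_helper.save_cover_from_url("https://example.com/a.jpg", "/books/2")
```

Keeping a reference to the original function before overwriting it is what lets every non-Douban URL behave exactly as before.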
With that, both cover preview and cover download work.