Tips for Extracting Blogger Blog Post URLs

Original link: https://www.williamlong.info/archives/6907.html


For most Chinese blogs, the URLs of posts published on Google Blogger are randomly generated and follow no pattern. Managing these irregular URLs is a hard problem for website administrators. This article describes how to extract the list of URLs of all posts published on a Blogger blog.

Note one important precondition: do not publish or update any blog post while you are doing this. If a new post is published partway through, the work described in this article becomes useless and has to be redone from scratch.

The main method relies on Google Blogger’s sitemap.xml. Open the sitemap.xml file at the blog's address and you will see an XML list that references a series of XML files. Their names count up from sitemap.xml?page=1, and each sub-file holds 150 addresses. Manually download each one, from sitemap.xml?page=1 through the last page, and you end up with N XML files.
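The manual download can also be scripted. The following is a minimal sketch, not the original author's method: it assumes the top-level sitemap.xml is a standard sitemaps.org sitemap index, and the function names fetch and child_sitemaps, as well as the address example.blogspot.com, are hypothetical names chosen for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (assumed to be what Blogger emits).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch(url):
    """Download one address and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def child_sitemaps(index_xml):
    """Return the sub-sitemap addresses listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    ]


# Usage sketch (hypothetical blog address, performs network requests):
#   index = fetch("https://example.blogspot.com/sitemap.xml")
#   for n, page_url in enumerate(child_sitemaps(index), start=1):
#       with open(f"sitemap-page-{n}.xml", "wb") as out:
#           out.write(fetch(page_url))
```

Reading the sub-file addresses from the index, rather than guessing page numbers, means the script stops at the last page on its own.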

These XML files list the URLs of the posts published on Google Blogger in chronological order. Open each file in turn with Microsoft Excel and copy the URLs from the first column. Each file yields 150 URLs.
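As an alternative to copying the column out of Excel, the URLs can be read from the downloaded files directly. This is only a sketch under assumptions: it presumes each sub-file is a standard sitemaps.org urlset, and the names post_urls and merge_pages are hypothetical helpers, not part of any Blogger tool.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (assumed to be what Blogger emits).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def post_urls(page_xml):
    """Return the post URLs (the <loc> values) found in one sitemap sub-file."""
    root = ET.fromstring(page_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]


def merge_pages(pages):
    """Join the URLs of several sub-files in the order given, dropping repeats."""
    seen = set()
    merged = []
    for page_xml in pages:
        for url in post_urls(page_xml):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Usage sketch (hypothetical file names from the download step):
#   pages = [open(f"sitemap-page-{n}.xml", "rb").read() for n in (1, 2, 3)]
#   print("\n".join(merge_pages(pages)))
```

Dropping repeats is a precaution in case the same post shows up in two sub-files, for example when the blog changed between downloads.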

With a few thousand blog posts, this operation has to be repeated dozens of times. With tens of thousands of posts, doing it by hand will take a great deal of time.

This article is reprinted from: https://www.williamlong.info/archives/6907.html
This site is for inclusion only, and the copyright belongs to the original author.
