How to Archive a DokuWiki Site (1st Archiving Marathon)

Original link: https://blog.save-web.org/blog/2023/04/16/%E5%A6%82%E4%BD%95%E5%AD%98%E6%A1%A3-dokuwiki-%E7%AB%99%E7%82%B9%EF%BC%88%E7%AC%AC%E4%B8%80%E5%B1%8A%E5%AD%98%E6%A1%A3%E9%A9%AC%E6%8B%89%E6%9D%BE%E6%B4%BB%E5%8A%A8%EF%BC%89/

If you run into problems, please let us know.

Preparation

Event registration

Reply “1” in the comments at https://t.me/saveweb/116 and then join the group: https://t.me/saveweb_projects/120

Internet Archive

First, register an Internet Archive (IA) account at https://archive.org/account/signup

Then go to https://archive.org/account/s3.php to generate your S3-like API access key and secret key, and write them down.

Install Python and dokuWikiDumper

dokuWikiDumper requires Python 3.8+. If you are a Windows user, you can download an installer from the official Python website (any version that is 3.8 or newer).

After installing Python, run pip install dokuWikiDumper and dokuWikiDumper is installed. Whether you install it in a virtual environment is up to you. If pip is missing, install it first; there are plenty of tutorials online.
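If you want to confirm the installation before going further, a small check like the following may help. It is only a sketch: it assumes that pip placed a command named dokuWikiDumper on your PATH, which may not hold in every environment (for example an inactive virtual environment).

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def python_is_new_enough(version_info=sys.version_info):
    """Return True if the interpreter meets the 3.8+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def dumper_on_path(name="dokuWikiDumper"):
    """Return the full path of the command, or None if it is not on PATH.

    Assumption: pip installed a command with exactly this name.
    """
    return shutil.which(name)


print("Python OK:", python_is_new_enough())
print("dokuWikiDumper found at:", dumper_on_path())
```

If the second line prints None, the command is not visible from the current terminal, and reopening the terminal or activating the right environment is the first thing to try.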

Install 7-Zip and add it to PATH (Windows)

Most Linux distributions should provide a 7zip package (the package may also be named 7z ).

On Windows:

  • Download 7-Zip from the official 7-Zip website and install it.
  • After installation, add the installation directory (usually C:\Program Files\7-Zip ) to the PATH environment variable, and then restart the terminal.
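To check that the PATH change took effect, you can look the executable up from Python after restarting the terminal. This is a rough sketch; the candidate names below are an assumption, since the executable name differs between platforms and packages.

```python
import shutil

# Candidate executable names. Which one exists depends on the platform and
# package, so this list is an assumption, not something this guide specifies.
CANDIDATES = ("7z", "7zz", "7za")


def find_7zip(candidates=CANDIDATES):
    """Return the path of the first 7-Zip executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print("7-Zip found at:", find_7zip())
```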

Receive a task

The marathon has not started yet, so tasks cannot be claimed at the moment.

Go to https://github.com/orgs/saveweb/projects/4/views/2 to see which DokuWiki sites have no assignee, then claim one by commenting on the corresponding issue or by saying so in the TG group. (Alternatively, just say “Get a DokuWiki” in the TG group and we will send you a URL.)

Archive the wiki

Suppose we are assigned the wiki at http://wwwiki.top/ . Open it and take a quick look:

  1. Can you register an account? If so, register and log in.
  2. What is its doku.php URL?

    Click any function button, such as 登陆 (Log in). The URL of the login page is http://wwwiki.top/doku.php?id=前端技术记录&do=login&sectok= , and the part before the question mark is usually the doku.php URL, here http://wwwiki.top/doku.php . This does not always hold, though.
    The doku.php address generally ends with doku.php , but some sites customize their URL rewriting configuration. For example, the official DokuWiki site uses https://www.dokuwiki.org/dokuwiki as its doku.php address.
    Some sites serve doku.php directly from the site root.

  3. Can the sitemap (page index) be opened?

    For example, http://wwwiki.top/doku.php?id=前端技术记录&do=index opens without problems.

  4. Can visitors see the source ( wikitext )? Open any entry and click 编辑/显示源文件 (Edit / Show page source) in the right sidebar, or add the do=edit parameter to the URL.

    For example, http://wwwiki.top/doku.php?id=前端技术记录&do=edit opens without problems.
    If you cannot see the wikitext, check whether it becomes visible after logging in.
    If it is still not visible after logging in, do not worry too much, because we can still save the rendered HTML. Replace &do=edit in the URL with &do=export_html and the site should return a clean rendered HTML version of the entry.
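The URLs used in the checks above can be derived mechanically. The sketch below takes any page URL, keeps the part before the question mark as a guess for the doku.php URL, and builds the index, edit, and export_html URLs from it. It assumes the site uses the plain doku.php?id=...&do=... form; as noted, sites with custom URL rewriting will not fit this pattern. The helper names are made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def doku_php_url(page_url):
    """Drop the query string and fragment, keeping everything before '?'."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def action_url(base, page_id, action):
    """Build <base>?id=<page_id>&do=<action>, percent-encoding the page id.

    Assumption: the site accepts the plain id/do query parameters.
    """
    return f"{base}?id={quote(page_id)}&do={action}"


login = "http://wwwiki.top/doku.php?id=前端技术记录&do=login&sectok="
base = doku_php_url(login)
print(base)
for action in ("index", "edit", "export_html"):
    print(action_url(base, "前端技术记录", action))
```

Opening the printed URLs in a browser is a quick way to run checks 3 and 4 by hand.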

Okay, now that we have roughly figured out the situation of this site, let’s start saving.

cd to the working directory you want, and enter directly on the command line:

 dokuWikiDumper <homepage URL of the site> --auto

Then it starts downloading the wiki.

If something goes wrong while archiving, please report it to us. Every site is different: some need special handling, and some have to be given up on.

Note: --auto is equivalent to using the following parameters:

 --content --media --html        # save wikitext, media files, and rendered HTML
 --threads 5                     # use 5 threads
 --ignore-action-disabled-edit   # if the site does not allow viewing an entry's wikitext, do not exit with an error; move on to the next entry

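Note: common parameters:

 --insecure            # disable SSL verification
 --username USERNAME   # account for logging in to the DokuWiki site (best to avoid special characters, so nothing needs escaping)
 --password PASSWORD   # password for logging in to the DokuWiki site (best to avoid special characters, so nothing needs escaping)
 --cookies COOKIES     # cookies; see the README for usage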
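If you prefer to launch the dumper from a script, one way to sidestep shell escaping is to pass the arguments as a list, so the shell never interprets them. The sketch below only uses flags listed in this guide; the function name, the account name alice, and the password are made-up examples, and it assumes these options can be combined with --auto.

```python
import shlex


def build_dump_command(url, username=None, password=None, insecure=False):
    """Return an argument list for dokuWikiDumper using flags from this guide.

    Assumption: --username, --password and --insecure may be used with --auto.
    """
    argv = ["dokuWikiDumper", url, "--auto"]
    if insecure:
        argv.append("--insecure")
    if username is not None:
        argv += ["--username", username]
    if password is not None:
        argv += ["--password", password]
    return argv


# "alice" and the password are hypothetical values for illustration.
cmd = build_dump_command(
    "http://wwwiki.top/doku.php", username="alice", password="p@ss word"
)
print(shlex.join(cmd))  # shell-quoted form, for display or copy-paste
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run hands each value to the program unchanged, while shlex.join shows how the same command would need to be quoted if typed by hand.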

Common errors

  • requests reports IncompleteRead . If you are using a proxy, this happens when the proxy switches nodes automatically or a node is unstable and drops packets. Just run the failed command again.
  • When saving content (wikitext), a large number of “ignore” messages are reported. Usually this means the site does not allow you to view wikitext. If an entry does allow viewing its wikitext and is still ignored, dokuWikiDumper may not have detected the correct doku.php URL. Try passing the doku.php URL, followed by the options you need, instead of the homepage URL.
  • Stuck at “Waiting for x threads to finish”. Wait a while; if it does not finish, just run the command again. (If you know what you are doing, you can skip already completed items with --skip-to NUM when rerunning.) If the rerun gets stuck at the same place, report it to us.
  • OSError: Unable to create file . You are probably on Windows and have hit characters that NTFS does not allow in file names. If you do not know how to fix it, report it to us.
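Since the advice for transient failures is simply to run the command again, the rerun can be scripted. This is a minimal sketch that assumes the tool exits with a non-zero status when it fails; that behaviour is not confirmed by this guide. The runner parameter exists only so the logic can be tried without starting a real download.

```python
import subprocess


def run_with_retries(argv, attempts=3, runner=subprocess.run):
    """Run argv until it exits with code 0 or the attempts are used up.

    Returns the number of the successful attempt, or 0 if every attempt failed.
    Assumption: a failed run is signalled by a non-zero exit code.
    """
    for attempt in range(1, attempts + 1):
        result = runner(argv)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}")
    return 0
```

For example, run_with_retries(["dokuWikiDumper", "http://wwwiki.top/", "--auto"]) would try the dump up to three times. If it keeps failing in the same place, stop retrying and report it.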

Check the dump

After dokuWikiDumper finishes, it should have created a wikidump directory named in <site>-<date> format (e.g. wwwiki.top-20230416 ) in the current directory. If you want to be sure, you can check whether the wikidump looks normal by comparing it against the expected directory structure.
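As a first sanity check, you can list the directories in the working directory whose names follow the <site>-<date> pattern. The sketch below assumes the date is written as eight digits, as in the wwwiki.top-20230416 example; it checks only the directory name, not the contents of the dump.

```python
import re
from pathlib import Path

# <site>-<date>, assuming the date is 8 digits (YYYYMMDD) as in the
# wwwiki.top-20230416 example.
DUMP_NAME = re.compile(r"^(?P<site>.+)-(?P<date>\d{8})$")


def find_dump_dirs(parent="."):
    """Return (path, site, date) for each subdirectory named <site>-<date>."""
    found = []
    for entry in sorted(Path(parent).iterdir()):
        match = DUMP_NAME.match(entry.name)
        if entry.is_dir() and match:
            found.append((entry, match["site"], match["date"]))
    return found


for path, site, date in find_dump_dirs("."):
    print(f"{path}: site={site} date={date}")
```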

Upload to Internet Archive

Preparation

Create an ia_keys.txt file (any file name will do) and put your IA S3-like API keys in it. The first line is the access key and the second line is the secret key.
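The file can of course be created in any text editor. If you would rather script it, something like the following writes the two lines in the order described. The function name and the placeholder key values are made up, and restricting the file permissions is an extra precaution of this sketch, not something the uploader is known to require.

```python
import os
from pathlib import Path


def write_ia_keys(path, access_key, secret_key):
    """Write the access key on line 1 and the secret key on line 2."""
    target = Path(path)
    target.write_text(f"{access_key}\n{secret_key}\n", encoding="utf-8")
    try:
        # Keep the file private to the current user where the OS supports it.
        os.chmod(target, 0o600)
    except OSError:
        pass
    return target


# Placeholder values; replace them with the keys from your IA account page.
# write_ia_keys("ia_keys.txt", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
```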

Upload

Run:

 dokuWikiUploader <wikidump directory> -kf <path to ia_keys.txt>

For example:

 dokuWikiUploader wwwiki.top-20230416/ -kf ia_keys.txt

The dump is then packaged and uploaded to IA automatically.

By default, dokuWikiUploader looks for the keys file at ~/.doku_uploader_ia_keys . If you put your keys there, you do not need to pass -kf every time.
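The two ways of calling the uploader, with an explicit keys file or relying on the default path, can be sketched as a small helper that builds the argument list. The helper name is made up for illustration; only the -kf flag and the default path come from this guide.

```python
from pathlib import Path

# Default keys file location described in this guide, expanded for this user.
DEFAULT_KEYS = Path("~/.doku_uploader_ia_keys").expanduser()


def build_upload_command(dump_dir, keys_file=None):
    """Return an argument list for dokuWikiUploader.

    With keys_file=None, -kf is omitted so the uploader falls back to its
    default keys file path.
    """
    argv = ["dokuWikiUploader", str(dump_dir)]
    if keys_file is not None:
        argv += ["-kf", str(keys_file)]
    return argv


print(build_upload_command("wwwiki.top-20230416/", "ia_keys.txt"))
print(build_upload_command("wwwiki.top-20230416/"))
print("default keys file exists:", DEFAULT_KEYS.exists())
```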

When the upload finishes, the last thing printed on the command line is an item link similar to https://archive.org/details/wiki-xxxxxx-20230000 . Send us this item link and we will add 1 to your marathon score.
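If you saved the uploader's output to a file or captured it in a script, the item link can be pulled out with a regular expression. This is only a sketch: the sample text below is invented, since the exact wording of the uploader's output is not shown in this guide.

```python
import re

# Matches archive.org item links; stops at whitespace, quotes, or angle brackets.
ITEM_LINK = re.compile(r"https://archive\.org/details/[^\s\"'<>]+")


def find_item_links(output_text):
    """Return every archive.org item link found in the given text, in order."""
    return ITEM_LINK.findall(output_text)


# Hypothetical output, for illustration only.
sample = "Uploading...\nDone: https://archive.org/details/wiki-xxxxxx-20230000\n"
print(find_item_links(sample))
```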

To see the items you have uploaded, log in to archive.org , click your avatar in the upper right corner of the homepage, and open My Uploads .

Common errors

  • “Your upload of xxxxx from username xxxxx appears to be spam. If you believe this is a mistake, contact [email protected] and include this entire message in your email.” This message comes from IA's anti-spam system. Do not rush to contact IA; report it to us first, and we will pass the reports on to IA together.
