STWP 2023 Week 14 Report

Original link: https://blog.save-web.org/blog/2023/04/11/stwp-2023-%E7%AC%AC-14-%E5%91%A8%E5%91%A8%E6%8A%A5/

So that STWP does not go quiet (“gugu”, i.e. stall) for long, we will publish a weekly project progress newsletter.


Week 14 project summary:

  • @jsun969 is writing a frontend for uglysearch.othing.xyz: https://github.com/saveweb/saveweb-search-frontend
  • saveweb/review-2022 gained one more entry.
  • Started the podcast archive project and wrote an archive tool: https://ift.tt/mxABOCg
  • The podcast archive tool reached GA. A trial run archived 30 podcasts, taking up 140 GiB. (The archive will be expanded later.)
  • Pulled the database of the archive server used for archiving a certain domestic app store back to a local machine and split it into sub-databases.
  • Captured packets to explore the API of the well-known domestic podcast app “Little Universe” (Xiaoyuzhou).
  • The wikiteam bot on wikiapiary.com has been down for 8 years. Over those 8 years, the APIs of wikiapiary, IA, pywikibot, MediaWiki, WikiTeam and other software or services have changed, and the original bot script in the wikiteam/wikiteam repo no longer works.
    So I wrote a new bot to try to revive it:
    https://github.com/saveweb/wikiapiary-wikiteam-bot
    We are trying to contact the holder of the original wikiteam bot account. If we can’t, we will ask wikiapiary for a bot account and run it ourselves.

Summary of recent discussions:

  1. https://github.com/saveweb/see-agreement/ This project has been on hold…
  2. Recurring daily topics: complaining about the network, lamenting the lack of storage space, and complaining about SSD lifespan.
  3. https://www.podcastrepublic.net/ can be used as a scraping source for the podcast archive project.
  4. Mac software: Little Snitch Network Monitor’s traffic visualization is kinda cool.
  5. The xuite.net blogging platform (“Xuite Random Nest”) will be shut down.

Next work/to-do list:

  1. Keep maintaining rss-list; we are short of people.
  2. For the floppy disk archiving project, write up the specific archiving process and method (a written manual, and possibly a recorded video).
  3. MediaWiki archiving:
    1. Optimize wikiteam3’s launcher.py so that only history.xml is put into the compressed archive.
    2. Deprecate wikiteam3’s “feature” of downloading a .desc file for each media file.
    3. Stream-parse the wikidump XML generated by wikiteam3, to serve as an XML validator.
    4. Write a small script that saves Fandom wiki comments. (wikiteam#456)
  4. Archive DokuWiki sites. Short-term goal: complete 100 DokuWiki archives (more than 20 saved so far). Ideally we attract “international friends” to participate (currently there is only one).
  5. @jsun969: frontend for the blog search engine.
  6. Connect FreshRSS’s MariaDB to MeiliSearch so that the blog search engine’s full-text index updates in real time. (Currently the entire database is imported manually.)
  7. @oveRidea_China: develop an archive of BiliBili’s daily Top 100 videos.
  8. Keep exploring ways to archive podcasts.
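
For to-do item 3.3 above, stream parsing can double as a cheap validator. The following is only a minimal sketch using Python's standard library, not wikiteam3 code: the function names and the choice of checks (well-formedness, every page having a title) are our own assumptions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def validate_dump(source):
    """Stream through a MediaWiki XML export without loading it all in memory.

    `source` is a file name or a binary file object. Returns a dict with
    page/revision counts, an `ok` flag and the first problem found, if any.
    The checks here are illustrative assumptions, not an official schema.
    """
    stats = {"ok": True, "pages": 0, "revisions": 0, "error": None}
    try:
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(source, events=("end",)):
            name = local_name(elem.tag)
            if name == "revision":
                stats["revisions"] += 1
            elif name == "page":
                stats["pages"] += 1
                if not any(local_name(child.tag) == "title" for child in elem):
                    stats["ok"] = False
                    stats["error"] = "page without <title>"
                # Drop the finished page's children and text to keep memory low.
                elem.clear()
    except ET.ParseError as exc:
        # Raised for malformed or truncated XML, e.g. an interrupted dump.
        stats["ok"] = False
        stats["error"] = str(exc)
    return stats
```

Calling validate_dump on a history XML file returns the counts, and `ok` turns False if the file is truncated or not well-formed.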
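
For to-do item 6 above, one possible approach is to poll for new entries and push them to MeiliSearch in batches. This is a rough sketch under stated assumptions, not our deployed setup: the row fields (id, title, author, link, date, content), the idea that entry ids grow over time, and all function names are assumptions. Database access is passed in as a callable so that any MariaDB driver can be used. Polling is near-real-time at best and is not a true change feed.

```python
import json
import urllib.request


def entry_to_document(row):
    """Map one entry row (a dict; field names are assumed) to a search document."""
    return {
        "id": row["id"],  # used as the MeiliSearch primary key
        "title": row.get("title", ""),
        "author": row.get("author", ""),
        "link": row.get("link", ""),
        "date": row.get("date"),
        "content": row.get("content", ""),
    }


def build_add_documents_request(base_url, index_uid, api_key, documents):
    """Build the POST request for MeiliSearch's add-or-replace documents route."""
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/documents?primaryKey=id"
    return urllib.request.Request(
        url,
        data=json.dumps(documents).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def sync_new_entries(fetch_rows, send, last_id, batch_size=500):
    """Send every row with id > last_id to the index; return the new high-water mark.

    fetch_rows(last_id) -> list of row dicts (hypothetical DB query wrapper).
    send(documents)     -> pushes one batch (e.g. urlopen on the request above).
    """
    rows = fetch_rows(last_id)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        send([entry_to_document(row) for row in batch])
        last_id = max(last_id, max(row["id"] for row in batch))
    return last_id
```

Running sync_new_entries on a timer and storing the returned id between runs would replace the manual full import. It only picks up new entries, not edits or deletions, and MeiliSearch processes added documents asynchronously.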

Shelved (“gugu”, i.e. suspended) items:

  1. see-agreement (collecting the user agreements, privacy policies, etc. of websites and software)
  2. Internet Cemetery wiki (documenting closed sites and services)
  3. Tianya Forum archives (metadata crawling has not been done yet; scraping metadata from the web pages does not work, so the API has to be used)
  4. Git blog repository archive (needs someone to manage it; shelved)
  5. Yuque public knowledge base archives (only paying users can now open public knowledge bases on Yuque, and archiving paying users’ content feels a bit… uninteresting)
