Original link: https://blog.save-web.org/blog/2023/04/11/stwp-2023-%E7%AC%AC-14-%E5%91%A8%E5%91%A8%E6%8A%A5/
To keep STWP from going quiet for long, we publish a weekly project progress newsletter.
Week 14 project summary:
- @jsun969 is writing a frontend for uglysearch.othing.xyz: https://github.com/saveweb/saveweb-search-frontend
- saveweb/review-2022 gained one more entry (+1).
- The podcast archive project has started, and an archive tool is being written: https://ift.tt/mxABOCg
- The podcast archive tool is GA. As a trial we archived 30 podcasts, which take up 140 GiB. (The archive will be expanded later.)
- Pulled the database of the archive server for the archiving project of a certain domestic app store back to a local machine and made a sub-database.
- Captured packets to explore the API of the well-known domestic podcast app “Little Universe”.
- The wikiteam bot on wikiapiary.com has been down for 8 years. Over those 8 years, the APIs of wikiapiary, IA, pywikibot, MediaWiki, WikiTeam and other software and services have changed, so the original bot script in the wikiteam/wikiteam repo no longer works.
So I wrote a new bot to try to revive it:
https://github.com/saveweb/wikiapiary-wikiteam-bot
We are trying to contact the holder of the original wikiteam bot account. If we can’t, we will ask wikiapiary for a bot account and run the bot ourselves.
Summary of recent discussions:
- https://github.com/saveweb/see-agreement/ This project has been on hold…
- Recurring topics: complaining about the network, lamenting the lack of storage space, and complaining about SSD lifespan.
- https://www.podcastrepublic.net/ can be used as a scraping source for the podcast archive project.
- Mac software: Little Snitch Network Monitor’s traffic visualization is kinda cool.
- The xuite.net blogging platform “Xuite 隨意窩” (Xuite Random Nest) will be shut down.
Next work/to-do list:
- rss-list needs continued maintenance, but we are short of people.
- For the floppy disk archiving project, write the specific archiving process and method (write a manual, and possibly record a video).
- MediaWiki archiving:
  - Optimize wikiteam3’s launcher.py so that only history.xml is put into the compressed package.
  - Deprecate wikiteam3’s “feature” of downloading a .desc file for each media file.
  - Stream-parse the wikidump XML generated by wikiteam3, to serve as an XML validator.
  - Write a small script that saves Fandom wiki comments. (wikiteam#456)
- Archive DokuWiki sites. Small goal: complete 100 DokuWiki archives (more than 20 have been saved so far). Ideally we attract “international friends” to take part (currently there is only one).
- @jsun969: frontend for the blog search engine.
- Connect FreshRSS’s MariaDB to MeiliSearch so that the blog search engine’s full-text index can be updated in real time. (Currently the entire database is imported manually.)
- @oveRidea_China is developing an archive of BiliBili’s daily Top 100 videos.
- Keep exploring ways to archive podcasts.
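For the wikidump XML validator item in the list above, one possible approach is sketched below. It is only a rough illustration, not wikiteam3's actual code: the function name `validate_dump` is hypothetical, and it assumes the dump follows the usual MediaWiki export layout in which each page is a `<page>` element. It checks well-formedness only and counts pages while reading the file incrementally, so a large history.xml does not have to be loaded at once.

```python
import io
import xml.etree.ElementTree as ET


def validate_dump(stream):
    """Stream-parse an XML dump.

    Returns (ok, page_count, error_message). `stream` is a binary
    file-like object. Hypothetical helper, for illustration only.
    """
    pages = 0
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Export tags may carry a namespace, e.g. "{http://...}page",
            # so compare only the local part of the tag name.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                pages += 1
                # Drop the finished page's children to keep memory low.
                elem.clear()
    except ET.ParseError as exc:
        # A truncated or otherwise malformed dump ends up here.
        return False, pages, str(exc)
    return True, pages, None


if __name__ == "__main__":
    sample = io.BytesIO(b"<mediawiki><page><title>A</title></page></mediawiki>")
    print(validate_dump(sample))
```

With a real dump the stream would be something like `open("history.xml", "rb")`. A count of complete pages together with a parse error usually means the download was cut off partway.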
Stalled (suspended) projects:
- see-agreement (collect the user agreement, privacy agreement, etc. of each website/software)
- Internet Cemetery wiki (documenting closed sites and services)
- Tianya Forum archive (metadata has not been crawled yet; crawling it through the web pages does not work, so the API has to be used)
- Git blog repository archive (management required, shut down)
- Yuque public knowledge base archive (only paying users can now make knowledge bases public on Yuque, and archiving paying users’ content feels a bit… uninteresting)