Build a Web Crawler in 2 Minutes, No Experience Needed | Open Source Daily No.426

getmaxun/maxun

License: `AGPL-3.0`
Language: `Unknown`

maxun is a free, open-source, no-code platform for web data extraction. It lets users with no programming knowledge quickly build custom robots that automate data scraping.

  • Robots can be trained quickly and start crawling web pages automatically in as little as 2 minutes
  • Offers several data capture methods, including lists, text, and screenshots
  • Suits data extraction from many kinds of websites, for example e-commerce product information
  • Open source and self-hostable, so users can customize it as needed
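
For a sense of what list capture automates, here is a minimal hand-written sketch in Python using only the standard library. It is not maxun's API or implementation; the `product-name` class, the sample HTML, and the function names are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page: a short product list like one on a shop site.
SAMPLE_HTML = """
<ul>
  <li><span class="product-name">Kettle</span> <span class="price">$25</span></li>
  <li><span class="product-name">Toaster</span> <span class="price">$40</span></li>
</ul>
"""


class ProductListParser(HTMLParser):
    """Collects the text of every element whose class list contains
    "product-name" (an assumed class name, not taken from any real site).

    Simplification: capture stops at the first closing tag, so this only
    works when the matched element contains plain text and no nested tags.
    """

    def __init__(self):
        super().__init__()
        self._capturing = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product-name" in classes:
            self._capturing = True
            self.names.append("")

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data

    def handle_endtag(self, tag):
        self._capturing = False


def extract_product_names(html):
    """Return the stripped text of each matched element, in page order."""
    parser = ProductListParser()
    parser.feed(html)
    return [name.strip() for name in parser.names]


if __name__ == "__main__":
    print(extract_product_names(SAMPLE_HTML))
```

A no-code tool replaces this selector-and-parser work with pointing and clicking on the page, which is why no programming knowledge is needed.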

chronark/highstorm

License: `AGPL-3.0`
Language: `Unknown`

highstorm is an open-source event monitoring tool. It aims to give users an efficient way to manage and analyze the event data their applications produce.

  • Integrates with multiple third-party services, such as databases and authentication providers
  • Provides detailed, easy-to-follow installation and configuration guides
  • Backed by a time-series database, which suits continuously changing data

huggingface/alignment-handbook

License: `Apache-2.0`
Language: `Unknown`

alignment-handbook is a project that provides robust recipes for aligning language models with human and AI preferences.

  • Provides robust training recipes that cover the entire process
  • Supports continued pretraining, supervised fine-tuning, and preference alignment with DPO and ORPO
  • Provides recipes to replicate models such as Zephyr 7B
  • Includes scripts to train and evaluate models, with support for distributed training of the full model weights
  • Guides are in development to explain how methods such as DPO work and to share lessons learned from collecting human preferences in practice
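
As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is not code from the handbook; the function name, argument names, and the default `beta` are assumptions chosen for this example. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    All names and the default beta are illustrative assumptions.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does,
    # so the loss falls below log(2), its value when there is no preference.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, and `beta` controls how strongly that gap is rewarded.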

idurar/idurar-erp-crm

License: `AGPL-3.0`
Language: `Unknown`

idurar-erp-crm is open-source ERP/CRM software for accounting and invoicing, built on the MERN stack (Node.js / Express.js / MongoDB / React.js). It reduces the complexity of managing invoices, customers, and payments for organizations.

  • Provides comprehensive invoice, payment, and quote management
  • Supports customer information management to improve the user experience
  • Friendly, easy-to-use interface built on the Ant Design framework
  • Completely open source and free for personal or commercial use
  • A self-hosted enterprise version is available for flexible deployment

Codium-ai/pr-agent

License: `Apache-2.0`
Language: `Unknown`

pr-agent is an AI-based tool that automates pull request analysis, feedback, and suggestions, so that pull requests can be reviewed and processed efficiently.

  • Provides automated code review and issue detection
  • Runs through multiple interfaces, including the CLI and PR comments
  • Can enrich PR feedback with context from Jira or GitHub tickets
  • Automatically logs accepted code suggestions for historical tracking and learning
  • Lets users customize label generation to fit project needs