2022-25: Open source should be upstream first

Original link: https://xuanwo.io/reports/2022-25/

Upstream first is an important concept in open source collaboration. Today we will use examples to discuss why projects adopt upstream first and how to practice it.

upstream first

Software projects inevitably depend on other projects and are depended on by other projects. A project’s dependencies are its upstream, and the projects that depend on it are its downstream. While maintaining a project, we may need changes in upstream for various reasons. Common situations include:

  • A breaking change was made upstream
  • There is an unfixed bug upstream
  • Upstream lacks features this project needs

The principle of feeding these changes back to upstream, rather than keeping them downstream, is called upstream first.

In theory, open source collaboration naturally involves feeding changes back to the community (some licenses even require it). In practice, however, many changes never make it upstream. Teams may be protecting their short-term interests, may be unfamiliar with how open source collaboration works, or may be under pressure from short-term KPIs. The result is the anti-pattern that @tison has pointed out on Twitter.

Adopting upstream first helps reduce the maintenance burden and technical debt that pile up when heavy internal modifications leave us unable to keep up with community updates.

I think upstream first mainly involves the following work:

  • stay updated
  • communicate
  • contribute

stay updated

Staying updated is the first step of upstream first. Generally, this means tracking the latest stable release or the long-term support release published upstream. Staying updated is essentially an act of trust in the upstream maintainers. We believe that upstream can deliver high-quality software, that each update brings new features and bug fixes, and that upstream will keep its stability promises.

The classic book “The Evolution of Cooperation”, recommended by tison, has a wonderful discussion of this. The author studied two rounds of an iterated prisoner’s dilemma tournament that he organized. He found that both rounds were won by the simplest strategy, “Tit for Tat”: cooperate on the first move, and on every later move, repay the opponent in kind by doing whatever the opponent did on the previous move.

Staying updated means our project always opens with cooperation. If the maintainers meet our expectations, we keep updating. If they instead repay our trust with repeated unexpected breakages and elementary bugs, we respond in kind: we stop relying on them or switch to another upstream.
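To make the strategy concrete, here is a minimal Rust sketch of Tit for Tat as described above. The `Move` enum and the `tit_for_tat` and `respond_to` names are illustrative and not taken from the book. Read the opponent as upstream, and read cooperation as staying updated.

```rust
/// The two choices available in each round of the prisoner's dilemma.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Move {
    Cooperate,
    Defect,
}

/// Tit for Tat: cooperate on the first move, then copy whatever the
/// opponent did in the previous round.
fn tit_for_tat(opponent_history: &[Move]) -> Move {
    opponent_history.last().copied().unwrap_or(Move::Cooperate)
}

/// Plays Tit for Tat against a fixed sequence of opponent moves and
/// returns our move for each round.
fn respond_to(opponent_moves: &[Move]) -> Vec<Move> {
    (0..opponent_moves.len())
        // In round `round` we only know the opponent's earlier moves.
        .map(|round| tit_for_tat(&opponent_moves[..round]))
        .collect()
}
```

Consider an upstream that cooperates, defects twice, then cooperates again. For that sequence, `respond_to` returns Cooperate, Cooperate, Defect, Defect. We open with cooperation and then mirror upstream one round late.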

The story of OpenDAL and size is a good example. OpenDAL’s benchmarks rely on the size crate for size conversions, but in recent releases size has shipped several breaking changes in a row.

Staying updated keeps our project built on a stable foundation. It also lets us keep feeding back to the community and help it build a better project. GitHub has become an important contributor to Git in recent years. It keeps its Git version in sync with the latest stable release and reports the problems it hits in its test and production environments to the community. This keeps its own service stable and also helps the Git community reproduce and verify a large number of bugs. Similarly, JetBrains keeps up with the latest PHP releases in order to develop PhpStorm. It has become a major development force in the PHP community and even led the formation of The PHP Foundation. Seen from this angle, the investment is unsurprising given what PhpStorm has at stake commercially (63% of PHP developers use PhpStorm).

As the industry pays more and more attention to open source supply chain issues, staying updated keeps getting easier. Both GitHub Dependabot and Renovate Bot can keep dependencies up to date according to configured rules. Command-line tools such as cargo-update support updating dependencies manually. Maintainers can choose an update strategy that fits their own project.
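When choosing an update strategy for a Rust project, it helps to know which releases a routine update will pull in. Cargo treats a plain requirement such as `0.9.0` as a caret requirement. Any release at or above it is accepted, as long as the left-most non-zero component stays unchanged. Below is a minimal sketch of that rule, assuming plain `major.minor.patch` versions without pre-release or build metadata. The `Version`, `parse`, and `caret_matches` names are illustrative and are not Cargo’s API.

```rust
/// A plain `major.minor.patch` version (pre-release and build metadata
/// are deliberately not supported in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parses `"1.2.3"` into a `Version`; returns `None` for anything else.
fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Caret semantics: `candidate` satisfies `^req` if it is not older than
/// `req` and the left-most non-zero component of `req` is unchanged.
/// (Derived `Ord` compares major, then minor, then patch.)
fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // ^1.2.3 means >=1.2.3, <2.0.0
        candidate.major == req.major
    } else if req.minor > 0 {
        // ^0.9.1 means >=0.9.1, <0.10.0
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.3 means >=0.0.3, <0.0.4
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}
```

Under this rule, a requirement of `0.9.0` accepts `0.9.1`. A patch release such as pprof 0.9.1 can therefore reach a downstream project through a routine update without any edit to Cargo.toml. An exact requirement such as `=0.9.0` is one way to opt out.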

communicate

Communication is a key part of upstream first: there is no open source without communication. The biggest difference between open source and closed source projects is that the communication channels are open. We can report the bugs we hit, the features we need, and so on directly to upstream.

Open channels do not mean we can abuse them. We should file bug reports the way the community asks, share as much publicly shareable information as we can, and provide reproduction steps. antfu’s recent blog post on providing a minimal reproduction discusses this issue excellently. I have also seen the opposite failure. A team hits a bug and does not report it upstream; instead, it quietly blames upstream and adopts a workaround. In reality, upstream lacks that usage scenario, has no idea the bug exists, and so cannot fix it.

The story of Databend and pprof-rs illustrates this well. pprof-rs is a Rust CPU profiler library incubated by the TiKV community. It is widely used in the Rust community, and Databend is no exception. But in a recent release, pprof-rs broke Databend’s build for the x86_64-unknown-linux-musl target. @YangKeao, a maintainer of pprof-rs, reasoned that the latest musl release already includes pthread_getname_np and that libc also provides the corresponding support. So he reverted an earlier change that used a fallback under the musl target. However, Databend had not updated musl in its build environment to the latest version, so the build failed because the symbol could not be found.

At first, the Databend community did not know why pprof-rs had made this change. To avoid similar breakages in the future, it was even considering forking pprof-rs and maintaining it independently. Following upstream first, @PsiACE instead opened an issue in pprof-rs to report the build problem: “pprof 0.9.1: Databend cannot be built on musl targets”. He then applied a workaround so that Databend’s build was not affected. The patch prominently linked to the issue and asked that the dependency be switched back to upstream once the issue is fixed.

While tracking this issue, we found that the root cause of the build failure was that our musl had not been upgraded to the latest version. So @everpcpc submitted the PR “chore: upgrade musl in build tool image to 1.2.3”, which upgrades musl in Databend’s build image to the latest version. Further communication afterwards turned up more related problems.

Had the Databend community not kept communicating with pprof-rs upstream, it might have missed the chance to pin down the root cause and left more latent problems behind.
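The “fallback under the musl target” in this story is a compile-time switch. Below is a minimal Rust sketch of the idea; it is not pprof-rs’s actual code, and the helper name is hypothetical. On glibc Linux the sketch reads the thread name through pthread_getname_np. On every other target, musl included, that symbol is never referenced, and the sketch falls back to the name the Rust standard library records. It also assumes two things: that declaring the two pthread functions by hand is acceptable, and that pthread_t maps to `c_ulong`, which holds for glibc on Linux. A real project would use the libc crate’s bindings instead.

```rust
/// glibc Linux: ask the OS for the current thread's name.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
fn current_thread_name() -> Option<String> {
    use std::ffi::CStr;
    use std::os::raw::{c_char, c_int, c_ulong};

    // Hand-written declarations for this sketch (assumption: on glibc
    // Linux, pthread_t is `unsigned long`).
    unsafe extern "C" {
        fn pthread_self() -> c_ulong;
        fn pthread_getname_np(thread: c_ulong, name: *mut c_char, len: usize) -> c_int;
    }

    // Linux thread names are at most 15 bytes plus the terminating NUL.
    let mut buf = [0 as c_char; 16];
    let rc = unsafe { pthread_getname_np(pthread_self(), buf.as_mut_ptr(), buf.len()) };
    if rc != 0 {
        return None;
    }
    // SAFETY: on success, glibc wrote a NUL-terminated string into `buf`.
    let name = unsafe { CStr::from_ptr(buf.as_ptr()) };
    Some(name.to_string_lossy().into_owned())
}

/// Fallback for musl and all other targets: pthread_getname_np is never
/// referenced, so linking cannot fail against a libc that lacks it.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
fn current_thread_name() -> Option<String> {
    std::thread::current().name().map(str::to_owned)
}
```

Because the fallback path never references the symbol, a build against an older musl still links. Removing such a fallback is what surfaced the missing symbol in Databend’s older build image.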

contribute

Contributing to open source projects is the best way to protect your own interests. Unlike internal company projects, open source projects make decisions driven by the interests of the community as a whole. If you do not contribute, you have little influence over the project’s direction, and your existing investment may end up wasted entirely.

Contribution here is meant broadly. Downstream projects can help upstream locate and debug problems, submit bug fixes directly, and even add test cases for their own scenarios to upstream CI so that those scenarios are not accidentally broken.

To keep pprof-rs from accidentally breaking Databend again, I submitted the PR “feat: Fix and cover tests for target x86_64-unknown-linux-musl”. It fixes the x86_64-unknown-linux-musl target and enables building and testing for it, which covers Databend’s use case. From now on, every pprof-rs PR must build and pass tests in Databend’s usage scenario before it can be merged.

In async-compression, I submitted the proposal “Export codec and Encode/Decode trait”, describing my use case and requirements. This makes OpenDAL’s needs part of async-compression’s long-term evolution and spares OpenDAL from depending on a temporary, unmaintained fork. Feedback like this also pushes authors to think about their project’s use cases and how to support them better. The author of async-compression has since submitted the PR “Remove support for tokio 0.2 and 0.3”, which removes obsolete features to make room for better extensibility in the future.
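A small sketch can show why exporting the codec traits matters. The `Encoder` trait and the toy run-length `RleEncoder` below are hypothetical and are not async-compression’s actual API. The point is that once a crate publishes such a trait, a downstream project can drive any codec generically through its own I/O path instead of copying codec internals into a fork.

```rust
/// Hypothetical exported trait: a streaming encoder fed chunk by chunk.
trait Encoder {
    fn encode(&mut self, input: &[u8], output: &mut Vec<u8>);
    fn finish(&mut self, output: &mut Vec<u8>);
}

/// Toy codec: run-length encoding as (count, byte) pairs, runs capped at 255.
#[derive(Default)]
struct RleEncoder {
    current: Option<(u8, u8)>, // (byte, run length) of the run in progress
}

impl Encoder for RleEncoder {
    fn encode(&mut self, input: &[u8], output: &mut Vec<u8>) {
        for &b in input {
            match self.current {
                // Same byte and room left: extend the current run.
                Some((byte, count)) if byte == b && count < u8::MAX => {
                    self.current = Some((byte, count + 1));
                }
                // Different byte or full run: flush it and start a new one.
                Some((byte, count)) => {
                    output.extend_from_slice(&[count, byte]);
                    self.current = Some((b, 1));
                }
                None => self.current = Some((b, 1)),
            }
        }
    }

    fn finish(&mut self, output: &mut Vec<u8>) {
        // Flush the last pending run, if any.
        if let Some((byte, count)) = self.current.take() {
            output.extend_from_slice(&[count, byte]);
        }
    }
}

/// Downstream code written only against the trait, so any codec fits.
fn encode_chunks<E: Encoder>(encoder: &mut E, chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for chunk in chunks {
        encoder.encode(chunk, &mut out);
    }
    encoder.finish(&mut out);
    out
}
```

Feeding the chunks `aa` and `ab` through `encode_chunks` yields `[3, b'a', 1, b'b']`. The run of `a` continues across the chunk boundary because the encoder keeps its state between calls, which is the kind of streaming behavior a downstream project needs from an exported trait.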

Summary

Finally, upstream first does not mean rejecting forks outright. The splitting and merging of open source communities reflects how much productive capacity different interest groups can bring. If you find that upstream has problems it cannot solve, and you believe you can do better, you are free to replace it and become the new upstream.

  • Unsatisfied with Vim’s design and development process, Neovim formed a new open source community and began independent development around new goals. Influenced by Neovim, Vim has also started adding features the community had long been requesting.
  • After being developed separately for more than two years, OpenWrt and LEDE decided to merge and continue development together as OpenWrt.
  • jemallocator has seen no updates since 2019 and is effectively unmaintained. The TiKV community created its own fork, tikv-jemallocator, which has brought new life to the project and become the choice of many projects. Rust itself is switching to it in the PR “Use tikv-jemallocator in rustc/rustdoc in addition to jemalloc-sys when enabled”.

These cases all show that only forks created by communities with enough productive capacity can become new upstreams trusted by many projects. Forks from communities without that capacity are forgotten after a brief stir, and their benefits fall far short of adopting an upstream-first strategy.

This article used examples to discuss why to adopt upstream first and how to practice it. I hope more projects and contributors will join in putting upstream first and deliver high-quality software together!
