2022-18: How to contribute to complex projects

Iteration 12 ran for two weeks, from 4/23 to 5/6. In this cycle I stepped out of my comfort zone and explored many things I had never understood before, such as tree-sitter, parsers, HDFS, and Java, and I feel I gained a lot. Recently I have reread Contributing to Complex Projects by @mitchellh several times. In this weekly report, I describe how to contribute to complex projects, based on my experience of contributing to difftastic from scratch.

Foreword

The Contributing to Complex Projects article breaks contributing to a complex project into the following steps:

Step 1: Become a User
Step 2: Build the Project
Step 3: Learn the Hot-Path Internals
Step 4: Read and Reimplement Recent Commits
Step 5: Make a Bite-sized Change

These steps work for the vast majority of projects, but they need some adjustment based on personal preference and circumstances. For example, I prefer to work on things that can be merged into the main branch, so I tend to skip Step 4 and go straight to implementing some relatively easy features. When you contribute, adjust your strategy to your own situation instead of copying these steps dogmatically.

About difftastic

difftastic is a semantic-aware diff tool written in Rust.

It understands whether the characters we changed belong to an array item or to a function parameter. Taking JavaScript as an example:

It highlights { and } but not foo();, which is unchanged even though its indentation changed, because it understands nesting.
It aligns bar() on the left with bar(1) on the right, because it knows they are the same function call.
Here eric is moved to the next line but is not highlighted, because it knows the newline does not change the semantics.

Behind the scenes, difftastic uses tree-sitter to parse the files and compares their ASTs, rather than computing a purely character-based diff.
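
As a rough illustration of why comparing structure differs from comparing characters, here is a minimal Rust sketch. It is not how difftastic or tree-sitter work: it replaces the AST with a crude hand-written tokenizer, and the names tokenize and same_ignoring_layout are hypothetical. It only shows that once whitespace stops being part of the comparison, moving an item to a new line no longer counts as a change.

```rust
/// Split source text into coarse tokens: runs of alphanumerics/underscores,
/// and single punctuation characters. Whitespace only separates tokens.
/// Assumption: this is a toy stand-in for a real parser such as tree-sitter.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch);
        } else {
            // A non-word character ends the current word, if any.
            if !word.is_empty() {
                tokens.push(std::mem::take(&mut word));
            }
            // Punctuation becomes its own token; whitespace is dropped.
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}

/// Two snippets are "the same" here if their token sequences match,
/// regardless of indentation or line breaks.
fn same_ignoring_layout(a: &str, b: &str) -> bool {
    tokenize(a) == tokenize(b)
}
```

With this sketch, an array whose last item moves to the next line compares as equal, while bar() and bar(1) do not. A real tool has to go much further: this tokenizer also ignores whitespace inside string literals, which a proper parser would not.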

Currently difftastic supports more than 30 programming languages and configuration file formats, and runs on mainstream platforms such as Linux, macOS, and Windows.

Become a user

One of the very interesting things about open source projects is that the developers are often the users themselves.

The User centrality principle in the Arch Way states:

Many Linux distributions try to be more “user friendly”; Arch Linux has always been, and always will be, “user centric”. The distribution is designed to meet the needs of its contributors, not to attract as many users as possible. Arch is for do-it-yourself users who are willing to take the time to read the documentation and solve their own problems.

I think this is the most important step in participating in open source. Become a user: use the project, understand what it does, and actively look for its shortcomings and things to improve, instead of grinding through long design documents right at the start. Many people lose their enthusiasm for open source during that long reading phase, but we do not need to be experts in a field to contribute to a project. Before contributing to difftastic, I knew nothing about parsers or tree-sitter. Even now, I only know what they do, not how they actually work, and I have certainly not read their code. That did not prevent me from adding support for several languages to difftastic and fixing several bugs that caused crashes.

Most open source projects provide documentation for installation and usage, and difftastic is no exception. I installed and configured difftastic by following Installation and Usage. I quickly discovered that difftastic lacked Perl and HCL support and decided to add it.

Build the project

The second step in participating in an open source project is to set up the development environment and get a successful build. difftastic is a pure Rust project with few dependencies, and it can be compiled with cargo build.

Complex projects often have complex dependencies. Some are required by the project itself, including the language toolchain, dependency management, and compile-time tools. Others are tools used during development, such as static analysis, code formatting, and integration testing. Well-maintained projects usually provide a Contributing or Get Started document that explains how to build the project. For example, TiDB describes the dependencies and steps required to compile TiDB in TiDB Development Guide: Get Started. Some projects go further and provide a one-click script to handle dependencies (though I don’t like this approach), such as the dev_setup.sh provided by Databend. Projects whose development environment is highly complex to configure, like Rust, even develop additional tools to automate these steps.

As developers, we should read the README in full and look for this kind of information. If it is not there, we can fall back on convention: if the project contains well-known files such as Makefile, package.json, Cargo.toml, or go.mod, we can try the corresponding commands directly. After a successful build, we can submit a change that adds the build steps to the README for the convenience of future contributors.
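
The "conventional method" above can be sketched as a small lookup from marker file to build command. This is only an illustration under stated assumptions: the function names guess_build_commands and guess_for_dir are hypothetical, and the commands are common defaults rather than anything a given project guarantees (a package.json project, for instance, may need a different script after npm install).

```rust
use std::path::Path;

/// Marker files mentioned in the text, paired with a conventional command.
/// Assumption: these are common defaults, not a rule every project follows.
const MARKERS: [(&str, &str); 4] = [
    ("Cargo.toml", "cargo build"),
    ("go.mod", "go build ./..."),
    ("package.json", "npm install"),
    ("Makefile", "make"),
];

/// Given the file names found in a project root, return the commands
/// worth trying, in the order the markers are listed above.
fn guess_build_commands(file_names: &[&str]) -> Vec<&'static str> {
    MARKERS
        .iter()
        .filter(|entry| file_names.contains(&entry.0))
        .map(|entry| entry.1)
        .collect()
}

/// Same idea, but checking a directory on disk for each marker file.
fn guess_for_dir(dir: &Path) -> Vec<&'static str> {
    MARKERS
        .iter()
        .filter(|entry| dir.join(entry.0).is_file())
        .map(|entry| entry.1)
        .collect()
}
```

For a difftastic checkout, guess_for_dir would be expected to suggest cargo build, since the project root contains a Cargo.toml. When several markers are present, the README or CI configuration is still the better guide to which one the maintainers actually use.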

Learn the critical path

The third step in participating in an open source project is learning the critical path. @mitchellh sums up his approach as: trace down, learn up.

When learning a feature:

First, trace the code paths related to this feature from top to bottom, ignoring details that are unrelated to it
Then, learn from the bottom up how this subsystem works
Try modifying the code: add new logs, add simple logic, change some details, and understand why things do or do not work
Read documentation or talks about this feature

This is how we often work in our daily study and jobs anyway; we just need to apply it systematically to open source projects. Code in a project also tends to follow the 80/20 rule: 20% of the code implements 80% of the functionality, so there is no need to understand every line of the project. The best approach is to read the code with a question in mind and look only for the logic related to the feature you are implementing. Well-maintained open source projects tend to document their core logic and modules in detail to answer common questions, and reading that documentation often solves our problem.

That is the case with difftastic: the author has done an excellent job of documenting Adding A Parser. With the help of that documentation, I only had to follow the steps in order and solve one simple compilation problem. More often, of course, the documentation is missing or insufficient. In that case, once we understand the module, we can contribute the documentation ourselves. It does not matter if our understanding is wrong, because we can discuss and confirm it with the author when submitting the PR. This helps other contributors who run into the same problem, and it also deepens our own understanding.

Start with small changes

The fourth step in participating in open source is to start with small changes. Contributing documentation is a good place to start, because it helps us understand how the project is discussed and developed.

Try to avoid taking on particularly complicated tasks right at the start. On the one hand, we need to build up our reputation in the community through contributions. On the other hand, failing at a complex task can badly damage our confidence. The recommended approach is to start with smaller features, preferably ones whose impact is limited to a single subsystem. As we implement them, we can start from the current module and learn how other modules work with it. As we learn more modules, we find more things to improve and take part in the continuous evolution of the project.

In fix: Remove trailing lines before calculating max_lines, I found the root cause, a wrong calculation of LineNumber, by adding a few simple println! calls, and then submitted a fix:

 - (max(1, self.as_ref().split('\n').count()) - 1).into()
 + (max(1, self.as_ref().trim_end().split('\n').count()) - 1).into()

But I soon found this line of code a little hard to read, so I did a simple refactoring:

 - (max(1, self.as_ref().trim_end().split('\n').count()) - 1).into()
 + self.as_ref()
 +     .trim_end() // Remove extra trailing whitespaces.
 +     .split('\n') // Split by `\n` to calculate lines.
 +     .count()
 +     .sub(1) // Sub 1 to make zero-indexed LineNumber
 +     .into()

None of these changes touch other modules, so I only needed to add an independent unit test. The author could quickly verify that my idea was correct, which avoided back-and-forth discussion in the PR.
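
To make the effect of the fix concrete, here is a minimal, self-contained sketch with a unit test in the same spirit. It is not the code from the PR: the free functions max_line_before and max_line_after are hypothetical stand-ins that work on a plain &str and return usize, whereas the real code is a method that converts the result into a LineNumber.

```rust
use std::ops::Sub;

/// Behaviour before the fix: a trailing newline produces an extra empty
/// segment after the split, so the last line index comes out one too high.
fn max_line_before(s: &str) -> usize {
    std::cmp::max(1, s.split('\n').count()) - 1
}

/// Behaviour after the fix and refactoring shown above.
fn max_line_after(s: &str) -> usize {
    s.trim_end() // Remove trailing whitespace, including newlines.
        .split('\n') // Split by `\n` to count lines.
        .count()
        .sub(1) // Subtract 1 to get a zero-indexed line number.
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn trailing_newline_is_not_counted() {
        // Two lines of text, so the last zero-indexed line is 1.
        assert_eq!(max_line_before("a\nb\n"), 2); // off by one
        assert_eq!(max_line_after("a\nb\n"), 1);
    }

    #[test]
    fn input_without_trailing_newline_is_unchanged() {
        assert_eq!(max_line_before("a\nb"), 1);
        assert_eq!(max_line_after("a\nb"), 1);
    }
}
```

Dropping max(1, ...) in the refactored version is safe in this sketch because split always yields at least one segment, even for an empty string, so count() is never zero and the subtraction cannot underflow.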

Summary

Contributing to complex open source projects is not difficult. With the right methodology, we can all join the ranks of open source:

Become a user
Build the project
Learn the critical path
Start with small changes

Welcome to the open source community~

This article is reproduced from: https://xuanwo.io/reports/2022-18/