2022-20: Iteration 13 Debriefing

Original link: https://xuanwo.io/reports/2022-20/

Iteration 13 ran for two weeks, from 5/7 to 5/20. In this cycle, many of the PRs I had submitted earlier were finally merged, which feels very fulfilling.

difftastic

difftastic is a semantic-aware diff tool written in Rust.

During this cycle, two of my PRs were merged:

fix: Bad padding of column numbers at the end of files

In the reported output, str2 should line up with the Violets line above it, but it does not. After some quick debugging, I found that some cases were mistakenly skipped when calculating the padding. Removing that special case fixed the problem.

feat: Improve binary content guess improves the binary content detection algorithm.

In the past, difftastic used a simpler and cruder way to detect binaries:

    let num_replaced = String::from_utf8_lossy(bytes)
        .to_string()
        .chars()
        .take(1000)
        .filter(|c| *c == std::char::REPLACEMENT_CHARACTER || *c == '\0')
        .count();

    num_replaced > 20

It counted invalid UTF-8 sequences (which from_utf8_lossy turns into the Unicode replacement character) and \0 among the first thousand characters; if there were more than 20, the content was treated as binary. In the issue Treat PDFs as binary files, the author asks for PDFs to be detected and treated as binary files. The problem is: given the beginning of a file, decide whether it is binary. The obvious answer is to check the magic number. In the PR, I introduced tree_magic_mini to detect the MIME type of the content and handle the common binary types uniformly:

    let mime = tree_magic_mini::from_u8(bytes);
    match mime {
        // Treat pdf as binary.
        "application/pdf" => return true,
        // Treat all image content as binary.
        v if v.starts_with("image/") => return true,
        // Treat all audio content as binary.
        v if v.starts_with("audio/") => return true,
        // Treat all video content as binary.
        v if v.starts_with("video/") => return true,
        // Treat all font content as binary.
        v if v.starts_with("font/") => return true,
        _ => {}
    }

This allows difftastic to correctly handle most binary types.
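To make the idea concrete, here is a minimal standard-library-only sketch of "check the magic number first, then fall back to the character-counting heuristic". It is not difftastic's actual code: the function name looks_binary is hypothetical, and the two hard-coded signatures (PDF and PNG) merely stand in for the MIME detection that tree_magic_mini performs in the real PR.

```rust
/// Hypothetical helper for illustration only; not difftastic's implementation.
/// Returns true when `bytes` looks like binary content.
fn looks_binary(bytes: &[u8]) -> bool {
    // Magic-number check: PDF files begin with the ASCII bytes "%PDF-".
    if bytes.starts_with(b"%PDF-") {
        return true;
    }
    // Magic-number check: the 8-byte PNG signature.
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        return true;
    }

    // Fallback: the older heuristic quoted above. Count replacement
    // characters (from invalid UTF-8) and NULs in the first 1000 chars.
    let num_replaced = String::from_utf8_lossy(bytes)
        .chars()
        .take(1000)
        .filter(|c| *c == std::char::REPLACEMENT_CHARACTER || *c == '\0')
        .count();

    num_replaced > 20
}
```

A PDF header such as b"%PDF-1.7" is valid ASCII, so the fallback heuristic alone would see no replacement characters in it; the signature check is what lets it be classified as binary.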

databend

As mentioned in the last weekly report, I spent a lot of effort this cycle on improving the compatibility of Databend's configuration. Inspired by RFC: Config Backward Compatibility, I added a config compatibility layer to databend-query and databend-meta. In addition, the PR refactor: Reuse StorageConfig in stage carried out a large-scale refactoring of the internal storage-related configuration, so that internal logic can share the same configuration and similar logic no longer has to be implemented repeatedly.

Making the databend-meta configuration compatible was a rather bumpy process. It has many configuration items and has gone through a lot of refactoring, so many of those items do not follow a unified naming style. After struggling with it for a long time, I found a new way to use serfig:

    pub fn load() -> MetaResult<Self> {
        let arg_conf = Self::parse();

        let mut builder: serfig::Builder<Self> = serfig::Builder::default();

        // Load from the config file first.
        {
            let config_file = if !arg_conf.config_file.is_empty() {
                arg_conf.config_file.clone()
            } else if let Ok(path) = env::var("METASRV_CONFIG_FILE") {
                path
            } else {
                "".to_string()
            };

            builder = builder.collect(from_file(Toml, &config_file));
        }

        // Then, load from env.
        let cfg_via_env: ConfigViaEnv = serfig::Builder::default()
            .collect(from_env())
            .build()
            .map_err(|e| MetaError::InvalidConfig(e.to_string()))?;
        builder = builder.collect(from_self(cfg_via_env.into()));

        // Finally, load from args.
        builder = builder.collect(from_self(arg_conf));

        builder
            .build()
            .map_err(|e| MetaError::InvalidConfig(e.to_string()))
    }

With this approach, users maintain a standalone env wrapper struct that loads data from the environment and is then converted into the config struct. However, this is cumbersome when only a few env names are incompatible. @DCjanus mentioned a similar need in the issue Add bind attribute support.
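The wrapper pattern can be illustrated without serfig. The sketch below only mimics the ordering (file, then env, then args, with later sources overriding earlier ones) and the conversion from an env wrapper into the real config; it does not reproduce serfig's actual merge logic. All field names, the merge method, and load_layered are hypothetical and chosen purely for illustration.

```rust
/// Hypothetical config struct with the "new" field names.
#[derive(Debug, Default, Clone, PartialEq)]
struct Config {
    log_dir: Option<String>,
    grpc_address: Option<String>,
}

/// Hypothetical env wrapper that keeps the legacy names
/// and knows how to convert itself into `Config`.
#[derive(Debug, Default)]
struct ConfigViaEnv {
    legacy_log_dir: Option<String>,
    legacy_grpc_address: Option<String>,
}

impl From<ConfigViaEnv> for Config {
    fn from(env: ConfigViaEnv) -> Self {
        Config {
            log_dir: env.legacy_log_dir,
            grpc_address: env.legacy_grpc_address,
        }
    }
}

impl Config {
    /// Values set in `other` override values in `self`.
    fn merge(self, other: Config) -> Config {
        Config {
            log_dir: other.log_dir.or(self.log_dir),
            grpc_address: other.grpc_address.or(self.grpc_address),
        }
    }
}

/// Apply the layers in the same order as above: file, env, args.
fn load_layered(file: Config, env: ConfigViaEnv, args: Config) -> Config {
    file.merge(env.into()).merge(args)
}
```

The cost mentioned above is visible even in this toy version: every field has to be declared twice and mapped by hand in the From implementation, even if only one env name differs from the config name.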

Summary

Next cycle, we will focus on Databend's support for reading compressed files. Once that is done, Databend will be able to read compressed files such as gzip and zstd directly.
