How to write a parser for a custom configuration file format

Last week I read the article "How to Write a Lexer in Go" and found that, with its approach in mind, designing a configuration file parser is not that difficult. I then tried writing a Fluent-bit configuration parser, and the result is this Fluent-Bit configuration parser for Golang.

In this article I want to show how to parse a Fluent-bit .conf configuration file. The thinking behind it applies to other file formats as well.

Fluent-bit configuration format and schema

    [FIRST_SECTION]
        Key1  some value
        Key2  another value

    [SECOND_SECTION]
        KeyN  3.14

This is a classic mode Fluent-bit configuration. It consists of two key parts:

  • Section
  • Key/value pair

First of all, we need to define the structs that represent a Fluent-bit configuration file.

    type FluentBitConf struct {
        Sections []Section
    }

    type Section struct {
        Name    string
        Entries []Entry
    }

    type Entry struct {
        Key   string
        Value interface{}
    }
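To make the mapping concrete, here is a minimal sketch of the value a parser would be expected to produce for the sample configuration. The `sampleConf` helper is hypothetical and exists only for illustration, and keeping `3.14` as a string rather than converting it to a number is an assumption.

```go
package main

import "fmt"

type FluentBitConf struct {
	Sections []Section
}

type Section struct {
	Name    string
	Entries []Entry
}

type Entry struct {
	Key   string
	Value interface{}
}

// sampleConf is a hypothetical helper that builds, by hand, the structure
// a parser should return for the sample configuration.
func sampleConf() FluentBitConf {
	return FluentBitConf{
		Sections: []Section{
			{Name: "FIRST_SECTION", Entries: []Entry{
				{Key: "Key1", Value: "some value"},
				{Key: "Key2", Value: "another value"},
			}},
			{Name: "SECOND_SECTION", Entries: []Entry{
				{Key: "KeyN", Value: "3.14"}, // assumption: values stay strings
			}},
		},
	}
}

func main() {
	// Print the structure back in a layout close to the .conf format.
	for _, s := range sampleConf().Sections {
		fmt.Printf("[%s]\n", s.Name)
		for _, e := range s.Entries {
			fmt.Printf("    %s %v\n", e.Key, e.Value)
		}
	}
}
```

Because `Value` is an `interface{}`, a parser could also store typed values (for example a `float64` for `3.14`) if it chose to convert them.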

Once we have the structs, the next step is to parse tokens from the file and save their values into the Go structs. We can borrow the logic of a lexer to develop our own Fluent-bit parser.

In a lexer, the target characters we want to pick out are called tokens; a token is the keyword the parser is searching for. The parser reads the characters of a file one by one, and whenever it finds a token, it saves the value between tokens into the final structure and moves on.
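As a rough illustration of what "defining tokens" can mean for this format, the sketch below classifies single characters. The `tokenKind` type, its constants, and the `classify` function are hypothetical names chosen for this example; they are not taken from the parser described in this article.

```go
package main

import (
	"fmt"
	"unicode"
)

// tokenKind is a hypothetical classification of a single character.
type tokenKind int

const (
	kindText         tokenKind = iota // ordinary character in a name or value
	kindSectionOpen                   // '[' starts a section name
	kindSectionClose                  // ']' ends a section name
	kindNewline                       // '\n' ends a key/value line
	kindSpace                         // other whitespace separates key and value
)

// classify reports which kind of token character r is.
func classify(r rune) tokenKind {
	switch {
	case r == '[':
		return kindSectionOpen
	case r == ']':
		return kindSectionClose
	case r == '\n':
		return kindNewline
	case unicode.IsSpace(r):
		return kindSpace
	default:
		return kindText
	}
}

func main() {
	// Walk a tiny input character by character and show each classification.
	for _, r := range "[A]\nK v" {
		fmt.Printf("%q -> %d\n", r, classify(r))
	}
}
```

Everything that is not one of the special characters counts as text, which is what the parser accumulates between two tokens.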

Parse a single token

To parse a Section, the parser reads characters one by one and stops at the [ character, which marks the beginning of a Section. The parser must save its current state as t_section and keep reading until the ] character; the word between [ and ] is the Section name we need to store in the Go struct.

    // Tags that tell the parser which state it is in.
    // This snippet needs the io and unicode packages.
    const (
        t_section   = iota // reading a section name between '[' and ']'
        t_entry_key        // reading the key of a key/value pair
    )

    func (parser *FluentBitConfParser) Parse() *FluentBitConf {
        var currSection *Section
        for {
            // Read characters one by one.
            r, _, err := parser.reader.ReadRune()
            if err != nil {
                // At the end of the file, flush the section that is still open.
                if err == io.EOF && currSection != nil {
                    parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
                }
                return parser.Conf
            }
            switch r {
            case '\n':
                continue
            case '[':
                // Save the previous section before starting a new one.
                if currSection != nil {
                    parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
                }
                currSection = &Section{Name: "", Entries: []Entry{}}
                parser.token = t_section
            default:
                if unicode.IsSpace(r) {
                    continue
                }
                // The important step: read the characters that follow the
                // token character and save them into the struct.
                strValue, _ := parser.parseString()
                switch parser.token {
                case t_section:
                    currSection.Name = strValue
                    parser.token = t_entry_key
                }
            }
        }
    }

In the function parser.parseString(), we read until the end of a value (for a section, that is ]) and then return the value.

    func (parser *FluentBitConfParser) parseString() (string, error) {
        var val string = ""
        // Put back the rune that Parse has already consumed.
        if err := parser.reader.UnreadRune(); err != nil {
            return "", err
        }
        for {
            r, _, err := parser.reader.ReadRune()
            if err != nil {
                if err == io.EOF {
                    return val, nil
                }
                return "", err
            }
            // A section name ends at ']'.
            if parser.token == t_section && r == ']' {
                return val, nil
            }
            val = val + string(r)
        }
    }

That is all the logic for parsing a section. Parsing a key/value pair follows the same process: just make sure the parser knows which state it is in and saves the values delimited by whitespace or \n. You can see the code in the GitHub repo.
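For readers who want a feel for the key/value step without opening the repo, here is a minimal sketch of one way to do it. It is not the code from the repo: `parseEntries`, `stateKey`, and `stateValue` are hypothetical names, and the rules it applies (a key ends at the first whitespace, a value ends at \n and is trimmed) are assumptions based on the description above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

type Entry struct {
	Key   string
	Value interface{}
}

// Hypothetical parser states for the body of a section.
const (
	stateKey   = iota // collecting a key, which ends at the first whitespace
	stateValue        // collecting a value, which ends at '\n'
)

// parseEntries reads the body of one section rune by rune.
func parseEntries(input string) ([]Entry, error) {
	reader := bufio.NewReader(strings.NewReader(input))
	var entries []Entry
	var key, val strings.Builder
	state := stateKey

	// flush stores the pending key/value pair, if any, and resets the state.
	flush := func() {
		if key.Len() > 0 {
			entries = append(entries, Entry{
				Key:   key.String(),
				Value: strings.TrimSpace(val.String()),
			})
		}
		key.Reset()
		val.Reset()
		state = stateKey
	}

	for {
		r, _, err := reader.ReadRune()
		if err == io.EOF {
			flush()
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
		switch {
		case r == '\n':
			flush() // end of line closes the current pair
		case state == stateKey && unicode.IsSpace(r):
			if key.Len() > 0 {
				state = stateValue // first whitespace after the key
			}
		case state == stateKey:
			key.WriteRune(r)
		default:
			val.WriteRune(r) // spaces inside a value are kept
		}
	}
}

func main() {
	entries, err := parseEntries("    Key1 some value\n    Key2   another value\n")
	fmt.Println(entries, err)
}
```

The state variable plays the same role as parser.token in the section parser: the same whitespace character is skipped, treated as a separator, or kept, depending on the state the parser is in.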

Conclusion

To parse a configuration file, we have to:

  • Define the tokens (key characters)
  • Read characters and look for tokens
  • Save the current state to tell the parser which struct the following characters belong to

This article is reprinted from: https://sund.site/posts/2022-5-8_lexer_design/
