Hands-on: implementing a DSL using ANTLR and Go (Part 3): Building and validating the semantic model


In the previous articles in this series, we designed a DSL called Tdat for meteorologists, wrote the Tdat grammar using ANTLR grammar rules, generated the Tdat parser code from that grammar, and did a preliminary check of the grammar's correctness: the parser can successfully turn our sample Tdat code into an in-memory tree structure.

At this point, the DSL code we wrote still does nothing, because it lacks execution semantics. In this article, we will build a semantic model for this DSL and verify that model independently.

Let our syntax examples actually work as expected!

1. What is a semantic model

From the previous articles, we learned that a grammar only formalizes the syntactic structure of a DSL, that is, how source text is represented in the syntax tree; it says nothing about semantics. Semantics is what code written in that syntax does when it is executed!

Even when the same grammar produces the same syntax tree, the semantics can differ depending on how that tree is interpreted. Here's an example from Martin Fowler's book Domain-Specific Languages :

Take the expression 5+3, written in that grammar: with different semantic models, the execution results differ. If the syntax tree is interpreted with addition semantics, the result is 8; if it is interpreted with concatenation semantics, the result is 53.
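This is easy to demonstrate in a few lines of Go (a toy sketch of my own, not from the article's code): the same tiny "tree" for 5+3 yields different results under two interpreters:

```go
package main

import (
	"fmt"
	"strconv"
)

// node is a minimal stand-in for the parse tree of "5+3":
// two operand leaves under a single '+' node.
type node struct {
	left, right string
}

// evalAdd interprets the tree with addition semantics.
func evalAdd(n node) string {
	l, _ := strconv.Atoi(n.left)
	r, _ := strconv.Atoi(n.right)
	return strconv.Itoa(l + r)
}

// evalConcat interprets the same tree with concatenation semantics.
func evalConcat(n node) string {
	return n.left + n.right
}

func main() {
	t := node{left: "5", right: "3"}
	fmt.Println(evalAdd(t))    // 8
	fmt.Println(evalConcat(t)) // 53
}
```

Same tree, two semantic models, two answers.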

So what form does a semantic model take? Generally speaking, a semantic model is one or more concrete data structures in memory. The purpose of these data structures is to express the semantics and drive the execution logic of statements.

For example: in the csv2map example from the earlier article “Implementing DSL with ANTLR and Go”, the semantic model is stored in a map structure (the cm field below) and a slice (the headers field below) inside the CSVMapListener struct:

```go
// github.com/bigwhite/experiments/tree/master/antlr/csv2map/csv_listener.go
type CSVMapListener struct {
	*parser.BaseCSVListener
	headers []string
	cm      []map[string]string
	fields  []string // a slice of fields in current row
}
```

csv2map fills in these two fields, cm and headers, by walking the generated syntax tree and extracting information; subsequent execution is based on the information stored in them.

At this point, some readers may ask: is it necessary to extract and assemble a separate semantic model for every DSL? Martin Fowler, at least, suggests doing so. The biggest advantage is that it decouples the parsing phase from the semantic-execution phase, so the semantic model can be tested and verified independently, without relying on the parsing process.

Personally, I think that for any non-trivial DSL it is worth separating out the semantic model; otherwise the coupling between semantic execution and syntax parsing makes the DSL implementation hard to understand, maintain, test, and verify.

For some simple DSLs, the syntax tree itself can serve as the semantic model. In that case, statement semantics are executed while the syntax tree is traversed. Below is a typical example of using the syntax tree as the execution model (adapted from the example in this article); the example grammar is as follows:

```antlr
// Calc.g4
grammar Calc;

// Rules
start      : expression EOF;
expression : expression op=('*'|'/') expression # MulDiv
           | expression op=('+'|'-') expression # AddSub
           | NUMBER                             # Number
           ;

// Tokens
MUL: '*';
DIV: '/';
ADD: '+';
SUB: '-';
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -> skip;
```

After generating the Parser code based on the grammar, we implement a Listener for the syntax tree:

```go
// calc/calc_listener_impl.go
type calcListener struct {
	*parser.BaseCalcListener
	stack []int
}

... ...

func (l *calcListener) ExitMulDiv(c *parser.MulDivContext) {
	right, left := l.pop(), l.pop()
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserMUL:
		l.push(left * right)
	case parser.CalcParserDIV:
		l.push(left / right)
	default:
		panic(fmt.Sprintf("unexpected op: %s", c.GetOp().GetText()))
	}
}

func (l *calcListener) ExitAddSub(c *parser.AddSubContext) {
	right, left := l.pop(), l.pop()
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserADD:
		l.push(left + right)
	case parser.CalcParserSUB:
		l.push(left - right)
	default:
		panic(fmt.Sprintf("unexpected op: %s", c.GetOp().GetText()))
	}
}

func (l *calcListener) ExitNumber(c *parser.NumberContext) {
	i, err := strconv.Atoi(c.GetText())
	if err != nil {
		panic(err.Error())
	}
	l.push(i)
}
```

This code directly treats the syntax tree created by the parser as a binary expression tree (leaf nodes are operands, all other nodes are operators), and then uses the stack-based expression-tree evaluation algorithm to implement the evaluation semantics. See the main function that drives the evaluation below:

```go
// calc/main.go

// calc takes a string expression and returns the evaluated result.
func calc(input string) int {
	// Setup the input
	is := antlr.NewInputStream(input)

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	// Create the Parser
	p := parser.NewCalcParser(stream)

	// Finally parse the expression (by walking the tree)
	var listener calcListener
	antlr.ParseTreeWalkerDefault.Walk(&listener, p.Start())
	return listener.pop()
}

func main() {
	println(calc("1 + 2 * 3"))  // 7
	println(calc("12 * 3 / 6")) // 6
}
```

Through the above code, we can clearly see that this example uses the syntax tree built during parsing directly as the semantic model, which tightly couples the semantic model to the structure of the parsed tree. Once the grammar changes, the resulting change in tree structure directly affects semantic execution, and the implementation of the semantic model must change accordingly.

For our own tdat DSL, we will keep the semantic model separate from the syntax tree. Let's take a look at tdat's semantic model.

2. Expression tree of semantic model

In the first article of this series, we introduced the semantic features of the Tdat DSL; our semantic model exists to implement those features. Let's review the core production rule, ruleLine, in the tdat grammar:

```antlr
ruleLine
    : ruleID ':' enumerableFunc '{' windowsRange conditionExpr '}' '=>' result ';'
    ;
```

In this production, the rules that mainly affect semantic computation are conditionExpr, windowsRange, enumerableFunc, and result, and the most complex of these is conditionExpr. This rule is essentially a mixed computation over a set of unary, arithmetic, comparison, and logical expressions.

So, can we use the syntax tree directly as the semantic model, like the calc example above? At the implementation level it is possible. Let's take the following complex conditionExpr expression as an example:

 (($speed < 5) and (($temperature + 1) < 10)) or ((roundDown($speed) <= 10.0) and (roundUp($salinity) >= 500.0))

Let's compare using the syntax tree directly as the semantic model with using a separate expression tree structure as the semantic model:

From the figure above, we can see that the syntax tree is built for parsing, not for expression evaluation. If we compute directly on the syntax tree, we must traverse nodes that are irrelevant to the calculation (the non-essential symbol nodes circled in red), which adds overhead and hurts performance. Second, the conditionExpr syntax tree here is not a standard binary expression tree, so we would need to design our own expression evaluation algorithm for it. Finally, the downsides Martin Fowler describes of coupling parsing with the semantic model apply: even when the semantics stay the same, a change to the grammar changes the structure of the syntax tree, and with it the evaluation behavior of the semantic model.

Therefore, we separate the semantic model from the syntax tree here, and use the binary expression tree at the bottom of the figure above as the main semantic model. This allows us to build, implement, and test the semantic model independently.

A typical binary expression tree like the one at the bottom of the figure above can be constructed from a Reverse Polish (postfix) expression. For the construction algorithm, see section 4.2.2 of "Data Structures and Algorithm Analysis in C" (2nd edition).

Now I will briefly talk about the construction and evaluation of this expression tree.

Let’s first create a binary tree data structure:

```go
// tdat/semantic/semantic.go

// semantic tree
type Tree interface {
	GetParent() Tree
	SetParent(Tree)
	GetValue() Value
	SetLeftChild(Tree) Tree
	GetLeftChild() Tree
	SetRightChild(Tree) Tree
	GetRightChild() Tree
}

type Value interface {
	Type() string
	Value() interface{}
}

// Node is an implementation of Tree
// and each node can be seen as a tree
type Node struct {
	V Value
	l *Node // left node
	r *Node // right node
	p *Node // parent node
}
```

We defined a binary tree interface type, Tree, and provided a struct type, Node, that implements it. Each Node is a node in the Tree and can itself be regarded as a Tree. Every Node holds a Value, which is also an interface type with four implementations:

  • BinaryOperator

Binary operators, including: binary arithmetic operators (+, -, *, /, %, etc.), relational operators (>, <, >=, <=, ==, etc.) and binary logical operators ( and and or).

  • UnaryOperator

Unary operators/built-in functions, including roundUp, roundDown, abs, etc.; the set is extensible.

  • Variable

Used to represent data indicators, such as speed, temperature, etc.

  • Literal

Literal values, such as 10, 3.1415, “hello”. They are usually used as rvalues, or combined with a Variable and a binary arithmetic operator to form an expression.

Both BinaryOperator and UnaryOperator are operators, while Variable and Literal are operands. An expression tree is thus a tree whose leaf nodes are operands and whose other nodes are operators. Since our operators are at most binary, the expression tree is exactly a binary tree; the operand of a unary operator is placed in the left child by convention.

As mentioned above, we can build such an expression tree from a Reverse Polish expression. Here is the implementation:

```go
// semantic/semantic.go

// construct a tree based on a reversePolishExpr
func NewFrom(reversePolishExpr []Value) Tree {
	var s Stack[Tree]

	for _, v := range reversePolishExpr {
		switch v.Type() {
		case "literal", "variable":
			s.Push(&Node{
				V: v,
			})
		case "binop":
			rchild, lchild := s.Pop(), s.Pop()
			n := &Node{
				V: v,
			}
			n.SetLeftChild(lchild)
			n.SetRightChild(rchild)
			s.Push(n)
		case "unaryop":
			lchild := s.Pop()
			n := &Node{
				V: v,
			}
			n.SetLeftChild(lchild)
			s.Push(n)
		}
	}

	first := s.Pop()
	root := &Node{}
	root.SetLeftChild(first)
	return root
}
```

In this implementation, we cache subtree nodes on a stack. We read the operators and operands of the Reverse Polish expression one by one, from left to right:

  • If the read value is an operand (literal or variable), the operand is packaged into a Node (understandable as a subtree) and pushed onto the stack;
  • If the value read out is a binary operator, two nodes will be popped from the stack as the left and right nodes of the binary operator node, and the combined subtree will be pushed onto the stack;
  • If the read Value is a unary operator, a node is popped from the stack as the left node of the unary operator node, and the merged subtree is pushed onto the stack.
  • What remains on the stack at the end is the top-level operator node of the tree. We pop it and attach it as the left child of a Root node, and construction of the expression tree is complete. The distinguishing feature of this Root node is that its parent is nil (which is used when traversing the tree).
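Note that NewFrom relies on a generic Stack type whose implementation the article does not show. A minimal slice-backed sketch (requires Go 1.18+; the actual Stack in tdat/semantic may differ) could look like this:

```go
package main

import "fmt"

// Stack is a minimal slice-backed LIFO stack.
// This is a sketch; the real Stack in tdat/semantic may differ.
type Stack[T any] struct {
	items []T
}

// Push places v on top of the stack.
func (s *Stack[T]) Push(v T) {
	s.items = append(s.items, v)
}

// Pop removes and returns the top element. It panics on an empty
// stack, which Evaluate's recover-based error handling would catch.
func (s *Stack[T]) Pop() T {
	n := len(s.items) - 1
	v := s.items[n]
	s.items = s.items[:n]
	return v
}

func main() {
	var s Stack[int]
	s.Push(1)
	s.Push(2)
	fmt.Println(s.Pop()) // 2
	fmt.Println(s.Pop()) // 1
}
```

The zero value of Stack is ready to use, which is why NewFrom can simply declare `var s Stack[Tree]`.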

What does the tree after construction look like? We can view it through the Dump function:

```go
func printPrefix(level int) {
	for i := 0; i < level; i++ {
		if i == level-1 {
			fmt.Printf(" |---")
		} else {
			fmt.Printf("     ")
		}
	}
}

func Dump(t Tree, order string) {
	var f = func(n *Node, level int) {
		if n == nil {
			return
		}
		printPrefix(level)
		if n.p == nil { // root node
			fmt.Printf("[root]()\n")
		} else {
			fmt.Printf("[%s](%v)\n", n.V.Type(), n.V.Value())
		}
	}

	switch order {
	default: // preorder
		preOrderTraverse(t.(*Node), 0, f, nil)
	case "inorder":
		inOrderTraverse(t.(*Node), 0, f, nil)
	case "postorder":
		postOrderTraverse(t.(*Node), 0, f, nil)
	}
}
```

Dump is based on tree traversal and can output each Node of the tree in preorder, inorder, or postorder. Traversal is the fundamental operation on a tree; let's take preorder traversal as an example and look at its implementation:

```go
// pre order traverse
func preOrderTraverse(t *Node, level int, enterF func(*Node, int), exitF func(*Node, int)) {
	if t == nil {
		return
	}

	if enterF != nil {
		enterF(t, level) // traverse this node
	}

	// traverse left children
	preOrderTraverse(t.l, level+1, enterF, exitF)

	// traverse right children
	preOrderTraverse(t.r, level+1, enterF, exitF)

	if exitF != nil {
		exitF(t, level) // traverse this node again
	}
}
```

Here we borrow the "idea" of the ANTLR parse-tree walker: enterF and exitF callbacks are invoked while traversing each Node, allowing the user to customize the behavior at each Node. Now that the principle is clear, let's build a tree from the following Reverse Polish expression:

 speed,50,<,temperature,1,+,4,<,and,salinity,roundDown,600,<=,ph,roundUp,8.0,>,or,or

The constructed Tree looks like this:

```
[root]()
 |---[binop](or)
      |---[binop](and)
           |---[binop](<)
                |---[variable](speed)
                |---[literal](50)
           |---[binop](<)
                |---[binop](+)
                     |---[variable](temperature)
                     |---[literal](1)
                |---[literal](4)
      |---[binop](or)
           |---[binop](<=)
                |---[unaryop](roundDown)
                     |---[variable](salinity)
                |---[literal](600)
           |---[binop](>)
                |---[unaryop](roundUp)
                     |---[variable](ph)
                |---[literal](8)
```

Once the tree is constructed, we can evaluate based on the tree. The following is the implementation of the evaluation function Evaluate:

```go
func Evaluate(t Tree, m map[string]interface{}) (result bool, err error) {
	var s Stack[Value]

	defer func() {
		// extract error from panic
		if x := recover(); x != nil {
			result, err = false, fmt.Errorf("eval error: %v", x)
			return
		}
	}()

	var exitF = func(n *Node, level int) {
		if n == nil {
			return
		}
		if n.p == nil { // root node
			return
		}

		v := n.GetValue()
		switch v.Type() {
		case "binop":
			rhs, lhs := s.Pop(), s.Pop()
			s.Push(evalBinaryOpExpr(v.Value().(string), lhs, rhs))
		case "unaryop":
			lhs := s.Pop()
			s.Push(evalUnaryOpExpr(v.Value().(string), lhs))
		case "literal":
			s.Push(v)
		case "variable":
			name := v.Value().(string)
			value, ok := m[name]
			if !ok {
				panic(fmt.Sprintf("not found variable: %s", name))
			}
			// use the value in map to replace variable
			s.Push(&Literal{
				Val: value,
			})
		}
	}

	preOrderTraverse(t.(*Node), 0, nil, exitF)
	result = s.Pop().Value().(bool)
	return
}
```

Although preOrderTraverse is used here, the computation happens in the exitF callback, so this is effectively a standard post-order traversal of the tree. Whenever an operand is encountered, it is pushed onto the stack; when the operand is a variable, we look it up in the input map and, if found, push its value. Whenever an operator is encountered, its operands are popped off the stack, the operation is computed, and the result is pushed back. In the end exactly one value remains on the stack: the result of evaluating the expression tree.
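Evaluate delegates the actual computation to evalBinaryOpExpr and evalUnaryOpExpr, whose bodies the article does not show. Below is a hedged sketch of what evalBinaryOpExpr might look like for a few representative operators, assuming numeric operands normalize to float64; the Literal stand-in and the helper itself are my assumptions, and the real code in tdat/semantic may differ:

```go
package main

import "fmt"

// Literal is a minimal stand-in for the article's Literal value type.
type Literal struct{ Val interface{} }

// toFloat normalizes int and float64 operands to float64.
func toFloat(v interface{}) float64 {
	switch x := v.(type) {
	case int:
		return float64(x)
	case float64:
		return x
	}
	panic(fmt.Sprintf("unsupported operand type: %T", v))
}

// evalBinaryOpExpr is a sketch handling a few representative
// operators; the panic propagates up to Evaluate's recover.
func evalBinaryOpExpr(op string, lhs, rhs *Literal) *Literal {
	switch op {
	case "+":
		return &Literal{Val: toFloat(lhs.Val) + toFloat(rhs.Val)}
	case "<":
		return &Literal{Val: toFloat(lhs.Val) < toFloat(rhs.Val)}
	case "and":
		return &Literal{Val: lhs.Val.(bool) && rhs.Val.(bool)}
	case "or":
		return &Literal{Val: lhs.Val.(bool) || rhs.Val.(bool)}
	}
	panic("unsupported op: " + op)
}

func main() {
	sum := evalBinaryOpExpr("+", &Literal{1}, &Literal{2})
	fmt.Println(sum.Val) // 3
	lt := evalBinaryOpExpr("<", sum, &Literal{4})
	fmt.Println(lt.Val) // true
}
```

Panicking on type or operator mismatches fits Evaluate's design, since its deferred recover converts any panic into an error return.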

3. Verification of the expression tree of the semantic model

As mentioned earlier, once the semantic model is separated from the syntax tree, we can test it independently. Below is a simple table-driven unit test for the expression tree:

```go
// tdat/semantic/semantic_test.go
func TestNewFrom(t *testing.T) {
	//($speed < 50) and (($temperature + 1) < 4) or ((roundDown($salinity) <= 600.0) or (roundUp($ph) > 8.0))
	// speed,50,<,temperature,1,+,4,<,and,salinity,roundDown,600,<=,ph,roundUp,8.0,>,or,or
	var reversePolishExpr []Value
	reversePolishExpr = append(reversePolishExpr, newVariable("speed"))
	reversePolishExpr = append(reversePolishExpr, newLiteral(50))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("<"))
	reversePolishExpr = append(reversePolishExpr, newVariable("temperature"))
	reversePolishExpr = append(reversePolishExpr, newLiteral(1))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("+"))
	reversePolishExpr = append(reversePolishExpr, newLiteral(4))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("<"))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("and"))
	reversePolishExpr = append(reversePolishExpr, newVariable("salinity"))
	reversePolishExpr = append(reversePolishExpr, newUnaryOperator("roundDown"))
	reversePolishExpr = append(reversePolishExpr, newLiteral(600.0))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("<="))
	reversePolishExpr = append(reversePolishExpr, newVariable("ph"))
	reversePolishExpr = append(reversePolishExpr, newUnaryOperator("roundUp"))
	reversePolishExpr = append(reversePolishExpr, newLiteral(8.0))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator(">"))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("or"))
	reversePolishExpr = append(reversePolishExpr, newBinaryOperator("or"))

	tree := NewFrom(reversePolishExpr)
	Dump(tree, "preorder")

	// test table
	var cases = []struct {
		id       string
		m        map[string]interface{}
		expected bool
	}{
		//($speed < 50) and (($temperature + 1) < 4) or ((roundDown($salinity) <= 600.0) or (roundUp($ph) > 8.0))
		{
			id: "0001",
			m: map[string]interface{}{
				"speed":       30,
				"temperature": 6,
				"salinity":    700.0,
				"ph":          7.0,
			},
			expected: false,
		},
		{
			id: "0002",
			m: map[string]interface{}{
				"speed":       30,
				"temperature": 1,
				"salinity":    500.0,
				"ph":          7.0,
			},
			expected: true,
		},
		{
			id: "0003",
			m: map[string]interface{}{
				"speed":       60,
				"temperature": 10,
				"salinity":    700.0,
				"ph":          9.0,
			},
			expected: true,
		},
		{
			id: "0004",
			m: map[string]interface{}{
				"speed":       30,
				"temperature": 1,
				"salinity":    700.0,
				"ph":          9.0,
			},
			expected: true,
		},
	}

	for _, caze := range cases {
		r, err := Evaluate(tree, caze.m)
		if err != nil {
			t.Errorf("[case %s]: want nil, actual %s", caze.id, err.Error())
		}
		if r != caze.expected {
			t.Errorf("[case %s]: want %v, actual %v", caze.id, caze.expected, r)
		}
	}
}
```

The above covers the most complex part of the semantic model, but not all of it; there are still windowsRange, enumerableFunc, and result. Let's build the complete semantic model of tdat.

4. Establish a complete semantic model

Earlier we solved the most complex part of the semantic model: conditionExpr. Now let's implement the complete model. We define a Model struct to represent the semantic model:

```go
// tdat/semantic/semantic.go
type WindowsRange struct {
	low  int
	high int
}

type Model struct {
	// conditionExpr
	t Tree
	// windowsRange
	wr WindowsRange
	// enumerableFunc
	ef string
	// result
	result []string
}
```

We see that Model is essentially an aggregation of the elements that affect the execution result: conditionExpr, windowsRange, enumerableFunc, and result. The constructor of Model is therefore quite simple:

```go
func NewModel(reversePolishExpr []Value, wr WindowsRange, ef string, result []string) *Model {
	m := &Model{
		t:      NewFrom(reversePolishExpr),
		wr:     wr,
		ef:     ef,
		result: result,
	}
	return m
}
```

Let’s focus on the semantic execution method Exec of the Model:

```go
// tdat/semantic/semantic.go
func (m *Model) Exec(metrics []map[string]interface{}) (map[string]interface{}, error) {
	var res []bool
	for i := m.wr.low - 1; i <= m.wr.high-1; i++ {
		r, err := Evaluate(m.t, metrics[i])
		if err != nil {
			return nil, err
		}
		res = append(res, r)
	}

	andRes := res[0]
	orRes := res[0]
	for i := 1; i < len(res); i++ {
		andRes = andRes && res[i]
		orRes = orRes || res[i]
	}

	switch m.ef {
	case "any":
		if orRes {
			return m.outputResult(metrics[0])
		}
		return nil, ErrNotMeetAny
	case "none":
		if !andRes {
			return m.outputResult(metrics[0])
		}
		return nil, ErrNotMeetNone
	case "each":
		if andRes {
			return m.outputResult(metrics[0])
		}
		return nil, ErrNotMeetEach
	default:
		return nil, ErrNotSupportFunc
	}
}
```

The implementation here is not tuned for performance, but the logic is clear: Exec evaluates the expression tree against each element in the window (low to high), collects the results into a slice, computes the logical AND (andRes) and logical OR (orRes) over all elements, and then, based on the type of enumerableFunc, decides whether to output the latest metric.
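The reduction step can be isolated for illustration. The fold function below is my own sketch mirroring the loop in Exec, not the article's actual code:

```go
package main

import "fmt"

// fold computes the logical AND and OR over the per-window evaluation
// results, mirroring the reduction loop in Model.Exec. It assumes
// res is non-empty, as Exec does.
func fold(res []bool) (andRes, orRes bool) {
	andRes, orRes = res[0], res[0]
	for _, r := range res[1:] {
		andRes = andRes && r
		orRes = orRes || r
	}
	return
}

func main() {
	res := []bool{true, false, true}
	andRes, orRes := fold(res)
	// "any" outputs when orRes is true; "each" when andRes is true;
	// "none" (as implemented in Exec above) when andRes is false.
	fmt.Println(andRes, orRes) // false true
}
```

With these two booleans in hand, each enumerableFunc becomes a single comparison, which is what keeps Exec easy to read.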

Verifying Model is similar to verifying the expression tree. Due to space limitations, I won't go into detail here; see the test cases in semantic_test.go.

5. Summary

In this part, we built a semantic model for our DSL. The core of the tdat semantic model is the expression tree, so we focused on constructing the expression tree from a Reverse Polish expression, evaluating it, and validating it. Finally, we assembled the full model, semantic.Model.

In the next article, we will explain how to extract the Reverse Polish expression from the DSL's syntax tree, assemble the semantic model, and connect the front and back ends of the DSL, so that our grammar examples can actually run.

The code covered in this article can be downloaded here – https://ift.tt/THQv69f.



© 2022, bigwhite . All rights reserved.

This article is reprinted from https://tonybai.com/2022/05/27/an-example-of-implement-dsl-using-antlr-and-go-part3/
