parser/vyos1x_lexer.mll


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136

{

open Vyos1x_parser

exception Error of string

(*

The language of the VyOS 1.x config file has multiple ambiguities that
make it not even context free.

The main issue is lack of explicit statement separators, so if we ignore whitespace,
a parser is left to guess if, for example

address dhcp # leaf node with a value
disable      # valueless leaf node

is three valueless nodes, a valueless node followed by a node with a value,
or a node with a value followed by a valueless node.

The only cue is the newline, which means that newlines are sometimes significant,
and sometimes they aren't.

interfaces { # doesn't matter
  ethernet 'eth0' { # doesn't matter
    address '192.0.2.1/24' # significant!
    disable # significant!

    # empty line -- doesn't matter
    hw-id 00:aa:bb:cc:dd:ee # significant!
  } # doesn't matter
}

If there were explicit terminators (like we do in VyConf, or like JunOS does),
the language would be context free. Enter the lexer hack: let's emit newlines only
when they are significant, so that the parser can use them as terminators.

The informal idea is that a newline is only significant if it follows a leaf node.
So we need rules for finding out if we are inside a leaf node or not.

These are the formal rules. A newline is significant if and only if
1. Preceding token is an identifier
2. Preceding token is a quoted string

We set the vy_inside_node flag to true when we enter a leaf node and reset it when
we reach the end of it.

*)

let vy_inside_node = ref false

}

rule token = parse
| [' ' '\t' '\r']
    { token lexbuf }
| '\n'
    { Lexing.new_line lexbuf; if !vy_inside_node then (vy_inside_node := false; NEWLINE) else token lexbuf }
| '"'
    { vy_inside_node := true; read_string (Buffer.create 16) lexbuf }
| '''
    { vy_inside_node := true; read_single_quoted_string (Buffer.create 16) lexbuf }
| "/*"
    { vy_inside_node := false; read_comment (Buffer.create 16) lexbuf }
| '{'
    { vy_inside_node := false; LEFT_BRACE }
| '}'
    { vy_inside_node := false; RIGHT_BRACE }
| [^ ' ' '\t' '\n' '\r' '{' '}' '[' ']' ';' '#' '"' ''' ]+ as s
    { vy_inside_node := true; IDENTIFIER s}
| eof
    { EOF }
| _
{ raise (Error (Printf.sprintf "At offset %d: unexpected character.\n" (Lexing.lexeme_start lexbuf))) }

and read_string buf =
  parse
  | '"'       { STRING (Buffer.contents buf) }
  | '\\' '/'  { Buffer.add_char buf '/'; read_string buf lexbuf }
  | '\\' '\\' { Buffer.add_char buf '\\'; read_string buf lexbuf }
  | '\\' 'b'  { Buffer.add_char buf '\b'; read_string buf lexbuf }
  | '\\' 'f'  { Buffer.add_char buf '\012'; read_string buf lexbuf }
  | '\\' 'n'  { Buffer.add_char buf '\n'; read_string buf lexbuf }
  | '\\' 'r'  { Buffer.add_char buf '\r'; read_string buf lexbuf }
  | '\\' 't'  { Buffer.add_char buf '\t'; read_string buf lexbuf }
  | '\\' '\'' { Buffer.add_char buf '\''; read_string buf lexbuf }
  | '\\' '"' { Buffer.add_char buf '"'; read_string buf lexbuf }
  | '\n'      { Lexing.new_line lexbuf; Buffer.add_char buf '\n'; read_string buf lexbuf }
  | [^ '"' '\\']+
    { Buffer.add_string buf (Lexing.lexeme lexbuf);
      read_string buf lexbuf
    }
  | _ { raise (Error (Printf.sprintf "Illegal string character: %s" (Lexing.lexeme lexbuf))) }
  | eof { raise (Error ("String is not terminated")) }

and read_single_quoted_string buf =
  parse
  | '''       { STRING (Buffer.contents buf) }
  | '\\' '/'  { Buffer.add_char buf '/'; read_string buf lexbuf }
  | '\\' '\\' { Buffer.add_char buf '\\'; read_string buf lexbuf }
  | '\\' 'b'  { Buffer.add_char buf '\b'; read_string buf lexbuf }
  | '\\' 'f'  { Buffer.add_char buf '\012'; read_string buf lexbuf }
  | '\\' 'n'  { Buffer.add_char buf '\n'; read_string buf lexbuf }
  | '\\' 'r'  { Buffer.add_char buf '\r'; read_string buf lexbuf }
  | '\\' 't'  { Buffer.add_char buf '\t'; read_string buf lexbuf }
  | '\\' '\'' { Buffer.add_char buf '\''; read_string buf lexbuf }
  | '\\' '"' { Buffer.add_char buf '"'; read_string buf lexbuf }
  | '\n'      { Lexing.new_line lexbuf; Buffer.add_char buf '\n'; read_string buf lexbuf }
  | [^ ''' '\\']+
    { Buffer.add_string buf (Lexing.lexeme lexbuf);
      read_single_quoted_string buf lexbuf
    }
  | _ { raise (Error (Printf.sprintf "Illegal string character: %s" (Lexing.lexeme lexbuf))) }
  | eof { raise (Error ("String is not terminated")) }

and read_comment buf =
  parse
  | "*/"
      { COMMENT (Buffer.contents buf) }
  | _
      { Buffer.add_string buf (Lexing.lexeme lexbuf);
        read_comment buf lexbuf
      }

(*

If you are curious how the original parsers handled the issue: they did not.
The CStore parser cheated by reading data from command definitions to resolve
the ambiguities, which made it impossible to use in standalone config
manipulation programs like migration scripts.

The XorpConfigParser could not tell tag nodes' name and tag from
a leaf node with a value, which made it impossible to manipulate
tag nodes or change values properly.

*)