Text highlight editor
※This is an extract from the Editor.net documentation.
Creating a New Parser
Now you should see the SyntaxBuilder window appear. If you wanted to use an existing scheme, you would press the Load button; this time, however, we are going to create a new scheme completely from scratch. After completing it, you can press the Save button to make the new scheme available to other projects.
The next thing to do, after pleasing your vanity and making your lawyers happy by entering the information about the author and the copyright, is to define the syntax highlighting styles used in the scheme. This is accomplished by clicking the right mouse button on the Styles node to bring up the context menu, and choosing the Add Style command.
After creating a style you should give it a name and define its visual attributes. For this example we will need four styles: number, punctuation, whitespace, and error. Let us define numbers to have olive color and italic text style, punctuation symbols to be blue, and errors to have a red background and a white foreground. The whitespace style is defined as having no distinct markup at all.
Then we define the states of the parser. For our example language there will be two states: default and block. States are defined similarly to styles, by choosing the Add State command in the context menu of the States node. In turn, states contain syntax blocks, created by the Add Syntax Block command from the context menu of a state.
The syntax parser is essentially a state machine driven by the text. Transition conditions are expressed as regular expressions, which are checked against the parsed text from the current position up to the next end of line. Expressions are tried in the order in which the syntax blocks are defined, and the first successful match determines the syntax block. The text position is advanced by the length of the match, and the matched text is assigned the style specified for that syntax block. The matched text is additionally checked against the list of reserved words associated with the syntax block, and if a match occurs, the style defined by the ResWord Style property is used instead of the one defined by the Style property. The state of the state machine is then changed according to the Leave State property of the syntax block, which can specify any of the states, including the state in which the syntax block itself resides, meaning that no state transition takes place.
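The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the Editor.net implementation: the SyntaxBlock class, the step function, the identifier block and its reserved word list are all hypothetical names made up for the sketch, and Python's re module stands in for whatever regular expression engine the parser actually uses. Skipping empty matches is also an assumption, made here so that the position always advances.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SyntaxBlock:
    # Hypothetical mirror of the syntax block properties named in the text.
    name: str
    pattern: str                 # regular expression tried at the current position
    style: str                   # Style: assigned to the matched text
    leave_state: str             # Leave State: state to switch to after the match
    res_words: frozenset = field(default_factory=frozenset)
    res_word_style: str = ""     # ResWord Style: replaces Style for reserved words


def step(blocks, line, pos):
    """Try blocks in definition order at pos.

    Returns (match length, style, next state), or None if nothing matches.
    """
    for block in blocks:
        m = re.compile(block.pattern).match(line, pos)
        if m and m.end() > pos:  # assumption: empty matches are skipped
            text = m.group(0)
            style = block.res_word_style if text in block.res_words else block.style
            return len(text), style, block.leave_state
    return None


# Illustrative blocks for a single state; not part of the tutorial's scheme.
blocks = [
    SyntaxBlock("identifier", r"[A-Za-z_]\w*", "identifier", "default",
                frozenset({"if", "else"}), "keyword"),
    SyntaxBlock("whitespace", r"\s+", "whitespace", "default"),
]

print(step(blocks, "if x", 0))   # (2, 'keyword', 'default')
print(step(blocks, "if x", 3))   # (1, 'identifier', 'default')
```

Both calls match the same identifier block; only the first one hits the reserved word list, so only the first one gets the reserved word style.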
The state machine for the language we are parsing is described in the table below and deserves some comments.
The whitespace syntax block is only necessary because of the presence of the match-all error syntax block. In the more common case where no error highlighting is used, text that does not match any of the syntax blocks simply gets no style (which is the same as the whitespace style that we have defined). The error syntax block is the last in the sequence and matches a single character that has not been matched by any of the preceding rules. The block syntax block is matched when the opening curly bracket is encountered. The bracket itself is assigned the punctuation style, and the state machine changes its state to the block state (note that the names of states, styles, and syntax blocks are not required to coincide).
In the block state, the whitespace and error syntax blocks serve the same purpose as in the default state. The number and comma syntax blocks cause numbers and commas to have the corresponding styles, and the end syntax block, which matches the closing curly bracket, causes the transition back to the default state.
State | Syntax Block | Regular Expression | Style | Leave State
default | whitespace | \s+ | whitespace | default
default | block | \{ | punctuation | block
default | error | . | error | default
block | whitespace | \s+ | whitespace | block
block | number | \d+ | number | block
block | comma | , | punctuation | block
block | end | \} | punctuation | default
block | error | . | error | block
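To see the table in action, here is a minimal Python sketch that encodes the two states as plain data and runs them over one line of text. The STATES dictionary and the highlight function are hypothetical names chosen for this illustration, and Python's re module is assumed to behave closely enough to the parser's own regular expressions for these simple patterns; it is not how SyntaxBuilder stores or executes a scheme.

```python
import re

# Each entry: (syntax block, regular expression, style, leave state),
# listed in definition order, exactly as in the table above.
STATES = {
    "default": [
        ("whitespace", r"\s+", "whitespace", "default"),
        ("block", r"\{", "punctuation", "block"),
        ("error", r".", "error", "default"),
    ],
    "block": [
        ("whitespace", r"\s+", "whitespace", "block"),
        ("number", r"\d+", "number", "block"),
        ("comma", r",", "punctuation", "block"),
        ("end", r"\}", "punctuation", "default"),
        ("error", r".", "error", "block"),
    ],
}


def highlight(line, state="default"):
    """Return ([(text, style), ...], final_state) for one line of text."""
    pos, runs = 0, []
    while pos < len(line):
        for _name, pattern, style, leave in STATES[state]:
            m = re.compile(pattern).match(line, pos)
            if m:
                runs.append((m.group(0), style))
                pos, state = m.end(), leave
                break
        else:
            # Safety guard only: with this table some block always matches.
            pos += 1
    return runs, state


runs, final_state = highlight("x {1, 22}")
for text, style in runs:
    print(repr(text), style)
print("final state:", final_state)
```

In this run the stray x is styled as an error, both brackets and the comma as punctuation, and 1 and 22 as numbers; the machine ends in the default state again. Because highlight returns the final state, a caller can pass it in as the starting state for the next line, which is how a block left open on one line would continue on the following one.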
It would be possible to elaborate the state machine further to check that numbers are actually separated by commas; however, for real-life applications what we already have is good enough, so this possible improvement is left as an exercise for the inquisitive reader.
After this brief introduction to syntax parsers, you should have no problem examining the existing parsers in search of solutions for your own parser.