Build Your Own Language, Part 4

We left off with a parser test we wanted to execute (in parser_spec.rb) and a grammar file to generate a parser (in grammar.y). The actions specified in the grammar file build nodes, and those nodes are what the spec file checks. So getting those nodes in place will be the focus of this post.

So now let’s create an ast.rb file in the project directory. We’ll do the simplest possible thing at this point, which is:

Okay … so what’s going on there? Struct is an interesting concept in Ruby. When you call Struct.new, it does not actually return an instance of Struct. Instead, what gets returned is a new class that is a subclass of Struct. That class can then be used to create instances of the new structure. Any arguments passed to the Struct.new call are taken to be the names of fields that the generated class will have. These names are normally provided as symbols.
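A quick way to convince yourself of this is to poke at a Struct in irb. This is only a sketch: Point and its fields are hypothetical names chosen for illustration and are not part of the language project.

```ruby
# Point is a hypothetical name used only for illustration.
Point = Struct.new(:x, :y)

Point.class        # => Class  (Struct.new handed back a class, not an instance)
Point.superclass   # => Struct
Point.members      # => [:x, :y]

# The generated class gets a constructor plus a reader and writer per field.
origin = Point.new(0, 0)
origin.x           # => 0
origin.y = 5

# Struct instances compare by value, field by field.
origin == Point.new(0, 5)   # => true
```

That value-based equality matters later, because a spec that compares expected nodes against parsed nodes with == depends on it.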

So that means this:

is really the same as this:
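As a sketch of that equivalence, assume a hypothetical node with two fields, name and body. The class names ShortNode and LongNode are made up for this comparison, and the actual node classes in ast.rb may well look different.

```ruby
# Shorthand: inherit from the anonymous class that Struct.new returns.
class ShortNode < Struct.new(:name, :body); end

# Roughly the same thing written out by hand.
class LongNode
  attr_accessor :name, :body

  def initialize(name = nil, body = nil)
    @name = name
    @body = body
  end

  # Struct supplies value-based equality for free; here it has to be spelled out.
  def ==(other)
    other.class == self.class && other.name == name && other.body == body
  end
end

ShortNode.new("Foo", []) == ShortNode.new("Foo", [])   # => true
LongNode.new("Foo", []) == LongNode.new("Foo", [])     # => true
```

The longhand version is only an approximation: Struct also provides things like to_a, members, and index-style access that are left out here.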

I’ll get more into how this works as we go on. But, for now, in order to be useful, this ast file needs to be required by the parser. So add that to your header section in the grammar.y file:

So that handles the nodes part. However, according to my test in parser_spec.rb, I still need a parse method in the parser. To put this in place, I’ll have to add another section to the grammar file. This will be an inner section. Anything in an inner section is placed inside the generated parser class. So here’s what to add:

Here the parse method simply makes sure that the tokens have been derived from the lexer before parsing starts. The do_parse method is part of Racc and does pretty much what it sounds like: it kicks off the parsing process in the generated parser. The next_token method is one you must define yourself, because do_parse calls it every time it needs more input. It hands back the next token in the list, as a pair of token type and value, so that it can be parsed.
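To make the interplay between these three methods concrete, here is a self-contained sketch that runs without Racc. Everything in it is an assumption for illustration: StubLexer and SketchParser are hypothetical names, and the do_parse shown here is a hand-written stand-in that only mimics how Racc’s real do_parse keeps pulling tokens from next_token until it sees the end-of-input marker.

```ruby
# Hypothetical stand-in lexer: turns each word into an [type, value] pair.
class StubLexer
  def tokenize(code)
    code.split.map { |word| [:IDENTIFIER, word] }
  end
end

class SketchParser
  # In a real grammar.y, parse and next_token would sit in the inner section.
  def parse(code)
    @tokens = StubLexer.new.tokenize(code)  # make sure the tokens exist first
    do_parse                                # then start the parsing loop
  end

  # Hand over the next token; [false, false] signals end of input.
  def next_token
    @tokens.shift || [false, false]
  end

  private

  # Stand-in for Racc's do_parse (an assumption, not Racc's implementation):
  # it just collects token values until next_token reports end of input.
  def do_parse
    seen = []
    loop do
      type, value = next_token
      break unless type
      seen << value
    end
    seen
  end
end

SketchParser.new.parse("class Foo")   # => ["class", "Foo"]
```

In the generated parser you would not write do_parse at all, since it comes from Racc. Only parse and next_token are yours to supply.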

As with all changes to the grammar file, make sure you regenerate the parser.rb with the following command:

racc grammar.y -o parser.rb

Now let’s run that parser test:

rspec spec\parser_spec.rb

Doing so, you’ll get output like this:

  1) Testing the Parser should parse a class definition
     Failure/Error: nodes.should == Parser.new.parse(class_test)
     Racc::ParseError:

       parse error on value "\n" (NEWLINE)

I did this just to show you how picky things can be. Here the parser is saying that it got a token (NEWLINE) that it has no idea what to do with. This is the level you have to think at when parsing — because everything to the parser may be relevant. Remember that it’s your language. The parser can only operate on what it finds.

So what I need the parser to do is handle line breaks. But where and when? Well, a line break can happen after any expression. So I have to add to my Expressions rule in grammar.y as such:

Here I’m saying that an Expressions rule can be made up of an Expression rule or an Expressions rule followed by a Terminator rule. What’s a Terminator rule? Well, I haven’t defined that. So I have to add that to the production rules. In fact, I’ll add that as the last rule:

Now run your test again — after recompiling the grammar file — and you should find that it passes.
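If it helps to see what the extended Expressions rule accepts, here is a hand-rolled check written in plain Ruby rather than Racc. It is only a sketch under two assumptions: the symbol :Expression stands for an expression the parser has already recognized, and Terminator is taken to match a single NEWLINE token. The method name expressions? is made up for this example.

```ruby
# Mirrors the left-recursive rule:
#   Expressions : Expression
#               | Expressions Terminator
#   Terminator  : NEWLINE        (assumed for this sketch)
def expressions?(symbols)
  # Base case: a lone Expression is a valid Expressions.
  return true if symbols == [:Expression]

  # Recursive case: a valid Expressions followed by one Terminator.
  symbols.length > 1 &&
    symbols.last == :NEWLINE &&
    expressions?(symbols[0...-1])
end

expressions?([:Expression])                       # => true
expressions?([:Expression, :NEWLINE])             # => true
expressions?([:Expression, :NEWLINE, :NEWLINE])   # => true
expressions?([:NEWLINE])                          # => false
```

The second call is the case that failed before: an expression followed by a line break now has a rule that matches it.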

This can feel like a hollow victory if you don’t have the slightest idea why anything is passing or why this whole set of code works.

When I ran into this problem, I found that the issue was I had no idea if this actually worked in a practical sense. For example, now that I can parse my class definition — simple as it is — can I now define a class in my “new” language? Well, not really. And that’s the trick. We need to represent the elements of our language in memory. That means we need a runtime. In order to make use of the runtime, you need an interpreter. These two sort of go hand in hand.

I’ll start with the runtime in the next post. Given that this post was fairly short and was mainly just connecting material, I expect the next post to be quite a bit longer.

About Jeff Nyman

Anything I put here is an approximation of the truth. You’re getting a particular view of myself … and it’s the view I’m choosing to present to you. If you’ve never met me before in person, please realize I’m not the same in person as I am in writing. That’s because I can only put part of myself down into words.

If you have met me before in person then I’d ask you to consider that the view you’ve formed that way and the view you come to by reading what I say here may, in fact, both be true. I’d advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.
