A Tester Learns Rex and Racc, Part 2

In my previous post on this subject I started off on the learning process for building a lexer with the Rex (Rexical) tool. Here I want to update the logic I provided in that post to show how to make it more testable. I then want to expand on the example of using Rex with something a bit more substantive.

If you are following along from the last post, you would have the following in place:

  • A project directory called test_language.
  • A file in that directory called test_language.rex.
  • A file in that directory called test_language.rb.

I had mentioned that this simple test Ruby script was not a good way to test and that an actual test file would be better. So here I’ll describe how I put that in place.

  1. Within your project directory create another directory called spec.
  2. Move your test_language.rb file into the spec directory.
  3. Rename the test_language.rb file to language_lexer_spec.rb.
  4. Create a file in your project directory called Rakefile.

Put the following logic in the Rakefile:
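A minimal sketch of such a Rakefile, under a few assumptions: the rex executable comes from the rexical gem and takes -o to name the output file, the generated file is called lexer.rb, and the specs are run with the rspec command. The task names match the rake lexer and rake spec commands used in this post; making spec depend on lexer is a guess.

```ruby
require 'rake' # not needed when run through rake; lets the file load as plain Ruby too

# Assumption: rex is on the PATH and -o names the generated file.
desc 'Generate the lexer from the rex specification'
task :lexer do
  sh 'rex test_language.rex -o lexer.rb'
end

# Assumption: the specs run under RSpec, and the lexer should be
# regenerated before every test run.
desc 'Regenerate the lexer, then run the specs'
task spec: :lexer do
  sh 'rspec spec'
end
```

With this in place, rake spec would first regenerate lexer.rb and then run everything under the spec directory.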

Now you can generate your lexer just by typing

rake lexer

Let’s modify that language_lexer_spec.rb file so it looks like this:

Note here that I’m including the generated lexer.rb file. Also note that in the test I’m checking an index in the result array. That’s because what gets returned from the lexer is an array. In order for this test to work, however, you will have to change the logic that gets executed by your rules in the lexer specification. So change the test_language.rex file so that the rules look like this:

Here I just changed the Ruby logic to return string values rather than simply outputting them with the puts method. You can now run your test as follows:

rake spec
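To see why the test indexes into the result, here is a hand-written stand-in for the lexer, built on Ruby's StringScanner. It is not what rex generates, and the "Single u." string is a hypothetical return value. It only illustrates that once an action returns a value instead of calling puts, each match ends up as an element of the array that comes back.

```ruby
require 'strscan'

# Hand-written stand-in for the generated lexer (an illustration only).
class StandInLexer
  def tokenize(code)
    scanner = StringScanner.new(code)
    tokens = []
    until scanner.eos?
      if scanner.scan(/u/)
        # The "action" returns a value rather than printing it.
        tokens << 'Single u.'
      else
        raise "no rule matches #{scanner.rest.inspect}"
      end
    end
    tokens
  end
end

result = StandInLexer.new.tokenize('u')
p result     # the whole array of returned values
p result[0]  # the first (and here only) match
```

A test against a lexer shaped like this would compare result[0] with the expected string, which is the kind of index check described above.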

Now, of course, you can add as many tests as you want:

Granted, all of this is pretty much Test First in Ruby 101, but since this was something I didn’t do in the first post, I wanted to make sure to cover this here.

Okay, so let’s do something a little different with the lexer. Change the lexer file so that your rules look like this:

Now you could replace your previous tests with tests like these:

Notice that with these rules, I’m using regular expressions rather than literals like I did with the “u” example. I’m also specifying an identifier (:DIGIT and :WORD) that indicates what type of match has been made. I could have used any terms I wanted there, but being usefully descriptive obviously helps. What you can now do is explore with different tests to see what happens. What you might find interesting, for example, are these two other tests:
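Inputs that mix digits and letters are a good place to look. The sketch below is a hand-written stand-in for the lexer, not rex output. It assumes the two patterns are \d+ and \w+, in that order, and that rules are tried top to bottom with the first match winning.

```ruby
require 'strscan'

# Assumed rules: each entry pairs a pattern with the action to run on a match.
RULES = [
  [/\d+/, ->(text) { [:DIGIT, text.to_i] }],
  [/\w+/, ->(text) { [:WORD, text] }]
].freeze

# Try each rule in order at the current position; the first match wins.
def tokenize(code, rules = RULES)
  scanner = StringScanner.new(code)
  tokens = []
  until scanner.eos?
    pattern, action = rules.find { |candidate, _| scanner.match?(candidate) }
    raise "no rule matches #{scanner.rest.inspect}" unless pattern
    tokens << action.call(scanner.scan(pattern))
  end
  tokens
end

p tokenize('123abc')
p tokenize('abc123')
```

Under these assumptions, "123abc" splits into a :DIGIT token followed by a :WORD token, while "abc123" comes back as a single :WORD token, because \d+ cannot match at the letter "a" and \w+ then consumes the letters and digits together.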

Again, play around a bit. This is how I’ve been learning how the lexer part works. In fact, as a further means of playing around, you might want to swap the placement of the rules in the lexer specification and see what happens when you run your tests. Look for what fails and try to determine why.
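As a sketch of what swapping the rules can do, here is the same kind of hand-written stand-in with the rule order passed in explicitly. The patterns \d+ and \w+ are assumptions, and the constant names are made up for the example.

```ruby
require 'strscan'

# Hypothetical rule definitions: a pattern paired with its action.
DIGIT_RULE = [/\d+/, ->(text) { [:DIGIT, text.to_i] }].freeze
WORD_RULE  = [/\w+/, ->(text) { [:WORD, text] }].freeze

# Rules are tried in the order given; the first one that matches wins.
def tokenize(code, rules)
  scanner = StringScanner.new(code)
  tokens = []
  until scanner.eos?
    pattern, action = rules.find { |candidate, _| scanner.match?(candidate) }
    raise "no rule matches #{scanner.rest.inspect}" unless pattern
    tokens << action.call(scanner.scan(pattern))
  end
  tokens
end

p tokenize('123', [DIGIT_RULE, WORD_RULE]) # digit rule listed first
p tokenize('123', [WORD_RULE, DIGIT_RULE]) # word rule listed first
```

Because \w also matches digits, putting the word rule first means "123" is reported as a :WORD and the digit rule never gets a chance to fire.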

Sometimes you can run into weird stuff that seems to be an artifact of how the logic of the lexer gets generated. For example, let’s say you want to recognize blank space so you add another rule:

You will be able to generate the lexer without a problem but when you try to run the tests you will be told that the lexer.rb file has some errors. I’m not entirely sure why this is at the moment, but it does give me a chance to introduce another section, similar to the inner section but with a different purpose. You can create macros, which are basically identifiers that stand for a pattern. For example, with the above example, I could do this instead:

Note that in order to use a macro in a rule, you must enclose the identifier of the macro within curly braces. The macro concept is really just giving a name to a common pattern that you want to use in the rule section. You could add the following test for this:

Here I’m testing for nil because no action was specified to take place if this rule was matched.
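The idea can be sketched in plain Ruby by treating a macro as a named pattern and giving the blank rule no action at all. This is a hand-written stand-in, not rex output, and \s+ is an assumed pattern for BLANK.

```ruby
require 'strscan'

# The "macro": a name standing for a pattern (assumed to be \s+ here).
BLANK = /\s+/

RULES = [
  [BLANK, nil],                                # matched, but no action to run
  [/\d+/, ->(text) { [:DIGIT, text.to_i] }]
].freeze

def tokenize(code)
  scanner = StringScanner.new(code)
  tokens = []
  until scanner.eos?
    pattern, action = RULES.find { |candidate, _| scanner.match?(candidate) }
    raise "no rule matches #{scanner.rest.inspect}" unless pattern
    text = scanner.scan(pattern)
    # A rule without an action consumes its input and contributes nothing.
    tokens << action.call(text) if action
  end
  tokens
end

result = tokenize(' ')
p result[0]
p tokenize('1 2')
```

In this sketch a lone blank yields an empty result, so result[0] is nil, and the blank between "1" and "2" is consumed without producing a token.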

What you should note from this is that one or more sections are included within the lexer class specification. In this example, we have macro, rule, and inner. Of those, only the rule section is required. A rule, at its simplest, is made up of a pattern to look for and the actions to take when that pattern is found. The pattern can be a literal string to look for or it can be a regular expression. The action is any valid Ruby code. This code can do any necessary processing and can optionally return a value. (If you are using a parser, like Racc, with your lexer then that output can be used by the parser.)
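Put together, a specification with all three sections might look roughly like the sketch below. The class name, the patterns, and the tokenize helper are assumptions for illustration; scan_setup and next_token are the methods the rexical-generated class provides, and text holds the matched string inside an action.

```
class TestLanguage
macro
  BLANK    \s+
rule
  {BLANK}
  \d+      { [:DIGIT, text.to_i] }
  \w+      { [:WORD, text] }
inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end
```

The macro section names the pattern, the rule section pairs patterns with optional Ruby actions, and the inner section is copied into the generated class as ordinary Ruby methods.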

Now let’s do something that’s almost a little bit more exciting or, if not that, at least a little more substantive. This won’t be entirely unique since just about everyone does this when learning this stuff but — let’s build a calculator. We already have the basis for how to do some of this so this will just be an expansion of what we’ve already done. This will let us start getting into Racc as well. I’ll assume by now that you have the basics of how I create these rex files and the test files so I’ll just give contents rather than too many details about file names or directory structures.

So let’s say you have a Rex file like this:

You can perform some simple recognition tests with the following:

That all seems to work as expected. But what about this test:

That’s what we would want a calculator to be able to handle, right? I want my calculator to have the ability to actually calculate some value as long as appropriate symbols and numbers are used. So what do you expect to happen? What should you fill in for those ???? in the test? In fact, what you will find is that you have :DIGIT and 2 as the first returned values because that’s what got matched first in the string “2+2”.

That makes sense because remember: the lexer is just reading the input and trying to find matches. It does this and stores each match as its own array inside the result array. So the test should really be this:
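A sketch of what that looks like, using a hand-written stand-in rather than the generated lexer. The \d+ pattern and the :ADD and :SUBTRACT token names are hypothetical; the point is only that each match becomes its own element of the result.

```ruby
require 'strscan'

# Assumed calculator rules; the operator token names are made up.
RULES = [
  [/\d+/, ->(text) { [:DIGIT, text.to_i] }],
  [/\+/,  ->(text) { [:ADD, text] }],
  [/-/,   ->(text) { [:SUBTRACT, text] }]
].freeze

def tokenize(code)
  scanner = StringScanner.new(code)
  tokens = []
  until scanner.eos?
    pattern, action = RULES.find { |candidate, _| scanner.match?(candidate) }
    raise "no rule matches #{scanner.rest.inspect}" unless pattern
    tokens << action.call(scanner.scan(pattern))
  end
  tokens
end

result = tokenize('2+2')
p result[0] # first match
p result[1] # second match
p result[2] # third match
```

Each element is a pair of token type and value, so a test for "2+2" would check three separate entries instead of expecting a single answer.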

So that’s great. But how do we actually get a calculation from this? How do we get something that solves 2+2 and returns the value 4? Well, the lexer just recognizes the symbols. What we need to do is parse those symbols and take action based on them. That, finally, brings us to Racc. And that will have to wait for a different post. Again, though, I encourage you to play around with the lexer. The lexer specification is going to be the basis of any language you want to create. So getting comfortable with being able to express that language as a rex specification and being able to see how to pull information from an input is going to serve you well as you go into the parsing aspects.

This article was written by Jeff Nyman
