The joy of pattern matching – part IV

Matching with transformations

In this post we will continue enhancing the template matcher[1] we developed in part II of this series. We add some functionality so that we can match and process slightly more complex patterns. The usefulness of matching e.g. palindromes could be debated though, so consider this to be an exercise we do because of the fact that pattern matching is…is fun! And palindromes are fun too, right?

To make it possible to present match examples in a brief manner, let us first introduce a notation for the function call template:match. Instead of every time writing out template:match(Template, String) below we will from now on simply write Template = String but it still means that it is a function call in our Erlang library. Ok? The resulting dictionary from the match will be written on the form {Key1 => Val1, Key2 => Val2, …}.

Suppose now that you would like to, for instance, match recurring patterns – like a pattern A, followed by some characters, say “xyz”, and then followed by another A. How would you write this in our current pattern matcher? It would be something like:

"$(A)xyz$(A)" = "abaxyzaba"

{A => "aba"}

Suppose now if we would like to do the above, but match only iff $(A) occurs twice after the string xyz? Can we do this? Well, of course we can, we just type $(A)$(A) twice there after “xyz”, but it would be more convenient with some expression that meant “occurs twice” or “occurs 4711 times” and such an expression we do not support yet in the implementation of our template matcher.

Or say that we would like to be able to match a pattern that starts with a pattern A and is then followed by another A, but a reversed A (this is a complicated way of describing “even length palindromes” such as “helloolleh”)? How would we do that when we cannot express “should occur again, but in reverse”? It would be fun if we could express things like ‘occurs in reverse’ as well or simply apply any function that transform one string value to another. Sounds like fun? Well, let us do it then! Let us introduce a generic transformation function to our template matcher that can transform the variable while we are doing the matching and see if the end result is interesting or not.

The syntax we introduce for this is $(A|<body>) where <body> will be converted to an erlang function body that will be applied to the variable A. By using Erlang’s “eval” capabilities we can actually include any Erlang code in the body of the transformation function. This means that, if you write $(A|lists:reverse(A)) the transformation function created would be:

fun(A) ->

    lists:reverse(A) 

end

, which would be applied to the variable during the matching, returning the transformed value (in this case the string in reverse).

Using this syntax we can then do the matches we discussed above.

"$(A)xyz$(A|A++A)" = "abxyzabab"

{A => "ab"}

The 'end with double the pattern' example

"$(A)$(A|lists:reverse(A))" = "olassalo"

{A => "olas"}

The palindromic 'end with pattern in reverse' example

Suppose we would like to match expressions that occur again, like above, but with some modification, say, with the first character “b” removed from A. This could be written using the ‘—‘ operator in Erlang, thus an example would be:

"$(A)$(A|A--\"b\")" = "abbaaba"

{A => "abba"}

Matching a pattern that occurs again but with the character 'b' removed once from the list

Suppose now that you would match expressions like the one above, but also match out what comes after such a combination of A followed by A — “b”. With some disappointment ahead of you, you might boldly try and write this as:

"$(A)$(A|A--\"b\")$(B)" = "abbaabarest"

{A => [], B => "abbaabarest"}

Trying to match out what is after the expression above presents us with probably one of the least interesting resolutions - not the one you wanted probably (as you expected B to become rest, right?) although still a correct match

The reason why the solution above resulted in such disappointment is because the current template matcher will always halt with the first match it finds and it will always find shorter solutions before longer. Here the shortest solution for A is of course to bind A to the empty string “” (or [] as it is also written in Erlang) because removing a “b” from the empty string is also the empty string.

Is there some way around this? Is there some way to enforce longer matches? Well, yes, there is! We can use the transformation function itself, not only to modify the string, but also to enforce constraints on the match. If the transformation function make sure that the match will fail if certain constraints are not fulfilled we will force the matcher to continue searching. For instance, we could check if A, in this case, is the empty string and return some unlikely string instead which would make the match fail for empty strings. The result of this is that we would therefore get the more interesting binding instead. Let’s try that:

"$(A|if length(A) == 0 -> \"err\"; true -> A end)$(A|A--\"b\")$(B)" = "abbaabarest"

{A => "abba", B => "rest"}

Using an Erlang if-clause (the if-clause is so rarely used in Erlang so I just had to work it in here this time) to make the match fail for zero-length A:s (by returning a string err or something else that will make the match fail) will force the matcher to find the longer match instead

Another interesting example is the one that finds the first occurrence of an integer where the integer is followed by itself plus one:

"$(A)$(B)$(B|if length(B) == 0 -> \"0\"; true -> integer_to_list(list_to_integer(B) + 1) end)$(C)" =  "11133433" 

{A => "1113", B => "3", C => "33"}


"$(A)$(B)$(B|if length(B) == 0 -> \"0\"; true -> integer_to_list(list_to_integer(B) + 1) end)$(C)" =  "471133343"

{A => "4711", B => "33", C => "3"}

Finding the first integer, B, where the integer is followed by itself plus one, while matching out what comes before, A, and after, C. This example uses an if-clause to avoid crashing when B is the empty string
Implementation

Implementation of the transformation function was done in commit 6df8227ac1245eb4296822fb0ee9b427e3db035a [2] and mostly involves a new parser for the new variable syntax, thus a new read_var_name function, but also a new eval function that, given a variable name, a body and a value, creates an Erlang function (a “fun”) from the body and applies the function on the value, like so:

eval(_, identity, Val) -> 
    %% this clause is taken when we do not have a transform defined, like only a variable $(A)
    Val;
eval(VarName, Body, Val) ->
    FunStr = "fun (" ++ VarName ++ ") -> " ++ Body ++ " end.",
    {ok, Tokens, _} = erl_scan:string(FunStr),
    {ok, [Form]} = erl_parse:parse_exprs(Tokens),
    {value, Fun, _} = erl_eval:expr(Form, erl_eval:new_bindings()),
    Fun(Val).

Then, whenever the template matcher would like to bind a variable A with a transform B it will first apply B on A before continuing with the match.

I am not sure that this transformation stuff actually has any big value – or any good use case – other than that it is fun to play around with it – like a puzzle. It was more or less just added because, well, because I could. In the next episode of this series on pattern matching I will however put the template matcher to do something more useful (in contrast to the exercises above) as I will show that the same template matcher can be used to implement an HTTP API testing framework – something that makes HTTP requests against some HTTP endpoints and validates that it gets the correct answers back. Stay tuned!

References
  1. https://github.com/peffis/template
  2. https://github.com/peffis/template/commit/6df8227ac1245eb4296822fb0ee9b427e3db035a

Author: peffis

I was Slashdotted in 2004. After that, not much.