The joy of pattern matching – part V

Ionbeam – an HTTP testing framework

Exercising an HTTP server, in order to test it, typically involves sending a request, extracting some information in the response, validating the response and, if the response is valid, use the extracted information to form the next request to the server (or to some other server if indicated by the information).

“Extracting some information” is in the context of this series simply “pattern matching” (template:match if we want to refer to the template library[1] we built in part II) and “forming the next request” is to fill in the values of a template (template:replace) before sending it to some server again. This means that, if we want to do some testing of some HTTP endpoint that we have developed we could use our template library, add some HTTP client code and we are good to go. That is what we will do in this blog post and the end result, called Ionbeam (for lack of a better name, where “beam”, at least, could be seen as Erlang related), can be found on github[2] as before.

Forming the Ionbeam

A test suite consists of a sequence of tasks where each task is defined as having a request template which is filled in with values from some input context. Performing the task and validating and extracting information from the response will produce an output context that can be used as input context for other tasks further down the sequence.

[
    %% do login
    {LoginTask, #{"USER_NAME" => "alice",
                  "PASSWORD" => "secret"}, 'LoginCtx'},

    %% do list items
    {ListItemsTask, 'LoginCtx', 'ListItemsCtx'}
]

An Ionbeam example testsuite consists of a list of tuples where each tuple consists of a task, a reference to an input context and a reference to an output context. The input context an either be a direct map, the name of a previously used context or a function returning a context.

A task, in Ionbeam, is represented by an Erlang map with fields of two categories – one that describes the request (such as method, host, path and headers fields) and one that details how the response is to be validated and what information should be passed to the output context. Another way of looking at the response fields is that they put constraints on the response.

LoginTask = #{

   request => #{
     method =>  "POST",
     host =>    "localhost",
     port =>    "8080",
     path =>    "/api/login",
     headers => "Accept: application/json\r\nContent-Type: application/json\r\n\r\n",
     body =>    "{\"uname\":\"$(USER_NAME)\",\"password\":\"$(PASSWORD)\"}"
   },

   response => #{
     status => [200],
     headers => #{
       match => ["Content-Type: application/json"]
     },
     body => #{
       match => "$(_A)\"token\":\"$(TOKEN)\"$(_C)"
     }
  }
}

An example task that will perform a login according to some made up API. The body references the variables USER_NAME and PASSWORD so these two variables need to be present in the input context. The response must have a status of 200, it must have at least one header Content-Type with the value application/json and the body must match the pattern shown. If the body matches, the TOKEN variable will be put into the output context. If the TOKEN is not found, or if any of the other matches fail, the task will fail.

If we then imagine, to continue the example, that the ListItemsTask requires authentication and that this authentication is done by inserting a header X-Token with the TOKEN value and that it will return json document in response to a GET request, its definition becomes something like:

ListItemsTask = #{

   request => #{
     host => "localhost",
     port => "8080", 
     headers => "X-Token: $(TOKEN)\r\nAccept: application/json\r\n\r\n",
     path => "/api/items"
   },

   response => #{
     headers => #{
       match => ["Content-Type: application/json"]
     }
   }
}

Defining the ListItem task which uses the TOKEN value obtained in the Login task. You see that method is not defined as so it will be a GET per default. There is no constraints on the body of the document but the default constraint on the status is 200 and the content type must be application/json

Suppose now that we WOULD like to put constraints on the body, but we know that the body could be huge and might not be fit to be parsed by the generic matcher library (or that the validation logic might not easily be expressed by a matching function). In this case we would, instead of specifying a match template, specify a validation function that can, given some domain knowledge, in a more efficient way parse and validate the body.

          body =>
              fun(Body, Ctx) ->
                      #{<<"members">> := Members} = jiffy:decode(Body, [return_maps]),
                      case lists:member(<<"stefan">>, Members) of
                          true ->
                              Ctx;
                          _ ->
                              {error, "stefan must be a member, but was not"}
                      end
              end

Instead of expressing validation of the body in terms of matching, one can
express validation as an Erlang function instead. This function takes two arguments - the body and the current context - and it returns a new context in the form of a map, or an error tuple (if the validation fails). In this example we parse the body using jiffy and check that the members array contains the required string.

Running it

When you have written all your tasks, and put them together in a sequence, you can then run them using the ionbeam:run_script function. This function will manage your input and output contexts for you so that you use them throughout the sequence and it will catch and report errors occurring.

ionbeam:run_script([
  %% do login
  {LoginTask, #{"USER_NAME" => "alice",
                "PASSWORD" => "secret"}, 'LoginCtx'},

  %% do list items
  {ListItemsTask, 'LoginCtx', 'ListItemsCtx'}, 

  ...
])

Summary

Pattern matching is a core concept in programming as it provides powerful tools that can be used within many different contexts. Throughout this series of posts about pattern matching I have tried to show how diverse use cases one can have for it, such as implementing some simple language processing (like Eliza in part III) or by implementing an HTTP server testing framework like in this post. Both problems are very different in context, but essentially solved in the same way – by matching patterns against literal strings. And it can be iterated once more, that pattern matching is not only a powerful way of expressing yourself, it also enhances the creative energy while programming, thus making programming more fun and rewarding.

References
  1. https://github.com/peffis/template
  2. https://github.com/peffis/ionbeam

The joy of pattern matching – part IV

Matching with transformations

In this post we will continue enhancing the template matcher[1] we developed in part II of this series. We add some functionality so that we can match and process slightly more complex patterns. The usefulness of matching e.g. palindromes could be debated though, so consider this to be an exercise we do because of the fact that pattern matching is…is fun! And palindromes are fun too, right?

To make it possible to present match examples in a brief manner, let us first introduce a notation for the function call template:match. Instead of every time writing out template:match(Template, String) below we will from now on simply write Template = String but it still means that it is a function call in our Erlang library. Ok? The resulting dictionary from the match will be written on the form {Key1 => Val1, Key2 => Val2, …}.

Suppose now that you would like to, for instance, match recurring patterns – like a pattern A, followed by some characters, say “xyz”, and then followed by another A. How would you write this in our current pattern matcher? It would be something like:

"$(A)xyz$(A)" = "abaxyzaba"

{A => "aba"}

Suppose now if we would like to do the above, but match only iff $(A) occurs twice after the string xyz? Can we do this? Well, of course we can, we just type $(A)$(A) twice there after “xyz”, but it would be more convenient with some expression that meant “occurs twice” or “occurs 4711 times” and such an expression we do not support yet in the implementation of our template matcher.

Or say that we would like to be able to match a pattern that starts with a pattern A and is then followed by another A, but a reversed A (this is a complicated way of describing “even length palindromes” such as “helloolleh”)? How would we do that when we cannot express “should occur again, but in reverse”? It would be fun if we could express things like ‘occurs in reverse’ as well or simply apply any function that transform one string value to another. Sounds like fun? Well, let us do it then! Let us introduce a generic transformation function to our template matcher that can transform the variable while we are doing the matching and see if the end result is interesting or not.

The syntax we introduce for this is $(A|<body>) where <body> will be converted to an erlang function body that will be applied to the variable A. By using Erlang’s “eval” capabilities we can actually include any Erlang code in the body of the transformation function. This means that, if you write $(A|lists:reverse(A)) the transformation function created would be:

fun(A) ->

    lists:reverse(A) 

end

, which would be applied to the variable during the matching, returning the transformed value (in this case the string in reverse).

Using this syntax we can then do the matches we discussed above.

"$(A)xyz$(A|A++A)" = "abxyzabab"

{A => "ab"}

The 'end with double the pattern' example

"$(A)$(A|lists:reverse(A))" = "olassalo"

{A => "olas"}

The palindromic 'end with pattern in reverse' example

Suppose we would like to match expressions that occur again, like above, but with some modification, say, with the first character “b” removed from A. This could be written using the ‘—‘ operator in Erlang, thus an example would be:

"$(A)$(A|A--\"b\")" = "abbaaba"

{A => "abba"}

Matching a pattern that occurs again but with the character 'b' removed once from the list

Suppose now that you would match expressions like the one above, but also match out what comes after such a combination of A followed by A — “b”. With some disappointment ahead of you, you might boldly try and write this as:

"$(A)$(A|A--\"b\")$(B)" = "abbaabarest"

{A => [], B => "abbaabarest"}

Trying to match out what is after the expression above presents us with probably one of the least interesting resolutions - not the one you wanted probably (as you expected B to become rest, right?) although still a correct match

The reason why the solution above resulted in such disappointment is because the current template matcher will always halt with the first match it finds and it will always find shorter solutions before longer. Here the shortest solution for A is of course to bind A to the empty string “” (or [] as it is also written in Erlang) because removing a “b” from the empty string is also the empty string.

Is there some way around this? Is there some way to enforce longer matches? Well, yes, there is! We can use the transformation function itself, not only to modify the string, but also to enforce constraints on the match. If the transformation function make sure that the match will fail if certain constraints are not fulfilled we will force the matcher to continue searching. For instance, we could check if A, in this case, is the empty string and return some unlikely string instead which would make the match fail for empty strings. The result of this is that we would therefore get the more interesting binding instead. Let’s try that:

"$(A|if length(A) == 0 -> \"err\"; true -> A end)$(A|A--\"b\")$(B)" = "abbaabarest"

{A => "abba", B => "rest"}

Using an Erlang if-clause (the if-clause is so rarely used in Erlang so I just had to work it in here this time) to make the match fail for zero-length A:s (by returning a string err or something else that will make the match fail) will force the matcher to find the longer match instead

Another interesting example is the one that finds the first occurrence of an integer where the integer is followed by itself plus one:

"$(A)$(B)$(B|if length(B) == 0 -> \"0\"; true -> integer_to_list(list_to_integer(B) + 1) end)$(C)" =  "11133433" 

{A => "1113", B => "3", C => "33"}


"$(A)$(B)$(B|if length(B) == 0 -> \"0\"; true -> integer_to_list(list_to_integer(B) + 1) end)$(C)" =  "471133343"

{A => "4711", B => "33", C => "3"}

Finding the first integer, B, where the integer is followed by itself plus one, while matching out what comes before, A, and after, C. This example uses an if-clause to avoid crashing when B is the empty string
Implementation

Implementation of the transformation function was done in commit 6df8227ac1245eb4296822fb0ee9b427e3db035a [2] and mostly involves a new parser for the new variable syntax, thus a new read_var_name function, but also a new eval function that, given a variable name, a body and a value, creates an Erlang function (a “fun”) from the body and applies the function on the value, like so:

eval(_, identity, Val) -> 
    %% this clause is taken when we do not have a transform defined, like only a variable $(A)
    Val;
eval(VarName, Body, Val) ->
    FunStr = "fun (" ++ VarName ++ ") -> " ++ Body ++ " end.",
    {ok, Tokens, _} = erl_scan:string(FunStr),
    {ok, [Form]} = erl_parse:parse_exprs(Tokens),
    {value, Fun, _} = erl_eval:expr(Form, erl_eval:new_bindings()),
    Fun(Val).

Then, whenever the template matcher would like to bind a variable A with a transform B it will first apply B on A before continuing with the match.

I am not sure that this transformation stuff actually has any big value – or any good use case – other than that it is fun to play around with it – like a puzzle. It was more or less just added because, well, because I could. In the next episode of this series on pattern matching I will however put the template matcher to do something more useful (in contrast to the exercises above) as I will show that the same template matcher can be used to implement an HTTP API testing framework – something that makes HTTP requests against some HTTP endpoints and validates that it gets the correct answers back. Stay tuned!

References
  1. https://github.com/peffis/template
  2. https://github.com/peffis/template/commit/6df8227ac1245eb4296822fb0ee9b427e3db035a

The joy of pattern matching – part III

The Eliza Effect

I was fascinated by artificial intelligence already when my interests in computers started. One of the earlier games I enjoyed on my C64 were the different text based adventures, like Zork [1], that used different text parsers to extract meaning from your text input. You typed commands such as “attack grue with sword”, the game parsed the text and produced some output.

Later, when I did an assignment in school (a Turbo Pascal Othello/Reversi game that my class mates enjoyed playing – I guess the school’s computers did not have many games back then) I became fascinated with the fact that my classmates so easily put a higher meaning to my computer program, referring to it as a “thinking machine”, getting angry at it for “tricking them on purpose” etc, when I knew how stupidly simple the algorithm behind the game was. I started reading about the Turing test [2] and then came across Joseph Weizenbaum’s work on Eliza [3] and his thoughts about the so called Eliza effect[4] and this tied together my fascination for text based adventures with my classmates’ reaction to my Othello game and with the Turing test and the Eliza effect.

ELIZA is an early natural language processing computer program created from 1964 to 1966 […] at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. […] Created to demonstrate the superficiality of communication between man and machine, Eliza simulated conversation by using a ‘pattern matching’ and substitution methodology that gave users an illusion of understanding on the part of the program […]. Directives on how to interact were provided by ‘scripts’, […], which allowed ELIZA to process user inputs and engage in discourse following the rules and directions of the script.

– from ELIZA, Wikipedia[3]

Eliza has, since then, always returned to me in different ways. I wrote a version of it myself in Java back in the 90s when you could make web pages interactive through a, now dead, technology called “Java Applets”, I got to implement one version in Scheme in computer science class at KTH and when I started using Emacs I many times kicked off M-x doctor. I return to it now once again within the context of pattern matching and the template matcher that we developed in the previous part of this series of blog posts. Just as I don’t understand why pattern matching is so fascinating to me I do not understand why this simple Eliza program and its effect on people is so fascinating to me.

I am the psychotherapist.  Please, describe your problems.  Each time
you are finished talking, type RET twice.

I don't have any problems

Why do you say you do not have any problems?

Because it is the truth

Is it because it is the truth that you came to me?

Yes

I see...  Well, what makes you believe this is so?

God told me so

Earlier you said it is the truth?

I was lying then

Maybe your life has something to do with this.

Unlikely

Can you elaborate on that?

Nope

I would appreciate it if you would continue.

A session with M-x doctor in Emacs

The template matcher we wrote in the previous post has a match function that takes a template string and a string and, if it can find a match, it produces a dictionary where the variables in the template string has been assigned a value. The template matcher library also has the inverse function that, given a template and a dictionary it can replace all the variables in the template with the bindings in the dictionary. It turns out that this is all you need to implement Eliza and to have yet another meaningful discussion with a computer.

To create our own Eliza version by using the template matcher, we start by defining a set of patterns (or templates) that we should match user input against. Only your imagination is the limitation here but you should put more complex patterns first and then have simpler, catch-all patterns towards the bottom because we will try to match user input from top to bottom. It also turns out that it is good if the catch-all phrases at the bottom tries to lead the conversation someplace else, like starting with a question (ok, the “my sister was once bitten by a moose” below is more a sign of my fascination with Monty Python rather than trying to make a meaningful conversation).

patterns() ->
    [
     {"$(x)Where are you from$(y)",
      "I'm from $(c)",
      "I live in $(c)",
      "My country is $(c)"},
     {"$(x)my name is $(y)",
      "$(y)!. What a beautiful name!",
      "When I was young I wished my name was $(y)",
      "My name is $(n) but you probably knew that already",
      "$(y)? ... $(y)? Isn't there a movie star called $(y)?"},
     {"$(x)I am from $(y)",
      "From $(y)!? I am from $(c) myself",
      "So do you like it in $(y) then?",
      "I come from $(c) if you wondered",
      "$(y). Is that far from $(c)?"},
     {"$(x)I remember $(y)",
      "Do you often think of $(y)?",
      "Does thinking of $(y) bring anything else to mind?",
      "What else do you remember?",
      "Why do you recall $(y) right now?",
      "What in the present situation reminds you of $(y) ?",
      "What is the connection between me and $(y) ?"},
     {"$(x)do you remember $(y)",
      "Did you think I would forget $(y)?",
      "What about $(y)?",
      "Why do you think I should recall $(y) now?",
      "You mentioned $(y)?"},

...

     {"$(x)",
      "Do you like computers then since you are using the Internet?",
      "So, where do you live?",
      "Can you play the saxophone?",
      "Why do you speak so much?",
      "Why are you here",
      "Where are you from?",
      "What age are you?",
      "What's your name then?",
      "Please tell me more about your family",
      "What are you interested in?",
      "Do you have any problems?",
      "I don't have a family and that's bothering me a bit",
      "I like coffee",
      "Very interesting. Please tell me more",
      "I hate you. Do you know that?",
      "I like you sometimes. Did you know that?",
      "I'm a lumberjack and I'm ok...",
      "I don't understand what you mean really. Please explain",
      "Can you explain that a bit more",
      "To be or not to be that is the question",
      "My sister was bitten by a moose once",
      "Moose bites can be very dangerous, you know",
      "I apologize for everything I said so far",
      "I am sorry I disturbed you",
      "Do you know how to stop chatting?",
      "I've been here all my life and I'm getting pretty tired of it",
      "How long is the TTL-field in an IP-header?",
      "Do you think it's possible to program a computer to speak like a human beeing",
      "I dreamt about a giant hedgehog last night",
      "I remember that you mentioned something like that earlier",
      "Did you know that I can't think?"}
    ].

The first line in the pattern tuples is the pattern we match against and the rest of the patterns are the answers we could return if that pattern matches. The variables, as you might see, will carry over from the match to the later replace on the selected answer (where the answer is selected with just a random index).

What one also normally do in any Eliza implementation is to switch pronouns in the user input according to fixed rules. So if the user would write “I remember you saying so” it might match the pattern “${X}I remember ${Y}” and if we would like to send back an answer pattern such as “Can you really remember ${Y} ?” we would have to switch “you” to “me” before we send back the pattern. So the match would produce a dictionary

#{"X" => "", "Y" => "you saying so"}

, we apply the switch rules and produce the dictionary

#{"X" => "", "Y" => "me saying so"}

which would create the answer “Can you really remember me saying so?”.

So the whole Eliza process becomes something like:

say(Something) ->
    {Answer, Dictionary} = find_first_match(Something),
    SwitchedDictionary = switch_pronouns(Dictionary),
    template:replace(Answer, SwitchedDictionary).

, where find_first_match tries to find a matching template in the pattern list from top to bottom and, if it’s found, returns the Answer template, and the resulting Dictionary we should use to form an answer. The function switch_pronouns will modify the dictionary by replacing “you” with “me” etc and then we form the answer by calling template:replace. That’s all. Below follows a short example of what a conversation with this program would be like. The full code is at github.

Eshell V9.1.2  (abort with ^G)
(eliza@127.0.0.1)1> eliza:say("Hello Eliza!").
"WHAT DO YOU MEAN!? Eliza!?"

(eliza@127.0.0.1)2> eliza:say("I just said hello").
"Why do you speak so much?"

(eliza@127.0.0.1)3> eliza:say("Because you ask me questions").
"Try to think of other reasons"

(eliza@127.0.0.1)4> eliza:say("Because horses are pink?").
"Is that the only reason you think?"

(eliza@127.0.0.1)5> eliza:say("No, I am sure there are plenty of reasons?").
"Really! Are you sure there are plenty of reasons?"

(eliza@127.0.0.1)6> eliza:say("No, now that you say so, I am not sure any longer").
"What if you weren't not sure any longer?"

(eliza@127.0.0.1)7> eliza:say("Now you are confusing me").
"What makes you think I am confusing you?"

(eliza@127.0.0.1)8> eliza:say("The fact that I am getting confused?").
"I know for sure that I'm not getting confused"

(eliza@127.0.0.1)9> eliza:say("That is good for you, Eliza").
"I dreamt about a giant hedgehog last night"

(eliza@127.0.0.1)10> eliza:say("That is very interesting indeed").
"What are you interested in?"

(eliza@127.0.0.1)11> eliza:say("I am interested in football and giant squids").
"Why do you say "am"?"

(eliza@127.0.0.1)12> eliza:say("Hmm, what?").
"What age are you?"

(eliza@127.0.0.1)13> eliza:say("I am 43 years old").
"I don't understand that"

(eliza@127.0.0.1)14> eliza:say("Really?").
"Do you have any problems?"

(eliza@127.0.0.1)15> eliza:say("I think you are one of my worst problems").
"Please don't think so much. You will get a headache"

A session with the eliza we developed in this blog post

Using the template matcher to implement Eliza is actually a bit of an overkill as a simple regexp match would be enough. The template matcher offers more than regexps, such as enforcing equal binding of repeating patterns in the template, but this is not needed in this Eliza implementation. In the next part of this series on pattern matching however, we will extend the template matching library to support more advanced matching functionality – things that you normally cannot do with simple regexp matching.

References
  1. https://en.wikipedia.org/wiki/Zork
  2. https://en.wikipedia.org/wiki/Turing_test
  3. https://en.wikipedia.org/wiki/ELIZA
  4. https://en.wikipedia.org/wiki/ELIZA_effect