The joy of pattern matching – part I

The passion

Let us assume a template language of all character sequences containing any number of lowercase and uppercase characters ([A-Za-z]*), where lowercase characters are literals – constants with a name – and uppercase characters are placeholders for any sequence of lowercase characters (including the empty sequence).

"abc"
"aABd"
""
"abcasdasdlkjasdLKJKJDASSASaaS"

Some examples of the template language

Then, given such a template, play the game of trying to match the template with a sequence containing only lowercase characters, where “match” is the problem of finding values for all the uppercase characters so that the two sequences become equal. For instance, try to figure out what sequences the uppercase characters would represent in the below examples:

"abA" = "abc"
"aAa" = "abba"
"aAaA" = "acdacd"
"aABa" = "abba"

Match is denoted with the character '='

I’m curious. Did you enjoy playing this game? And did you think about all the different solutions for the fourth example? For me, even this simple game gives me a very odd satisfaction which I cannot really explain. Could it be some kind of OCD[1]?

I returned to using Erlang some time around year 2000 after having spent several years building systems in Pascal, C, C++, Python, Perl, PHP, Javascript and (far too much) Java. From the start, Erlang just felt “right” and over the years I have grown more and more fond of it. I often wonder why it has the appeal it has on me. Why does not Go (I do a lot of Go programming as well) have the same appeal as Erlang? I sometimes give vague reasons like “Erlang matches well how I solve computer science problems” without knowing what I really mean, or I try to find more well-defined reasons like “on a multi-core CPU it really is an awesome way to make good use of the hardware” or “the fail-fast methodology with supervision trees makes for very robust software”, but the more I think about it, what really attracts me, is the pattern matching abilities of this beautiful language in combination with its functional programming paradigm.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. […] The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence (i.e., search and replace).

– from Pattern matching, Wikipedia[2]

In Erlang the matching operator is the equal sign, ‘=’ (which is a bit confusing for people coming from other programming languages where an equal sign means “assign a value to a variable”). In Erlang, the equal sign simply means that you match the left-hand side with whatever is on the right-hand side with the side effect that you, if the match succeeds, bind any variables found in the pattern on the left-hand side. In Erlang, once a variable is bound to a value as a result of a match, it will always represent that value within that scope, forever. You can never “re-bind” a variable to a new value (or “re-assign” if you think in more traditional terms). This can be shown in our invented sequence language from above with the following three sequence example

"abcd"
"aAd"
"Abc"

A non-matching example

Here we realize that the three sequences together cannot match as the first two sequences would bind A to bc (in some imaginary “context” where we store our bindings) which would make the third sequence bcbc, which does not match the other two sequences. In Erlang, when you try to match and fail doing so, you get an exception, like in the below Erlang example

A = 1,
A = 2.
** exception error: no match of right hand side value c

, where you get an exception on line 2 because A is on line 1 bound to the integer 1 and integer 1 cannot ever match integer 2.

1 = 2.
** exception error: no match of right hand side value 2

but the below code is fine of course as 1 matches 1 (although it is code seldom seen in real life I suppose)

1 = 1.

Are you still reading this? Ok, then perhaps you must have the same odd fascination for pattern matching as I do, or perhaps you suffer from something else ­čÖé

The pattern matching in Erlang [3] not only makes your code expressive and therefore short (which is an important beauty aspect of a program), but it also gives you these extremely charming puzzles (like the game above) to solve while you program which brings some kind of creative energy to the programming experience. There just is something charming with taking two patterns and see how they would match (if they would match at all). Pattern matching is however by no means unique to Erlang. It exists in many other languages such as Snobol, Haskell and nowadays somewhat even in Python and Javascript. In languages where it does not exist however I nowadays dearly miss its powers.

Pattern matching in Erlang

Pattern matching plays a large part in any Erlang program. One example is how Erlang “selects” function clauses to execute as in the example below where sum_list/2 has two clauses starting on lines 6 and 9 and the clause to execute is selected depending on matching of the two arguments (where in this case the value of the second argument plays no role) – if the list is empty, take the first clause, otherwise select the second clause.

%% calculate the sum of all numbers in a list
sum_list(Ls) ->
    sum_list(Ls, 0).


sum_list([], C) ->
    C;

sum_list([A | Rest], C) ->
    sum_list(Rest, C + A).

You can match on any term, any type, in Erlang just as long as the left-hand side of the match follows the same structure as the right-hand side. The left-hand side may include unbound variables but the right-hand side can only include bound variables.

You can match lists

[1, A | B] = [1, 2, 3, 4, 5].
%% A = 2,
%% B = [3, 4, 5]

You can match strings (which are lists, but there is a syntactical sugar for matching strings which is worth noting)

"prefix" ++ A = "prefixa".
%% A = "a"

You can match binaries (where you have the option to use special bit-field specifiers)

%% matching out the header fields from a binary IP Packet
<<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
      ID:16, Flgs:3, FragOff:13,
      TTL:8, Proto:8, HdrChkSum:16,
      SrcIP:32,
      DestIP:32, RestDgram/binary>> = Packet

You can match maps

#{a := A, b := #{c := C}} = #{a => 1, b => #{c => 2}}.
%% A = 1
%% C = 2

You can match tuples

{A, B} = {1, 2}.
%% A = 1
%% B = 2

…and well, any term constructed from any combination of types you can think of

{A, [#{key := "abc" ++ C}, B]} = {a, [#{key => "abcdef"}, 2]}.
%% A = a
%% B = 2
%% C = "def"

Is this not beautiful? Making use of matching can make your code very dense and expressive and, to some (like me), it also enhances the pleasure of writing code. Pattern matching can also be used to enforce constraints on your data structures so that, if a match fails, you know that your program is in a bad state and should be restarted (in Erlang you just let things crash and have some supervisor restart the failed service).

I will continue writing about pattern matching in more general terms in part two of this post.

References
  1. https://en.wikipedia.org/wiki/Obsessive%E2%80%93compulsive_disorder
  2. https://en.wikipedia.org/wiki/Pattern_matching
  3. http://erlang.org/doc/reference_manual/patterns.html

Author: peffis

I was Slashdotted in 2004. After that, not much.

Leave a Reply

Your email address will not be published. Required fields are marked *