Wednesday, August 3, 2011

The forgotten control flow construct

There is a truly powerful control flow construct that many programmers have forgotten. It helps implement complex flows that ifs, whiles and matches can't handle. Not even throwing and catching exceptions can measure itself to this construct.

You may wonder what kind of situation would require something that traditional constructs can't handle. Here is an example taken from Go To Statement Considered Harmful: A Retrospective.

int parse()
{
    Token   tok;

reading:
    tok = gettoken();
    if (tok == END)
        return ACCEPT;
shifting:
    if (shift(tok))
        goto reading;
reducing:
    if (reduce(tok))
        goto shifting;
    return ERROR;
}

Let's see how using a while turns out:

type Token = END

type ParseState = Reading | Shifting | Reducing | Accepted | Errored
type Status = ACCEPT | ERROR

let parse gettoken shift reduce () =
    let mutable state = Reading
    let mutable tok = Unchecked.defaultof<Token>
    
    while state <> Accepted && state <> Errored do
        state <-
            match state with
            | Reading ->
                tok <- gettoken()

                if tok = END then
                    Accepted
                else
                    Shifting

            | Shifting ->
                if shift tok then
                    Reading
                else
                    Reducing

            | Reducing ->
                if reduce tok then
                    Shifting
                else
                    Errored

            | Errored | Accepted -> failwith "Invalid state"

    match state with
    | Errored -> ERROR
    | Accepted -> ACCEPT
    | _ -> failwith "Non-final state"

(Note that I have introduced extra parameters gettoken, shift and reduce; I wanted to be able to write concise code that syntactically correct without explicitly defining these functions).

It looks OK, but we had to introduce an extra type, ParseState. There are also some issues with mutability and initialization of tok.

If you have some experience with functional programming, you are probably thinking that recursion is the right tool. I will leave writing a recursive version of parse to the reader, and jump to a slightly different idea that many might have missed.

type Token = END | SOMETHINGELSE

type Status = ACCEPT | ERROR

let parse gettoken shift reduce () =
    let rec do_read() =
        match gettoken() with
        | END -> ACCEPT
        | _ as tok -> do_shift tok

    and do_shift tok =
        if shift tok then
            do_read()
        else
            do_reduce tok

    and do_reduce tok =
        if reduce tok then
            do_shift tok
        else
            ERROR

    do_read()

do_read, do_shift and do_reduce are mutually recursive functions. All recursive calls are in tail position, which makes it possible to avoid stack overflows (on some platforms...).

Here you have it: calls in tail position are basically safe versions of goto.

The fantastic control flow construct that many have forgotten is simply the procedure call. Every programmer uses it a countless number of times every working day, but fear of recursion and stack overflows have kept many from using it in situations where it fits well.

17 comments:

Erik said...

I'm actually quite the fan of recursive programming, so your mutually recursive solution at the end was the first to leap to my mind.

gradbot said...

While recursive functions are awesome, people don't often think of them as a method of storing state. Using them to store state can be abused just like most things. Adding else cases to the if's in the original goto code makes it look very similar to the recursive code. There is a fine line between creating functions to organize your code for readability and creating functions to represent state in an algorithm (like this). It's also fun to over analyze this classic code example.

David said...

As soon as I started reading your article, the note on Message Processing in Expert F# 2.0@p.379 flashed in my mind... I knew where you were going.

Michael McMullen said...

int parse()
{
Token tok;
while ((tok = gettoken()) != END)
while (!shift(tok))
if (!reduce(tok))
return ERROR;
return ACCEPT;
}

Joh. said...

@gradbot: What would be the better alternative? At least in the functional solution, it's clear who sets tok and who reads it. That's not obvious at all in the original statement of the algorithm.

@Michael: Your algorithm does not behave in the same way as the original statement. In your code, shift and reduce must be able to handle the case where tok == END, whereas the original algorithm returns immediately.

In general, trying to encode a state machine that's represented using an automaton in the design by a bunch of whiles and ifs in code is tricky. Add or remove an arrow here and there in the design, and suddenly it's not so clear anymore how to reflect that in your code and still have confidence that the code and the design agree.

@David and Erik: Always nice to see one's ideas have been had by other people before, it shows I'm not crazy :)

@David, do you know if the note is also in the first edition of the book? I don't own the second edition :(

Joh. said...

Ah, never mind the first part of my comment to Michael, his code is correct. I was fooled by the indentation.

Michael McMullen said...

Yeah, sorry about that. The title of your post caught my eye while I was at work and I couldn't resist reading it and then posting, but I couldn't take the time to explain myself properly. And blogger messed up the formatting :).

I do like your mutually tail recursive solution, and some might say that the original goto implementation is nicer to read than my while loop code (even with the formatting). I was just trying to make the point that the iterative implementation didn't need to be as complex as you showed.

Maht said...

If you want to go somewhere, goto is the best way to get there. — Ken Thompson

Goto is used >90k times in the 2.6.38 linux kernel, a 10x increase over 2.4.0

http://temp.thera.be/kernelcurses.png

Joh. said...

@Maht: I'm not debating the usefulness of goto, I think everyone agrees it's useful sometimes, especially in C which lacks exceptions.

The point I'm trying to make is that in languages which allow function definitions anywhere and provide tail call removal, function calls offer the functionality you want from goto that you can't get from if, while and exceptions.

Tommy McGuire said...

It's been a while since I read it, but Guy Lewis Steele, Jr.'s Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO comes to mind. Along with the other "Lambda Papers", available from readscheme.org.

Joh. said...

@Tommy McGuire: Interesting paper (which was written before I was born :)
My blog post basically duplicates the content of section E in that paper. The paper reproduces an example from Yourdon of a state machine with 23 states, which is a bit more interesting than the small read-shift-reduce example shown here.

David said...
This comment has been removed by the author.
David said...

Since I didn't know about Expert F# first edition, here's a screenshot of the Nota bene:

http://goo.gl/QIETU

Tommy McGuire said...

Joh, I'm a failed academic; all I do is point to stuff other people have written.

In this case, what you have found is a very neat idea that deserves more exposure.

Joh. said...

@David: Thanks. I found the note in the first edition, at page 381.

@Tommy McGuire: A "failed academic"? Don't be so hard on yourself ;) Thanks for pointing me to that paper, it was an interesting read.

gradbot said...

@Joh I agree that the functional implementation is more readable and still provides the same performance as a goto. Tailcall optimizations are what really allow us to transform gotos into functional code. I got carried away with the code and separated the two concerns out (http://fssnip.net/6Z).

David said...

Revisiting this older blog entry here and after doing a going through most of Scott Wlaschin's excellent FSharpForFunAndProfit series, in particular the Designing with Types series I noticed something.

In your first example the two "required" failwith is an indication that Accepted and Errored have been added to ParseState due to the while loop.

Splitting that into it's own type, let's say FinalState reveals the problem. We must have those invalid routing because the while loop is an expression on state and we need a way to break out. Errored/Accepted have been artificially added to make the code possible - possible because in F# we can't break out early (no break/return statements).

Incidentally, Michael McMullen's code is doing exactly that which makes it possible to return the one type. That or we would have had to use a separate mutable boolean for the control which is just as smelly.

Removing the two cases from ParseState leads us inevitably to Joh's second solution (unless we give up figuring it out) which is F# idiomatic. I don't think we can find a "correct" (in the sense that I mean) that doesn't involve a state machine, but would be happy to be proven wrong.