Andrey Listopadov

GRScheme Design Part 3 - Quasiquoting

@programming rust grscheme ~11 minutes read

At the end of the Part 2 I mentioned that quasiquoting was problematic in GRScheme, due to some of the internal representations and algorithms, but now it is not an issue (I hope).

I’ve figured out the correct way to maintain a list invariant, splice one tree into another tree, and so on. So this post will describe how quasiquote, unquote, and unquote-splicing procedures are done in GRScheme.

Also, I should mention, that I’ve changed my mind, and now results are printed without the leading quote:

;; before
> (car '(a b c))
'a
> (car '('a 'b 'c))
''a
;; now
> (car '(a b c))
a
> (car '('a 'b 'c))
'a

I was using this format because Racket language was the first thing I saw when I touched the world of Scheme and Lisp, so it was natural to me, to see that results are quoted. Although after some interactions with Clojure, Common Lisp, and GNU Guile, I’ve changed my mind. This doesn’t affect internal representation though, only printing. But if you don’t know about it, it probably will not matter at all.

Quasiquoting

First things I’ve did were quasiquote and unquote. Although those are directly related, since we can’t use unquote outside of quasiquote, let’s first look at how plain quasiquote behaves.

In Racket, quasiquote of any quote-able form yields quoted form:


Note: code examples below are representing Racket and GRScheme REPLs, so for better distinguishing purposes, each prompt will be marked with a language name, e.g. racket> or grscheme>.


racket> `a
'a
racket> `(a)
'(a)

This is also the case for GRScheme:

grscheme> `a
a
grscheme> `(a)
(a)

All nested quasiquote calls are preserved in their original form, e.g. not changed to quote:

racket> ``a
'`a
racket> `(`a)
'(`a)

And in GRScheme:

grscheme> ``a
`a
grscheme> `(`a)
(`a)

This means that the interpreter should be aware of what quasiquote it is expanding at the moment - is it nested or not? We’re not going to explicitly check for this by analyzing expressions up the tree, but instead, we’ll add a counter to our eval function:

pub fn eval(&mut self, expression: &NodePtr) -> Result<NodePtr, EvalError> {
    let mut stack = vec![expression.clone()];
    let mut res = None;
    let mut quasiquote_level = 0;

    while let Some(expr) = stack.last() {
        res = match Self::expression_type(&expr) {
            // …
        }
    }
}

So when our quasiquote_level is zero, we’re going to interpret its result as quoted form, and as quasiquote form otherwise. I know that this is probably not the best way to manage this, but believe me, the worst is yet to come. Quasiquoting in GRScheme is really ad hoc right now, but it works.

Now we need to actually interpret quasiquote forms. Let’s look at match arms that are responsible for that:

pub fn eval(&mut self, expression: &NodePtr) -> Result<NodePtr, EvalError> {
    let mut stack = vec![expression.clone()];
    let mut res = None;
    let mut quasiquote_level = 0;
    let mut unquote_level = 0;

    while let Some(expr) = stack.last() {
        res = match Self::expression_type(&expr) {
            // …
            Type::Quasiquote if quasiquote_level == 0 || quasiquote_level == unquote_level => {
                let args = Self::rest_expressions(&expr)?;
                let unquotes = Self::pre_quasiquote(&args)?;
                if !unquotes.is_empty() {
                    stack.extend_from_slice(&unquotes);
                }
                quasiquote_level += 1;
                continue;
            }
            Type::Quasiquote => {
                let (res, _) = Self::quote(&Self::rest_expressions(&expr)?)?;
                Tree::replace_tree(&expr, res);
                quasiquote_level -= 1;
                stack.pop()
            }
            // …
        }
    }
}

Notice that I’ve added unquote_level counter, and we use both to decide what match arm to take. If our quasiquote_level is 0 or our quasiquote_level is equal to unquote_level we’re taking first branch, that extends our tree traversal stack, increments quasiquote_level and goes to next iteration. Otherwise, we quote the result, which means that we’re out of current quasiquote context.

Let’s look at the first arm in more detail. Let’s imagine that we’re evaluating `(a b ,c d) expression here. First, we take out the vector of all arguments to quasiquote, which in this case is [a, b, (unquote c), d]. Next we pass it to pre_quasiquote procedure. As was described in Part 2 all pre_something procedures are used to decide what arguments should be evaluated before we go further. Mostly what we did before, was just filtering of arguments based on their type. But that’s not really the case for pre_quasiquote, because unquote has another property that must be maintained. But first, let’s what unquote does.

Let’s look at this expression:

racket> (define c 42)
racket> `(a b ,c d)
'(a b 42 d)

We can see that ,c was replaced with the value of c. That’s what unquote does in quasiquote. One can think of quasiquoting as of inversion of evaluation of list arguments:

racket> (list 'a 'b c 'd)
'(a b 42 c)

You can see that everything was quoted except c. In quasiquote form only c was unquoted. But there’s another property for unquote and it is quasiquote nesting level:

racket> `(a '(b ,c d))
'(a '(b 42 c))

Nothing changed, c still evaluates to 42, but if we change ' to `:

racket> `(a `(b ,c d))
'(a `(b ,c d))

Nothing happened to c. This is because there were two nested quasiquote calls and only one unquote one. If we add another unquote we’ll see this:

racket> `(a `(b ,,c d))
'(a `(b ,42 d))

,c was replaced by 42, but outer , is still there, thus ,42 is the result of ,,c. Nested quasiquotes must be escaped by the same amount of nested unquotes. This is also the case for unquote-splicing.

Racket documentation features this weird expression:

racket> `(1 ```,,@,,@(list (+ 1 2)) 4)
'(1 ```,,@,3 4)

That is quite complex, but it works in GRScheme in the same way:

grscheme> (define list (lambda x x))
grscheme> `(1 ```,,@,,@(list (+ 1 2)) 4)
(1 ```,,@,3 4)

With this information, we can now look at how pre_quasiquote is implemented.

Filtering arguments for quasiquote

The pre_quasiquote procedure is quite big, so we’ll split it into parts as was done in previous posts.

The first thing we do is check for argument count, and create two stacks, one for tree traversal, and another for unquote forms that are valid to evaluate:

fn pre_quasiquote(args: &[NodePtr]) -> Result<Vec<NodePtr>, EvalError> {
    Self::check_argument_count("quasiquote", ArgAmount::NotEqual(1), args)?;
    let mut stack = vec![args[0].clone()];
    let mut unquotes = vec![];

    // …

    Ok(unquotes)
}

In the end, we will return a vector of unquote forms, which may be empty, if no valid forms were found, either because of the fact that levels weren’t balanced, or because there simply were no unquote calls.

Next, we iterate over our stack, as in eval procedure, and check for expression type on each iteration:

while let Some(expr) = stack.pop() {
    match Self::expression_type(&expr) {
        // …
    }
}

Depending on what type we’ve got we going to, either put it to the stack, put it to quotes, or continue. The first type, which is the simplest case, that we need to check is Procedure:

Type::Procedure => stack.extend_from_slice(&expr.borrow().siblings),

If we see (something …) we expand our stack with all inside expressions, and continue.

Next arm checks for Quasiquote, Symbol, and List. We do one more check for arguments, because all these types take exactly one argument, and if everything is OK, we continue:

Type::Quasiquote | Type::Symbol | Type::List => {
    let args = Self::rest_expressions(&expr)?;
    Self::check_argument_count("quasiquote", ArgAmount::NotEqual(1), &args)?;
    stack.extend_from_slice(&args);
}

The second to last arm is for all kinds of unquote expressions, and that’s where everything happens. The last arm does nothing, as we don’t need to extend the stack with other types:

Type::Unquote | Type::UnquoteSplicing => {
    let args = Self::rest_expressions(&expr)?;
    Self::check_argument_count("unquote", ArgAmount::NotEqual(1), &args)?;
    let (q, u) = Self::quasiquote_unquote_levels(&expr);

    if q == 0 {
        return Err(EvalError::UnquoteNotInQquote);
    } else {
        if q == u {
            unquotes.push(expr.clone());
        }
        match Self::expression_type(&args[0]) {
            Type::Procedure | Type::Quasiquote | Type::List | Type::Symbol => {
                stack.extend_from_slice(&args[0].borrow().siblings)
            }
            Type::Unquote | Type::UnquoteSplicing if q > u => {
                stack.push(args[0].clone())
            }
            Type::Unquote | Type::UnquoteSplicing => {
                return Err(EvalError::UnquoteNotInQquote)
            }
            _ => (),
        }
    }
}
_ => (),

So what’s happening here is another check for arguments, and quasiquote_unquote_levels call. This function analyses the current expression, and all its parents to be either quasiquote or unquote and unqoute-splicing calls, and sets counters for the nest levels of those. This function returns a tuple, which we destructive binding to q and u variables, that stand for quasiquote and unquote.

If q is equal to 0 this means that we’ve got unquote outside of quasiquote, which is not allowed. Otherwise, if we got the same levels we push expression to unquotes stack. But this is not the end, we still have to analyze inner expressions, because unquote may contain other calls for unquote or quasiquote and so on, so our algorithm repeats itself one more time and pushes expressions to stack depending on their types and levels.

This function produces vector of valid unquote and unquote-splicing calls that we pass to our eval function and evaluate.

Unquote procedure

To perform unquoting, we have to take into account several things. But first let’s look at Unquote arms in eval procedure:

Type::Unquote if quasiquote_level > 0 => {
    let rest = Self::rest_expressions(&expr)?;
    let args_to_eval = Self::pre_unquote(&rest)?;
    if !args_to_eval.is_empty() {
        unquote_level += 1;
        stack.extend_from_slice(&args_to_eval);
        continue;
    }
    unquote_level -= 1;
    let (res, pop) = Self::unquote(&rest)?;
    Tree::replace_tree(&expr, res);
    if pop {
        stack.pop()
    } else {
        continue;
    }
}
Type::Unquote => return Err(EvalError::UnquoteNotInQquote),

Similarly to qasiquote arm, we check for quasiquote_level, but now if it is greater than 0. If it’s 0 it’s an error that we have to catch at the top level.

You can see that there’s another pre_ procedure here. That is because we’re going to evaluate unquote argument, and then actually unquote it. pre_unquote just checks for valid forms that we actually need to evaluate, similarly to other pre_ procedures, so I don’t think we need to look closely at it. We’ve seen a lot of those in Part 2, and this one is no different.

After everything is evaluated, we call Self::unquote:

fn unquote(args: &[NodePtr]) -> Result<(NodePtr, bool), EvalError> {
    Self::check_argument_count("unquote", ArgAmount::NotEqual(1), args)?;

    match Self::expression_type(&args[0]) {
        Type::List | Type::Symbol | Type::Quasiquote => {
            Ok((Self::rest_expressions(&args[0])?[0].clone(), true))
        }
        _ => Ok((args[0].clone(), false)),
    }
}

This procedure is extremely similar to quote, except it removes quote from List, Symbol, and Quasiquote forms, and returns all other forms as is. It also indicates if we should pop our tree traversal stack or not.

Unquote-splicing procedure

This procedure is kinda the same as unquote and actually uses unquote underneath, but the big thing is splicing part here.

Because GRScheme uses trees of vectors, which are not linked lists, I thought that maybe there should not be such a thing as splicing, but in the end, I’ve decided that it’ll be better with it.

Splicing works kinda like you would expect, except that it modifies the vector:

impl Tree<T> {
    // …
    pub fn splice_in_childs(root: &NodePtr<T>, pos: usize, tree: NodePtr<T>) -> &NodePtr<T> {
        let mut broot = root.borrow_mut();
        let pos = if pos > broot.siblings.len() {
            broot.siblings.len() - 1
        } else {
            pos
        };

        for child in tree.borrow().siblings.iter().rev() {
            broot.siblings.insert(pos, child.clone());
        }

        root
    }
    // …
}

Not very idiomatic to do such a thing with vectors, but, eh, I can live with that. This gives us the ability to splice in siblings of one tree into another one at a given place. If a place is more than the size of the current tree’s siblings, we just put those to the end. Now we can use it in UnquoteSplicing arm:

Type::UnquoteSplicing if quasiquote_level > 0 => {
    let parent = Tree::parent(&expr).unwrap();
    if !Self::splicing_context_valid(&expr) {
        return Err(EvalError::UnquoteSplicingInWrongContext)
    }
    let pos = Self::splicing_pos(&parent).unwrap();
    let rest = Self::rest_expressions(&expr)?;
    let args_to_eval = Self::pre_unquote(&rest)?;
    if !args_to_eval.is_empty() {
        unquote_level += 1;
        stack.extend_from_slice(&args_to_eval);
        continue;
    }
    unquote_level -= 1;
    let (res, pop) = Self::unquote(&rest)?;
    Tree::remove_child(&parent, pos);
    Tree::splice_in_childs(&parent, pos, res);
    if pop {
        stack.pop()
    } else {
        continue;
    }
}
Type::UnquoteSplicing => return Err(EvalError::UnquoteNotInQquote),

For the most part, it is the same as the unquote procedure, except that at the end we splice results in, and also we check that we’re inside the valid context. This check finds the quasiquote procedure and checks if it has an argument of Procedure:

`(,@'(1 2 3)) ;; valid
`,@'(1 2 3)   ;; invalid

As a little showcase of how everything works so far, let’s implement the thread first macro from Clojure ->:

grscheme> (define -> (lambda forms
            (define loop (lambda (x forms)
              (if (empty? forms)
                  x
                  (let (form (car forms))
                    (if (list? form)
                        (loop `(,(car form) ,x ,@(cdr form)) (cdr forms))
                        (loop `(,form ,x) (cdr forms)))))))
            (eval (loop (car forms) (cdr forms)))))
grscheme> ->
#procedure:->
grscheme> (-> "Thread first!\n"
              '(display))
Thread first!
grscheme> (-> ''(1 2 3)
              '(cdr)
              '(car)
              '(cons '(3 4 5)))
(2 3 4 5)

Of course, this is not a real macro, and everything happens in run time every time we call ->, but the algorithm is the same as in Clojure’s ->.

GRScheme complete?

Certainly not. When I started working on it, I knew literally nothing about Scheme, Lisp, and the basic ideas behind its implementations. I was using different Lisps to some extent, like Emacs Lisp for writing small functions for my Emacs config, and going through SICP with Guile, but up until I wrote my own evaluator I had no complete understanding of what I’m doing. Since then I’ve read some papers on Lisp design, got into Clojure, and progressed through SICP, so I know a little more on how things should work internally for the most part. And with this knowledge…

I understand that my implementation of the whole thing is bad, and I mean really bad :D

There are still no macros, internal data representation is really ugly, strings are useless, there’s no file IO, REPL is really basic, there’s no debugging, the stack doesn’t exist, and so on. I don’t think I should really turn GRScheme into complete language, at least right now.

Even if I wanted to create a real practical language, that could compete with other Schemes, Clojure, or Common Lisp, I would have to redesign everything from the bottom up, so probably it will be easier to write a completely new thing. And that will still mean that I’ll have to re-implement every feature that those languages provide, which is an insane amount of work.

Although I would say that I think Rust is a great choice as the implementation language because it helps produce memory-safe code, and probably will make the implementation of multi-threading less worrying, compared to C. Initially, I had planned on supporting execution in separate threads, but I’ve dropped this idea because the current tree traversal algorithm can’t be easily converted to use multiple threads.

This mostly concludes my experiment with the development of Scheme-like language. I hope it was at least an interesting read. Thanks!