# Lazy sequences and iterators

@programming lua laziness fennel lisp ~21 minutes read

Today’s topic will be about lazy sequences and how these are different from iterators. I’ve wanted to make an article on this topic for some time, but unfortunately, there was no good way to show the differences using a single language (that I know), because usually, languages stick to one of those things. But since I’ve been into library writing lately, I wrote a lazy sequence library for the Fennel language, and because Fennel compiles to Lua, it makes Lua a great candidate for this article.

Lua has first-class support for iterators, as it is kind of a central paradigm very closely related to tables. And because iterators are so common in Lua, some libraries even treat them as sequences. However, iterators are not sequences, even though they can be used similarly. And proper sequences also can be created from tables much like iterators and would have similar use cases, without some problems that iterators have. So today we’ll look into Lua iterators, and lazy sequences, and will try to understand their differences.

## Iterators in Lua

Let’s begin by understanding what iterators are, and what key features they provide.

In Lua, iterators are usually higher-order functions, produced by another function, based on its input. A standard way to get an iterator over a table is to call `pairs` function:

``````get_next, t, k = pairs({a = 1, b = 2, c = 3})
``````

Lua supports multiple return values, so calling pairs returns us the iterator function as the first value, a table (our original table in this case), that we’re going to iterate as the second value, and the starting key, which in our case is `nil`.

Calling this `get_next` function, and passing it our table `t`, and the key `k`, we get back a pair of values, representing the key and a value associated with that key:

``````get_next(t, k) -- b	2
``````

We then can use the returned key as an input to the next call to `get_next` and get the next key-value pair, repeating the process until `get_next` returns `nil`:

``````get_next(t, "b") -- c	3
get_next(t, "c") -- a	1
get_next(t, "a") -- nil
``````

`nil` means our table is exhausted, and no more keys are present in it. The order may be different for you, as it depends on hashing.

This `get_next` is a stateless iterator - it doesn’t have any hidden state inside it, which means, that it can be reused. For example, we can pass a different table to the `get_next` function, and it will work on it without any problem:

``````get_next({f = 42, g = 27}, nil) -- g	27
``````

This pattern is quite common, so Lua has such a function in its standard library, it is called `next` and is a basis for all table iterators created with `pairs`. In addition to `pairs`, Lua provides another iterator constructor called `ipairs`, which is used to iterate over sequential tables, and it works similarly to `pairs` except uses integers as input keys.

While stateless iterators are good, it’s not always efficient, and sometimes not even possible to create a stateless iterator. For example, opening a file in Lua and calling `lines` method on it creates an iterator over this file, which is a stateful iterator that can be exhausted, i.e. you can call it only so many times as there are lines in the file. Let’s create a file `example.txt` with the following contents, and create an iterator over it:

``````line 1
line 2
line 3
``````
``````file = io.open("example.txt", "r")
get_next_line = file:lines()
get_next_line() -- line 1
get_next_line() -- line 2
get_next_line() -- line 3
get_next_line() -- nil
``````

Note that even though we’re calling the `get_next_line` function without any arguments, like the file, or the line number, it always returns the next line of the file. This is an example of a stateful iterator, which we can only use once - even if we call the `lines` method on the file object again, the returned iterator will be exhausted:

``````new_get_next_line = file:lines()
new_get_next_line() -- nil
``````

This happens because the state is held in the file handle itself, so we need to reopen the file to get a new iterator. Because of this you usually want to open the file, walk it, use or cache needed lines, and close the resource, ideally doing all of this in a protected call.

And lastly, let’s create our own stateful iterator, that keeps its state in the closure. For the purposes of this example, let’s create a `range` iterator, that can create various ranges based on initial input:

``````function range(lower, upper, step)
local lower, upper, step = lower or 0, upper or math.huge, step or 1
return function ()
local tmp = lower
lower = lower + step
if tmp < upper then
return tmp
end
end
end
``````

To keep it simple I’ve left some corner cases out, but this is basically the idea for a range iterator. When we call `range`, local variables `lower`, `upper`, and `step` are created based on given arguments or their default values. Then an anonymous function is returned, which refers to these local variables as closures, and sets the `lower` variable to a new value each time it is called:

``````r = range(1, 4)
r() -- 1
r() -- 2
r() -- 3
r() -- nil
``````

As you can see, when we exhaust our iterator it returns `nil`, similarly to iterators created with `pairs` or file iterators. Thanks to this, we can use it in Lua’s `for` statement:

``````for x in range(1, 10, 3) do
print(x)
end
-- 1
-- 4
-- 7
``````

So now, when we more or less know what iterators are, and how they’re defined, let’s discuss their key properties:

1. Iterators are lazy,
2. Stateless iterators accept some kind of state (usually an object and a key) as arguments to produce the next key-value pair, and hence can be reused,
3. Stateful iterators are like cursors, and can’t be reused unless created anew.

Laziness is a very good property of iterators - it allows us to create actual infinite streams of data, computed on demand. For example, we can open really huge, or even infinite files, and read those line by line without consuming more memory than is actually needed to hold a single line. Alternatively, as was shown with the `range` iterator above, we can create an infinite range, by not supplying an upper border for it, and it will use the `math.huge` value, which is `inf` in Lua. So if Lua had arbitrary precision math, the range would grow to infinity, but in the case of this function, it will just overflow again and again.

However, while laziness is an extremely handy and important part of iterators, their benefits pretty much end on it. Iterator is basically a one-shot thing - you create it, use it until it is exhausted, or you’ve accomplished what you wanted to do with them, and then you never use the same iterator again. This is due to the fact, that iterators were not meant to be shared - you can’t tell if an iterator is stateless or stateful, and if it is a stateful iterator, there’s no safe way to share it. Someone can exhaust it, and later users will not get any values. Stateless iterators are less dangerous to share, but the usefulness of this is also significantly lower, as you need to pass the context along with an iterator.

All of this can be summarized to one point - iterators are not a data structure. And while this is not a bad thing, the usefulness of the iterator is limited because of this. So what if we wanted to share an iterator, and also make sure, that it doesn’t have problems with stateful iterators, and can be used as a data structure? That’s where lazy sequences come in handy!

## Lazy Sequences

What are sequences? Think of a lazy sequence as a singly-linked list, with the difference that the tail may not be an actual cell, but a special kind of lazily evaluated cell. In order to create a sequence, we need to implement a so-called `cons` cell:

``````function cons (head, tail)
return function (ht)
if ht then
else
return tail
end
end
end
``````

Conses are an old term that comes from the LISP language and basically represents a pair. It’s a data structure that holds two elements - a value and a pointer to the next value or a cons cell. This is usually illustrated with these boxes:

These stacked boxes represent a single cons cell - a pair of boxes, each referring to a value. The first box refers to `1`, and the second box refers to another cons cell, which in turn refers to `2`, and finally to nothing, indicating the end of the list. This data structure suits our task quite well, but you may notice that our implementation isn’t exactly a data structure, but a function. So how do we work with it then?

We need two primitive operations. In LISP these are historically called `car` and `cdr`, but it’s not meaningful to use these names in the context of Lua, so let’s call them `first` and `rest`:

``````function first (cell)
return cell(true)
end

function rest (cell)
return cell(false)
end
``````

Since our cons cell is a function of one argument with `if` statement in it, `first` will pass `true` to it, to get the first element, and `rest` will pass `false` to get the other element of a cons cell. Here’s a quick demonstration:

``````list = cons(1, cons(2, cons(3, nil))) -- 1 -> 2 -> 3 -> nil
first(list) -- 1
first(rest(rest(list))) -- 3
``````

We’ve created a `list` variable, that holds a single-linked list with values `1`, `2`, `3`. And the final value is `nil`, which is needed to indicate the end of the list since the last cons cell must refer to nothing. However, this list is not lazy just yet.

Let’s recap how iterators achieve laziness. Iterators produce the next value only when you call them, which makes them very similar to generators by nature. However, the key difference from lazy sequences is that all previous results are thrown away and not kept with the iterator.

So how we can achieve laziness in our sequences? We can try a naive approach of wrapping arguments into functions, and realizing them when a value is requested:

``````function lazy_cons (headf, tailf)
return function (ht)
if ht then
else
return tailf()
end
end
end
``````

Now, when constructing a list, instead of passing the elements directly, we need to pass some functions that will return the list’s elements:

``````lazy_list = lazy_cons(
function () print "realize head" return 1 end,
function () print "realize tail" return 2 end
)
``````

Because the interface is the same, we can use `first` and `rest` to get values:

``````first(lazy_list)
-- 1
rest(lazy_list)
-- realize tail
-- 2
``````

You can see, that when we’re calling `first` on our `lazy_list` it realizes the value, and then returns it. This is very close to our goal, but we’re not there yet. The problem is, that if we call `first` on `lazy_list` again, it will call the realization function again and again without caching the result, which is not right:

``````first(lazy_list)
-- 1
first(lazy_list)
-- 1
``````

To solve this issue, we’ll need to rewrite `lazy_cons` in a such way, that it will realize our cell only once, and cache the result somehow. This can be done by creating mutable closures, but I think there’s a better way, that will also allow us to do more interesting stuff later. And it is also kinda hard to differentiate if the closure is uninitialized or the realization set it to `nil`. So instead we can use a clever metatable trick to realize cons cell contents upon access:

``````function lazy_cons (headf, tailf)
local cell = {}
local function realize ()
local function realized (_, ht)
if ht then
else
return tail
end
end
return setmetatable(cell, {__call = realized})
end
return setmetatable(cell, {__call = function (_, ht) return realize()(ht) end})
end
``````

Basically, we create a table `cell`, to which we set a metamethod `__call` that works similarly to the function that is returned by `cons`, except not quite. Note the `realize()(ht)` call in this metamethod - it calls the `realize` function first, then invokes the resulting function. The `realize` function is defined above, and this function resets the `__call` metamethod to another function, that uses cached values. Now using `first` on a such cell will work as expected:

``````lazy_list = lazy_cons(
function () print "realize head" return 1 end,
function () print "realize tail" return 2 end
)

first(lazy_list)
-- realize tail
-- 1
first(lazy_list)
-- 1
rest(lazy_list)
-- 2
``````

It realizes the cell and then caches the results in a closure, so when we call `first` on this cell later, we get the same values every time. Now we can create and manipulate lazy sequences with only these three basic functions! Here’s our `list` example from above, but now lazy:

``````lazy_list = lazy_cons(
function () print "first" return 1 end,
function ()
return lazy_cons(
function () print "second" return 2 end,
function ()
return lazy_cons (
function () print "third" return 3 end,
function () return nil end
)
end
)
end
)
``````

And we can inspect it with the same `first` and `rest` functions:

``````first(lazy_list)
-- first
-- 1
first(rest(rest(lazy_list)))
-- second
-- third
-- 3
first(lazy_list)
-- 1
``````

That’s pretty much it! We have a way of constructing lazy sequences via a very primitive `lazy_cons` function, and anonymous functions as arguments. The usefulness of this may not be obvious, so let’s see some more examples of how this can be utilized before we move on.

### More examples

Let’s create a lazy sequence that works similarly to the `range` iterator we did earlier:

``````function range_seq(lower, upper, step)
local lower, upper, step = lower or 0, upper or math.huge, step or 1
return lazy_cons(
function () return lower end,
function ()
local tmp = lower + step
if tmp < upper then
return range_seq(tmp, upper, step)
else
return cons(nil, nil)
end
end
)
end
``````

A bit more verbose, but notice that this is no longer a stateful thing! Instead, it is a recursive function, which returns a self-referencing cons cell, which calls itself with changed arguments.

A recursion? How’s it lazy then?

The thing is, that it’s not even a recursion! The next recursion step will only happen when a cell is realized, and it will simply produce another lazy cell, which will not go to the next recursion step immediately. And even if Lua had no tail call optimization it would still work just fine, as it acts more like a trampoline, and the call stack never actually grows. Here’s how we can see that our `range_seq` is working:

``````r = range_seq(1, 4)
first(rest(rest(r)))
-- 3
first(rest(rest(rest(r))))
-- nil
``````

And for infinite ranges we can drop as many values before working with the range:

``````r = range_seq()
for i=1,1000000 do
-- let's drop some elements
r = rest(r)
end
first(r) -- 1000000
first(rest(r)) -- 1000001
``````

It’s a quite common pattern when working with infinite sequences to drop some values before working with them. So if we want to create an infinite range, that starts from a given value, we can write a `drop` function, which will produce a new lazy sequence without first `n` elements in it.

Another thing to note is that it is a bit tedious to write nested `rest` calls every time, so let’s iterate over this range instead, with the power of recursion:

``````function loop (seq, f)
local head, tail = first(seq), rest(seq)
if tail then
loop(tail, f)
end
end

loop(range_seq(0, 10, 2), print)
-- 0
-- 2
-- 4
-- 6
-- 8
``````

This `loop` function accepts a sequence and calls a supplied `f` function on each element for side effects. A very similar concept exists in most functional languages, and it is called mapping, so let’s create a lazy map function:

``````function map (seq, f)
if first(seq) then
return lazy_cons(
function () return f(first(seq)) end,
function () return map(rest(seq), f) end
)
else
return cons(nil, nil)
end
end
``````

Now with this `map` function, we can not only apply a function to every element of the sequence, but we can also collect the results into another sequence. Let’s write a function that prints every element of the given sequence, and pass the result of the `map` call to it:

``````function print_seq (seq)
io.stdout:write("[")
local function print_seq (seq)
if first(seq) then
io.stdout:write(first(seq), ", ")
print_seq(rest(seq))
end
end
print_seq(seq)
io.stdout:write("\b\b]\n")
end

function square (x) return x * x end

print_seq(map(range_seq(0, 10), square))
-- [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
``````

Now, to demonstrate that this is really a lazy `map`, let’s add a `print` call to `square`, and store the resulting sequence to a variable:

``````function square (x)
print("square of "..x.." is "..x * x)
return x * x
end

squares = map(range_seq(0, 10), square)
``````

Nothing is printed until we realize some elements of our sequence:

``````first(squares)
-- prints: square of 0 is 0
first(rest(rest(squares)))
-- prints: square of 2 is 4
first(rest(rest(squares)))
-- prints nothing, as the result was already realized
``````

Note that while this is still just a (kinda special) function, it behaves just like a real linked list, and thus represents a proper data structure. Although it’s a bit clumsy to use directly, as shown with the `map` example, we can build a library around this concept, that will abstract the construction of a lazy sequence away from the user. And that’s exactly what I did.

## Lazy-seq library

It should be pointed out that the code above is a very rough example of a lazy sequence implementation. I think it’s fine for explaining the idea, but there are some edge cases that are not handled properly. For example, the `map` function we’ve written is not fully lazy, as it actually realizes its first element, and the `range_seq` will not properly work for negative ranges. And the `lazy_cons` is not really the most useful abstraction - instead of using it directly, a much more handy `lazy_seq` function should be created, that would accept another function, which in turn would return any kind of sequence. To address these shortcomings I’ve created a lazy-seq library, which has a lot of functions to create and manipulate lazy sequences.

Now, it’s a Fennel library, written in Fennel, and aimed toward it first. That is, mainly due to the fact that this library mimics Clojure’s vision of lazy sequences and ports a lot of functions from its core namespace. But the core principles are exactly the same as described above, and you still can use the library from Lua, just like we did earlier!

Except the semantics are expanded a bit - there are handy constructors for creating lazy sequences from tables or strings, supporting both sequential and associative tables. And, as you may remember from the previous section since our cons cells are secretly tables underneath, sequences defined by this library also implement various metamethods, like `__len`, `__index`, and so on, for more convenient usage.

Proper versions of `map` and `range` already exist in this library, along with other functions, which can be used as building blocks for creating all kinds of sequences. For example, here’s how one would create an infinite Fibonacci sequence using the lazy variant of the `map` function in Lua:

``````lazy = require "lazy-seq"
first, rest, map = lazy.first, lazy.rest, lazy.map
concat, drop, lazy_seq = lazy.concat, lazy.drop, lazy["lazy-seq"]
bc = require "bc"
function add (a, b) return a + b end
fib = concat(
{bc.new(0), bc.new(1)},
lazy_seq(function () return map(add, rest(fib), fib) end)
)
first(drop(400, fib))
-- 176023680645013966468226945392411250770384383304492191886725992896575345044216019675
``````

In the example above I’m also using the `lbc` library which provides arbitrary precision math for Lua to illustrate that you can compute as many Fibonacci numbers as you want (try dropping the first 100000 elements for example) That’s a bit verbose still, but not as verbose as the manually created lazy linked list from previous sections. And note that the `concat` function, which is used to concatenate sequences, also works with tables, as shown in the example above. All sequence manipulating functions in the `lazy-seq` library work with tables, as tables are implicitly converted to sequences.

In Fennel, the same code is even more compact because we can use the `lazy-cat` macro instead of using `concat` and `lazy-seq`, and thanks to anonymous function shorthand:

``````(local {: first : rest : map : drop} (require "lazy-seq"))
(import-macros {: lazy-cat} "lazy-seq")
(local bc (require "bc"))
(global fib (lazy-cat [(bc.new 0) (bc.new 1)] (map #(+ \$1 \$2) (rest fib) fib)))
(first (drop 400 fib))
;; 176023680645013966468226945392411250770384383304492191886725992896575345044216019675
``````

Note however that macros are a Fennel-only feature, not available in Lua, so where in Fennel you would write:

``````(import-macros {: lazy-cat : lazy-seq} "lazy-seq")

(lazy-cat [1 2 3] [4 5 6])
(lazy-seq (do (print "I'm lazy") [1 2 3]))
``````

In Lua it would be:

``````lazy = require "lazy-seq"
concat, lazy_seq = lazy.concat, lazy["lazy-seq"]

concat(lazy_seq(function () return {1, 2, 3} end), lazy_seq(function () return {4, 5, 6} end))
lazy_seq(function () print "I'm lazy" {1, 2, 3} end)
``````

This code is fully equivalent to fennel code, except that macros write all these anonymous functions for you.

But macros are not the only unique feature in Fennel, the other one is destructuring. I’ll not go into details here, but destructuring is kinda similar to pattern matching, and it allows specifying data structure shape and binding values of the data structure to variables defined by that shape. So similarly to how you would destructure a table, thanks to `__len` and `__index` metamethods, destructuring works on lazy sequences:

``````(local {: range} (require "lazy-seq"))
(let [[a b c & rest] (range 10)] (print a b c) rest)
;; 0 1 2
;; [3 4 5 6 7 8 9]
``````

Just make sure you don’t use `&` destructuring with infinite sequences1. And more than that, sequences define the `__pairs` metamethod, and thus support creating stateless iterators created with `pairs`!

``````fennel = require "fennel" -- for pretty printer
lazy = require "lazy-seq"
squares = lazy.map(function (x) return x * x end, {1, 2, 3, 4, 5})
for k,v in pairs(squares) do
print(fennel.view(k), v)
end
-- @seq(4 9 16 25) 1
-- @seq(9 16 25)   4
-- @seq(16 25)     9
-- @seq(25)        16
-- @seq()          25
``````

We’ve come a full circle.

I’ve required the `fennel` library, to provide the pretty-printed representation of a sequence for a better illustration. You can see that on each step of iteration we see the shorter and shorter sequence. Sequence automatically pretty-printed in the REPL like this, but that’s one of Fennel’s features, that I’ve previously written about, and implemented a very similar linked list there, so if you’re interested in that, you can find some more info there. And viewing such a sequence in this way really shows us that it is nothing else but a data structure!

Unlike tables, a stateless iterator over a sequence uses the tail of the sequence as the key to getting the next element, because the iterator is simply built on top of `first` and `rest`. It ends when it reaches the empty cons cell, which is at the end of each sequence. The only exceptions are infinite sequences, which don’t have empty cons at the end as, well, they’re infinite. So calling `pairs` and iterating over infinite sequences should have another guard for terminating iteration. Or, you can limit the size of the sequence with `take`, `take-while`, and other filtering functions.

There are also various functions like `line-seq` which accepts a file handle, so here’s what happens when we call it on a file:

``````(local {: line-seq : doall} (require "lazy-seq"))
(with-open [f (io.open "example.txt" :r)] (doall (line-seq f)))
;; @seq("line 1" "line 2" "line 3")
``````

We have to realize the whole sequence before `with-open` automatically closes the file handle, but you can lazily work with this sequence while you’re inside of the `with-open` block without any issues. In contrary to the iterator, we can re-use this sequence of lines, as no results have been thrown away.

Finally, as I’ve mentioned, lazy sequences can be generated from tables with the `seq` function:

``````(local {: seq} (require "lazy-seq"))
(seq [1 2 3])
;; @seq(1 2 3)
(seq "abc")
;; @seq("a" "b" "c")
``````

If we pass in an associative table we get a bit different representation:

``````(seq {:a 1 :b 2 :c 3})
;; @seq(["a" 1] ["c" 3] ["b" 2])
``````

These sequences are all lazy because `seq` creates a wrapper around an iterator over the table. This means, that if you change the table before the sequence is fully realized, your resulting sequence will have updated elements much like an iterator would produce them. This is not recommended, and instead immutable tables a better be used to produce sequences, and in general. And since sequences themselves are immutable and persistent this is a win-win.

The `lazy-seq` library also provides a lot of other functions, like filtering, restructuring, modifying, and combining infinite sequences. It’ll take a lot of time to list all of them here, so I’d recommend you to read the documentation if you’re interested. But as an example of what you can do, here we create an infinite sequence of random numbers, filter out only those which are perfect squares, and group those by 2:

``````(local {: take : filter : repeatedly : partition} (require "lazy-seq"))
(local rands (repeatedly math.random 0 64))
(local perfect-squares (filter #(= 0 (% (math.sqrt \$) 2)) rands))
(local rand-square-pairs (partition 2 perfect-squares))
(take 10 rand-square-pairs)
;; @seq(@seq(64 16)
;;      @seq(4 0)
;;      @seq(64 4)
;;      @seq(4 0)
;;      @seq(0 4)
;;      @seq(4 64)
;;      @seq(16 0)
;;      @seq(0 64)
;;      @seq(16 4)
;;      @seq(16 36))
``````

Note that we did all operations on an infinite sequence, and then only used `take` to get the first 10 elements of it. This is a common pattern when working with infinite sequences - create something infinite, filter or modify it, and take the needed amount of elements.

## Conclusion

That’s all I wanted to say about lazy sequences and their relationship with iterators. But of course, there’s much more to it, as there are a lot more functions for sequence manipulation I’ve left uncovered because it would simply be too long for an article. The main point I’ve wanted to share is that sequences are similar to iterators, but don’t have some downsides that iterators have. Someone can argue that these downsides are features, but I think sequences still have their uses.

So let’s recap what we have learned about sequences:

• Sequences are a data structure, with very important properties of persistence and immutability.
• Sequences are lazy, much like iterators, but they cache their values for later re-use.
• Things that required stateful iterators, can be done with stateless sequences by using laziness and recursion.
• Sequences, implemented by `lazy-seq` library, support common table operations, making it easy to integrate lazy computation into existing Lua code.
• Operations on infinite sequences can be easily composed.

I hope you got a good understanding of lazy sequences, and I got you interested in at least trying them out! Despite being a Fennel library, lazy-seq can be used from Lua just fine, you only need to compile it to Lua first, which is not hard to do, given that Fennel itself is also just a library. But I suggest using this library with Fennel, as it adds a bit more facilities, like pretty-printing of your data structures, and provides macros for a more concise code.

1. : As of Fennel v1.0.0 the `__fennelrest` metamethod was added, and `&` destructuring works on infinite sequences. ↩︎