Andrey Listopadov

Understanding transducers

@programming clojure ~26 minutes read

Some time ago I ported most of Clojure’s core namespace to Fennel and made it into a library called fennel-cljlib. This was my first library for Fennel, so it wasn’t really great in terms of how it was implemented. While it was making Fennel more like Clojure syntax-wise, which I like, it wasn’t following Clojure’s semantics that well. The main thing that was missing is a proper seq abstraction, which Clojure relies on for providing both lazy sequences and generic functions that can work on any data type that implements ISeq. Such functions are map, filter, take and so on.

Since then, I’ve made a few more libraries for Fennel, which were somewhat more narrowly focused, and one of such libraries was lazy-seq - an implementation of Clojure’s lazy sequences. It doesn’t feature chunked sequences, (yet, maybe), but it implements almost all sequence-related functions from Clojure. And you can throw pretty much any Lua data structure that implements pairs or ipairs into it, and it will work out how to lazily transform it into a sequence.

This was one of the missing pieces for the fennel-cljlib, as its implementation of seq simply made a shallow copy of a given object in a linear time, making sure that the result is sequential. With the implementation of seq from the lazy-seq library, I could rewrite fennel-cljlib, also making all sequence-related functions lazy. And while this will make the library more Clojure-like one piece is still missing from both fennel-cljlib and lazy-seq libraries.

Transducers.

I’m quite familiar with transducers, well, I use them regularly at work, and I read about their implementation a few years ago. However, I’ve never implemented a transduceable context, i.e. a function that accepts a transducer, and applies it to elements of a given collection. So, as a part of the rewrite of the fennel-cljlib library, I needed not only to port transducers themselves, but I also had to implement such functions as into, transduce, and sequence, which are transduceable contexts.

Thankfully, into and transduce are written in Clojure, and are very straightforward to understand, but the sequence function is not. Here’s its source code:

(defn sequence
  ([coll]
   (if (seq? coll) coll
       (or (seq coll) ())))
  ([xform coll]
     (or (clojure.lang.RT/chunkIteratorSeq
         (clojure.lang.TransformerIterator/create xform (clojure.lang.RT/iter coll)))
       ()))
  ([xform coll & colls]
     (or (clojure.lang.RT/chunkIteratorSeq
         (clojure.lang.TransformerIterator/createMulti
           xform
           (map #(clojure.lang.RT/iter %) (cons coll colls))))
       ())))

It is written mostly via Java interop, and I can’t use this in my port of the clojure.core namespace to Fennel, because Fennel runs on Lua, and Lua can’t really interop with Java. So I had to reimplement this function in Fennel, and this is what motivated me to write this post. Also, I don’t really know Java, so understanding how sequence works was a challenge on its own1.

The interesting thing is, after I’ve tried to implement this function several times, I understood that I, actually, don’t understand transducers as well as I thought. So I had to go back a bit and learn how transducers actually work, and why are they written the way they are. It was really fascinating, and after a bit of trial and error, I managed to implement sequence in Clojure first, and then port it to Fennel, using my implementation of lazy sequences.

I will show the resulting code later in this post, but first, let’s understand what transducers are, and how they work.

Transducers

First, a bit of theory. A transducer is a function that describes the process of transformation, without knowing how exactly the thing it transforms is organized. It is not the same as generic functions, because transducers are generic in a bit different way.

For example, we all know how map works in Clojure - you pass it a function and a collection, and it applies the function to each element of the collection:

(map inc [1 2 3]) ;; => (2 3 4)

Map walks through the given collection, applies the given function to each element, and puts results into a sequence. Seems nice, and Clojure actually makes map generic by transforming the given collection to a sequence before actually mapping over it. However, this approach isn’t generic enough, because there are things that can’t be transformed into a sequence in an efficient or even meaningful way. One such thing is an asynchronous channel, and when Clojure developers were working on the core.async library, they’ve realized that sequence manipulation functions are usable in the context of a channel, but reimplementing all these functions is a no-go.

So how did Clojure developers solve this problem? By decoupling the transformation process from the collection it transforms. Here’s a helpful analogy.

Imagine an airport worker, whose job is to weigh and sort the luggage before it goes into a plane. Their basic instructions are:

  • Take a bag;
  • Measure its weight and put a sticker on the bag;
  • If the bag weight is bigger than X, don’t put the bag on the plane;
  • Hand the bag over.

Note that this process, while can be applied to a single bag at a time, doesn’t at all specify how bags are coming to you and how they leave you. One day bags can come to you in containers brought by a vehicle, the other day they can come on a conveyor, it doesn’t matter to you - you just take a bag, weigh it, put a sticker and hand it over to another car or another conveyor. These details should not matter, because your job remains the same, even if you take bags from different sources every day.

This is, of course, an analogy, but it applies to programming pretty well. Bags are items in the collection you’re going to map a function over and then filter them out. The function is what you do with the bag, in the case of our fellow airport worker weighing bags and putting stickers. In addition, notice that we first weigh the bag, and then filter it out immediately, in oppose to weighing all bags and then filtering them one by one.

However, in a programming language, the way how items are coming to us completely depends on the collection. And how we collect results into another collection depends on the implementation of the map function. These are two main things stopping us from describing an arbitrary transformation process, without tying ourselves to a particular collection implementation or a conversion protocol.

Looking at other languages, which provide different classes for different data structures, most of the time map is a method and not a function. This way map can be implemented in terms of the collection you’re mapping through, usually producing the same collection as a result, because this method knows how to map a function over this particular collection implementation.

Methods do not fly in functional languages, so another approach, some languages take is to provide different implementations of map functions via namespaces. For example, Elixir has a map function implemented for lists and another implementation for streams:

# Enum versions of map and filter
iex(1)> (0..3
         |> Enum.map(fn(x) -> IO.puts("map #{x}"); x + 1 end)
         |> Enum.filter(fn(x) -> IO.puts("filter #{x}"); rem(x, 2) != 0 end)
         |> List.first)
map 0
map 1
map 2
map 3
filter 1
filter 2
filter 3
filter 4
1

# Stream versions of map and filter
iex(2)> (0..3
         |> Stream.map(fn(x) -> IO.puts("map #{x}"); x + 1 end)
         |> Stream.filter(fn(x) -> IO.puts("filter #{x}"); rem(x, 2) != 0 end)
         |> Enum.to_list
         |> List.first)
map 0
filter 1
map 1
filter 2
map 2
filter 3
map 3
filter 4
1
Code Snippet 1: Elixir approach to the problem

The difference is quite substantial, as streams apply function composition to each element one by one, (similarly to how our airport worker does) whereas enumerations are fully traversed by map first and only then the result is being passed to filter. This distinction is possible thanks to different implementations of map, but be it a method of a specific class, or a namespaced function that’s exactly what Clojure developers wanted to avoid. So transducers were created.

Understanding transducers

To understand transducers, we first need to understand what map essentially does and how we can abstract it away from both the input collection and the result it produces. It may seem obvious: map applies a function to each element of a collection and puts the result into a new collection. I’ve marked important things in bold because if we think about these a bit, we’ll see that there are some concrete actions that the map function performs, which should be abstracted away.

First, let’s implement map in terms of reduce. This function is actually very similar to how mapv is implemented in Clojure, except we’ve left out some optimizations:

(defn mapr
  "A version of `map` that uses `reduce` to traverse the collection and
  build the result."
  [f coll]
  (reduce (fn reducer [result val] (conj result (f val))) [] coll))
;; ^       ^                        ^            ^        ^  ^
;; |       |                        |            |        |  |
;; |       |                        |            |        |  `Collection to iterate
;; |       |                        |            |        `Collection to put results
;; |       |                        |            `Function that produces the result
;; |       `A reducing function that`knows how to put elements to result
;; `This is how we get a collection element one by one

Here, reduce takes care of how to traverse the collection, and the reducer function takes care of how to put elements to the result. It may seem that we’ve decoupled these steps from map but we’ve just moved them into another place. More than that, if we were to implement filter this way we would have to put the logic, that decides what elements are going to be left out, into an anonymous function:

(defn filterr
  "A version of `filter` that uses `reduce` to traverse the collection
  and build the result."
  [f coll]
  (reduce (fn [res x] (if (f x) (conj res x) res)) [] coll))
;; ^                   ^
;; |                   |
;; |                   `Logic that decides whether the value will be left out
;; `generic way to iterate through a collection

Notice, that both mapr and filterr share a lot of structure, the only difference here is how to put the resulting value into the collection. This should give us a hint on how we can abstract this away. And given that Clojure is functional, we can write functions that accept other functions and return new functions, which will provide a generic way of how to put the result into a collection:

(defn map-transducer [f]
  (fn [reducer]
    (fn [result value]
      (reducer result (f value)))))

(defn filter-transducer [pred]
  (fn [reducer]
    (fn [result value]
      (if (pred value)
        (reducer result value)
        result))))

This isn’t what a transducer really is like, but a first real step towards them. The key point here is that now, we can describe a reducing process without knowing how to put the modified item into the resulting collection (or channel, or socket, or whatever). The only thing we need to implement for collection now is reduce or some other function that has the same interface as reduce. This is how we can use these prototype transducers:

user> (def incrementer (map-transducer inc)) ; a function that knows how to increment
#'user/incrementer
user> (incrementer conj) ; teaching `incrementer` how to put elements to the result
#function[user/map-transducer/fn--5698/fn--5699]
user> (reduce (incrementer conj) [] [1 2 3]) ; using this transducer in `reduce`
[2 3 4]
user> (reduce ((filter-transducer odd?) conj) [] [1 2 3]) ; same for `filter-transducer`
[1 3]

So what happens here is that when we call map-transducer and pass it a function inc it returns a function, that accepts the reducer also known as the reducing function of just rf for short. We then call this function, passing it the reducing function conj and get another function, that accepts the results so far, and the element to process. This function then calls inc, which we’ve supplied in the first step, on the element, and uses conj to put the resulting value to result. In other words, by passing inc and conj to the transducer we’ve basically constructed (fn [result value] (conj res (inc value))) function, that is then used by reduce. Here’s a demonstration:

user> (reduce ((map-transducer inc) conj) [] [1 2 3])
[2 3 4]
user> (reduce (fn [result value] (conj result (inc value))) [] [1 2 3])
[2 3 4]

And that’s basically what transducers are all about! They’re just a composition of functions, that produces the final transformation function, that acts as a single step over the given collection. And the amazing part of such design is that transducers can be composed with other transducers:

user> (reduce ((comp (map-transducer inc)
                     (filter-transducer odd?))
               conj)
              [] [1 2 3 4 5 6])
[3 5 7]
user> (reduce (fn [result value]       ; above is essentially the same as this
                ((fn [result value]    ; function composition
                   (if (odd? value)
                     (conj result value)
                     result))
                 result (inc value)))
              [] [1 2 3 4 5 6])
[3 5 7]

It may be a little hard to process, but don’t worry, I will go into details after we complete the implementation of transducers, as we’re not yet finished.

Completing transducers

Most transducers don’t have any intermediate state, but some do. For example, the partition-all function takes a collection and returns a list of partitions of elements from this collection. The transducer, which is returned by this function, needs to store elements inside an array, and only after the current partition is filled it will append it to the result. Seems logical, however, if the number of elements can’t be equally partitioned, some will be left over:

user> (partition-all 3 (range 8)) ; a regular partition-all call
((0 1 2) (3 4 5) (6 7))
user> (defn partition-all-transducer
        "Our naive implementation of `partition-all` as a transducer."
        [n]
        (fn [reducing-function]
          (let [p (volatile! [])]
            (fn [result value]
              (vswap! p conj value)     ; building the partition
              (if (= (count @p) n)
                (let [p* @p]
                  (vreset! p [])        ; clearing the partition storage
                  (reducing-function result p*)) ; adding the partition to the result
                result ; returning result as is, if the partition is not yet complete
                )))))
#'user/partition-all-transducer
user> (reduce ((partition-all-transducer 3) conj) [] (range 8))
[[0 1 2] [3 4 5]]

We can see that in the case of our implementation, only complete partitions were added to the result, yet there should be an additional incomplete partition, as shown by the direct partition-all call. This is because our transducer is missing a so-called completion step:

(defn partition-all-transducer [n]
  (fn [reducing-function]
    (let [p (volatile! [])]
      (fn
        ([result]                       ; completion arity
         (if (pos? (count @p))
           (reducing-function result @p)
           (reducing-function result)))
        ([result value]                 ; reduction arity
         (vswap! p conj value)
         (if (= (count @p) n)
           (let [p* @p]
             (vreset! p [])
             (reducing-function result p*))
           result))))))

Here I’ve added another arity, that must be called after the reduction process is complete. This arity checks if the array we’ve used to store the incomplete partition is not empty. If it’s not, it means that there are some leftovers, that we need to add to the result. So it calls reducing-function with this array, otherwise it will call reducing-function with result only, propagating completion step down the line. Invoking this arity after reduce completed, we can see that all partitions are present:

user> (let [f ((partition-all-transducer 3) conj)
            res (reduce f [] (range 8))]
        (f res)) ;; complete step
[[0 1 2] [3 4 5] [6 7]]

Our reduce example has become way too verbose, and there’s also a potential for error if our transducer leaked from this scope and someone else used it after it was completed. Notice that I’ve forgotten to clear the volatile p in the completion step, and if someone else calls this particular function again, these leftover elements will be added to the result again. (Try to achieve that.)

Therefore, Clojure abstracts the process of finalizing a transducer into a function, conventionally called transduce:

(defn transduce
  ([xform f coll]
   (transduce xform f (f) coll))
  ([xform f init coll]
   (let [f (xform f)
         ret (reduce f init coll)]
     (f ret))))

There’s one additional arity that can be added to our transducer implementation, which is used for initialization, done by calling the reducing function without arguments. This arity takes zero arguments, and it is optional, as not all reducing functions can come up with a meaningful initialization process, but it’s better to supply it than not. So a complete implementation of the map-transducer function is:

(defn map-transducer [f]
  (fn [reducing-function]
    (fn
      ([]                               ; init
       (reducing-function))
      ([result]                         ; complete
       (reducing-function result))
      ([result input]                   ; step
       (reducing-function result (f input)))
      ([result input & inputs]          ; step with multiple inputs
       (reducing-function result (apply f input inputs))))))
Code Snippet 2: A complete implementation of a mapping transducer

And that’s it! This is a complete implementation of a transducer. Clojure provides this as an additional arity of map where you only supply a function, without the collection, thus we don’t even need a separate function for this, it was merely for demonstration purposes.

If you look back at the Elixir example, you can see that when Stream implementations of map and filter are used, each function is applied in quick succession, opposed to Enum version, where map is applied first, and then the result is filter‘ed as a whole. With transducers, we just implemented, or with ones available to us in Clojure we can do exactly the same:

user> (->> (range 4)
           (map (fn [x] (println "map" x) (inc x)))
           (filter (fn [x] (println "filter" x) (odd? x)))
           first)
;; map 0
;; map 1
;; map 2
;; map 3
;; filter 1
;; filter 2
;; filter 3
;; filter 4
;; => 1
user> (->> (range 4)
           (transduce
            (comp (map (fn [x] (println "map" x) (inc x)))
                  (filter (fn [x] (println "filter" x) (odd? x))))
            conj)
           first)
;; map 0
;; filter 1
;; map 1
;; filter 2
;; map 2
;; filter 3
;; map 3
;; filter 4
;; => 1

Now, I should mention that it’s a bit stretched example. The lazy composition of sequences in the first snippet should behave similarly to the second because when we’re mapping a function in a lazy way we don’t consume the whole sequence, we only produce a new one, that knows how to construct itself based on the previous one. And if we then filter this mapped sequence, we again construct a new sequence, that knows how to build itself lazily. So, in an ideal world, the first example should also give us map,filter,map,filter kind of composition, but unfortunately, it is somewhat slow, so Clojure uses chunking, and lazy sequences can compute up to 32 elements upfront. This isn’t the case for Elixir example, Enum.map is eager (the Erlang module is also eager), and will produce fully realized sequences. Streams in Elixir are lazy, and perhaps their composition is what we would have for lazy sequences in Clojure if there was no chunking, but in our case transducers show the difference in composition a bit clearer.

Now let’s really understand how transducers work.

Understanding the inverse order in comp and how transducers are composed

The transduce call may have seemed a bit complicated because of comp, but here’s a nice trick I’ve found. Simply remember, that the order of transducers in comp is exactly the same as the order of calls in the ->> macro. However, it may seem counter-intuitive, because usually functions in comp are applied in the reverse order, e.g.:

((comp a  b  c  d) x)
                                        ; expressions are aligned for clarity
      (a (b (c (d  x))))

In other words, even though functions are provided in order a, b, c, d, the call order will be d, c, b, a. And in the case of transducers the composition works the same way, it’s just we have one extra step after we’ve passed the reducing function.

As I’ve demonstrated in this example, the composition basically is engineered in such a way that it kinda inverses its order twice. The first inversion happens after we compose functions, and the second inversion happens when we pass the reducing function. Let’s use the substitution model, sometimes referred to as normal order evaluation to see why this happens.

Substitution here basically means that before we do any reduction, we expand all forms until they only contain primitives. This is done by substituting names with expressions they refer to. For example, given these three functions:

(defn add [a b] (+ a b))
(defn square [x] (* x x))
(defn sum-squares [a b] (+ (square a) (square b)))

We can walk through the (sum-squares (add 1 1) (add 1 2)) expression, and see how it will be evaluated in normal order, and how it differs from applicative order:

Normal orderApplicative order
(sum-squares (add 1 1) (add 1 2))(sum-squares (add 1 1) (add 1 2))
(sum-squares (+ 1 1) (+ 1 2))(sum-squares (+ 1 1) (+ 1 2))
(+ (square (+ 1 1)) (square (+ 1 2)))(sum-squares 2 3)
(+ (* (+ 1 1) (+ 1 1)) (* (+ 1 2) (+ 1 2)))(+ (square 2) (square 3))
(+ (* 2 2) (* 3 3))(+ (* 2 2) (* 3 3))
(+ 4 9)(+ 4 9)
1313

Notice, that in normal order (+ 1 1) and (+ 1 2) are executed twice, because all substituting happens before any reduction. In applicative order, reduction happens before substituting, so every expression is computed only once. Applicative order is more in line with real evaluation rules, but it’s harder to see what’s happening when we compose things, so I’ll use the normal order to show how transducers are composed.

With this in mind, let’s try to walk through the following expression:

((comp (map-transducer inc)
       (filter-transducer odd?))
 conj)

First, we need to substitute map-transducer and filter-transducer with their (simplified) implementations:

((comp ((fn [f]
          (fn [rf]
            (fn [result value]
              (rf result (f value))))) inc)
       ((fn [f]
          (fn [rf]
            (fn [result value]
              (if (f value)
                (rf result value)
                result)))) odd?))
 conj)

Next, let’s substitute f with inc and odd? and get rid of function calls, substituting them with anonymous functions that accept rf:

((comp (fn [rf]
         (fn [result value]
           (rf result (inc value))))
       (fn [rf]
         (fn [result value]
           (if (odd? value)
             (rf result value)
             result))))
 conj)

Now, knowing that composition of functions f and g is (fn [x] (f (g x))) we can substitute comp with this expression, and then substitute f and g in this expression with our functions from the previous step:

((fn [x]
   ((fn [rf]
      (fn [result value]
        (rf result (inc value))))
    ((fn [rf]
       (fn [result value]
         (if (odd? value)
           (rf result value)
           result)))
     x)))
 conj)

Note that according to composition rules, odd? should be executed first, and inc would follow it, but we’re not done composing yet. Let’s substitute x for conj and remove the function call:

((fn [rf]
   (fn [result value]
     (rf result (inc value))))
 ((fn [rf]
    (fn [result value]
      (if (odd? value)
        (rf result value)
        result)))
  conj))

We can now substitute rf for conj in the innermost function call, and remove the innermost function that accepts rf:

((fn [rf]
   (fn [result value]
     (rf result (inc value))))
 (fn [result value]
   (if (odd? value)
     (conj result value)
     result)))

Finally, we can substitute outer rf with the entire inner function body, and remove another call:

(fn [result value]
  ((fn [result value]
     (if (odd? value)
       (conj result value)
       result))
   result (inc value)))

As the last step, we’re substituting the inner function’s value to (inc value) and result to result to eventually get:

(fn [result value]
  (if (odd? (inc value))
    (conj result (inc value))
    result))

The final function that will be executed by reduce, which is just an ordinary two-argument function! And as you can see inc happens before odd?.

So yes, the order in comp may seem inverse, as inc and odd appear in a logical order, but thanks to how the whole composition process evolves, this logical order can be preserved in the resulting function. I hope that with this substitution model you can now understand the whole composition process of transducers, which is not a trivial process by any means. But I’m actually amazed by how this simple idea achieves such a complete abstraction that can be used in all kinds of transformation contexts.

Speaking of transduceable contexts, let’s implement one!

Implementing sequence transduceable context

Now we’re ready to implement sequence in Clojure, without direct Java interop. First, let’s remember what reduce does:

  • Accepts a function, initial value, and a collection;
  • Gets an element of a collection and passes the initial value and the element to the reducing function;
  • The reducing function returns the current result;
  • The result is then passed to the reducing function alongside the next element from the collection;
  • Once the collection is exhausted, the result is returned.

Thus, reduce can be written as an ordinary loop. However, sequence is lazy, therefore we can’t just loop through the collection, but thankfully, lazy sequences can be recursive. And, there’s another problem, we must append each element to the sequence we’re producing, but we also need to check if we’ve actually finished, or if we need to skip the element because it is filtered out. And we need to call the completion step somewhere.

Though, if you think about it, we don’t need for our transducer to actually append elements to a collection, it can merely do the transformation, and since we know what kind of collection we’re building, we can build it later. With this approach, we can check the result of a transducer on each step and act accordingly.

First, let’s finalize the transducer with a reducing function:

(defn sequence [xform coll]
  (let [f (xform (completing #(cons %2 %1)))]
    ;; lazy loop?
    ))

It may seem that we’re using cons here because we’re building a sequence, but it’s not. We could actually use anything, like conj or even a function that returns something that we can later distinguish from other values. In cons call, we have to reverse arguments though, because cons adds the second argument to a list, provided as a first argument. And completing simply adds a completion step that just returns the value it’s been given, in other words, we could write it as (fn ([a] a) ([a b] (cons b a))).

Now let’s figure out how to loop through the collection. We can start with an ordinary loop and then convert it to recursion, adding laziness as the last step.

(defn sequence [xform coll]
  (let [f (xform (completing #(cons %2 %1)))
        res (loop [s (seq coll)
                   res ()]
              (if s
                (let [x (f nil (first s))]
                  (if (seq? x)
                    (recur (next s) (concat res x))
                    (recur (next s) res)))
                res))]
    res))

Let’s try it:

user> (sequence (map inc) [1 2 3])
(2 3 4)
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4])
user> (clojure.core/sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])

Seems to work, but we’re missing the completion step (hence no incomplete partition in the result), so let’s add it:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))
              res (loop [s (seq coll)
                         res ()]
                    (if s
                      (let [x (f nil (first s))]
                        (if (seq? x)
                          (recur (next s) (concat res x))
                          (recur (next s) res)))
                      (f res)))]        ; complete
          res))
user> (sequence (partition-all 2) [1 2 3 4 5])
([5] [1 2] [3 4])

Oops, remember, that we’re using cons and we can’t really call f with res as a completion step, because our reducing function doesn’t know how to build the whole collection, only how to transform a single element. Instead, we have to call it with nil as before, and concat it with the result:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))
              res (loop [s (seq coll)
                         res ()]
                    (if s
                      (let [x (f nil (first s))]
                        (if (seq? x)
                          (recur (next s) (concat res x))
                          (recur (next s) res)))
                      (concat res (f nil))))] ; proper completion
          res))
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])

Now let’s make it recursive, by replacing loop with an anonymous function, and recur with actual recursion:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))
              step (fn step [coll res]
                     (if-some [s (seq coll)]
                       (let [x (f nil (first s))]
                         (if (seq? x)
                           (step (next s) (concat res x))
                           (step (next s) res)))
                       (concat res (f nil))))]
          (step coll ())))
#'user/sequence
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])
user> (dorun (sequence (map inc) (range 100000)))
Execution error (StackOverflowError) at user/sequence$step (REPL:25).
null

It still seems to work, but overflows with enough elements. Luckily, we can use lazy-seq to eliminate this problem, and actually make our implementation lazy:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))
              step (fn step [coll res]
                     (if-some [s (seq coll)]
                       (let [x (f nil (first s))]
                         (if (seq? x)
                           (lazy-seq (step (rest s) (concat res x)))
                           (lazy-seq (step (rest s) res))))
                       (concat res (f nil))))]
          (step coll ())))
user> (dorun (sequence (map inc) (range 100000)))
Execution error (StackOverflowError) at user/sequence$step (REPL:1).
null

And it still throws the StackOverflowError. Why?

Well, because we’re not really lazy yet. Instead of passing the result to the next iteration of step, as we did in loop we should use the result of step and concatenate with it. Let’s reorganize our function:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))]
          ((fn step [coll]
             (if-some [s (seq coll)]
               (let [res (f nil (first s))]
                 (if (seq? res)
                   (concat res (lazy-seq (step (rest s))))
                   (step (rest s))))
               (f nil)))
           coll)))
user> (dorun (sequence (map inc) (range 100000)))
nil

Now it doesn’t overflow. However, it is not yet ready to be used, because reduce, as you may know, can be terminated with reduced, and some transducers, like take leverage that to terminate the process. So we need to check for reduced? in our implementation. Not only that, but we have to call the completion step on the value, returned by dereferencing the reduced object, otherwise it will not be added to the resulting sequence properly.:

user> (defn sequence [xform coll]
        (let [f (xform (completing #(cons %2 %1)))]
          ((fn step [coll]
             (if-some [s (seq coll)]
               (let [res (f nil (first s))]
                 (cond (reduced? res) (f (deref res)) ; checking for early termination
                       (seq? res) (concat res (lazy-seq (step (rest s))))
                       :else (step (rest s))))
               (f nil)))
           coll)))
#'user/sequence
user> (sequence (comp (partition-all 2) (take 5)) (range))
([0 1] [2 3] [4 5] [6 7] [8 9])
user> (sequence (comp (take 5) (partition-all 2)) (range))
([0 1] [2 3] [4])

And we’re done! Well, almost, sequence should coerce the result to an empty sequence, and our current version will return nil if the transducer never returned anything. It’s easy enough to fix:

(defn sequence [xform coll]
  (let [f (xform (completing #(cons %2 %1)))]
    (or ((fn step [coll]
           (if-some [s (seq coll)]
             (let [res (f nil (first s))]
               (cond (reduced? res) (f (deref res))
                     (seq? res) (concat res (lazy-seq (step (rest s))))
                     :else (step (rest s))))
             (f nil)))
         coll)
        ())))
Code Snippet 3: Final version of our sequence transducer.

We can see that it is lazy, by using side-effecting transducers, like ones with println in those:

user> (sequence
       (comp (map (fn [x] (println "map" x) (inc x)))
             (filter (fn [x] (println "filter" x) (odd? x)))
             (take 3))
       (range))
map 0
filter 1
(1map 1
filter 2
map 2
filter 3
 3map 3
filter 4
map 4
filter 5
 5)
Code Snippet 4: Notice that the output is all messed up, because the sequence started printing before it was realized, and side effects appeared during the pretty printing process, which itself is lazy.

And as you can see, our sequence behaves in the same way as the clojure.core/sequence, regarding laziness:

user> (do (sequence
           (comp (map (fn [x] (println "map" x) (inc x)))
                 (filter (fn [x] (println "filter" x) (odd? x))))
           (range)) ;; note, infinite range
          nil)
map 0
filter 1
nil
user> (do (clojure.core/sequence
           (comp (map (fn [x] (println "map" x) (inc x)))
                 (filter (fn [x] (println "filter" x) (odd? x))))
           (range))
          nil)
map 0
filter 1
nil

I think now sequence is completed, though I’ll need to test it very extensively in the future. I’ve already tested it a lot when I was porting it to Fennel, and I think it should work correctly, looking at the code at least I don’t see anything that could go wrong.

What have I learned

Implementing transduceable context, in this case, the sequence function, was a nice puzzle. I’m sure it’s not as efficient, as the clojure.core version, mainly due to the use of concat but I wasn’t able to come up with a better way of building the result which can be done lazily.

And after actually implementing sequence, porting it, and a lot of transducers to Fennel, I’ve finally figured out how they actually work, and I hope that now you understand it too! This is why I love Clojure - the developers are putting a lot of thought into such features, and not taking shortcuts, like reimplementing all functions for data sources that can’t be transformed into sequences. For me, it’s a really practical language, with a lot of tools that can enhance the programming experience, and make it fun.

Of course, if you have any questions, feel free to email me, and I’ll try to answer them, and maybe update the post for future readers. This topic is a bit complicated, so I hope it was not a boring post. Thanks for reading!


  1. I actually like this approach to learning. Usually, I’m not reading how a thing is implemented, and instead trying to figure out everything myself. This unfortunately doesn’t produce the greatest results, as a lot of stuff I’m trying to learn this way has a lot of research put into it, and I basically try to reimplement a thing only based on assumptions and observations. Nevertheless, I’m satisfied with the process, and in the end, if I got something working, I feel happy, and even more so, when I’ve got some concepts exactly right. I’m not suggesting that this is a superior way to learn, but it is at least very enjoyable for me personally. ↩︎