Know Your Reducers
In one episode of Man vs. Wild, Will Ferrell, Hollywood actor and comedian, accompanies Bear Grylls, professional survivalist, deep inside the Arctic Circle. The very first thing Grylls makes Ferrell do is rappel from a helicopter hovering 150 feet above the ground. As Ferrell makes his descent, he has only one word for the camera: "Mommy!"
That’s about how I felt when I first tried to learn about reducers: plunked down far from home in a completely foreign and uncomfortable landscape. Let’s avoid creating that feeling in you by starting with something familiar: seq functions.
Reducers vs. Seq Functions
The reducers library (in the clojure.core.reducers namespace) has alternative implementations of map, filter, and other seq functions. These alternative functions are called reducers, and you can apply almost everything you know about seq functions to reducers. Just like seq functions, the purpose of reducers is to transform collections. For example, if you wanted to increment every number in a collection, keep only the even results, and sum them, the seq and reducer versions look virtually identical. Try this in a REPL:
;; seq version
(->> (range 10)
     (map inc)
     (filter even?)
     (reduce +))
; => 30

;; reducer version
(require '[clojure.core.reducers :as r])
(->> (range 10)
     (r/map inc)
     (r/filter even?)
     (r/reduce +))
; => 30
In both examples, (range 10) returns a seq of the numbers 0 through 9. In the first example, map is a seq function. In the second, r/map is a reducer. In both examples, map has the same purpose: transforming an input collection to an output by applying a function to each element of the input. Reducers are just another means of transforming collections, and for the most part they’re used just like the seq functions you already know and love.
So why bother with them? Reducers are designed to perform better (often dramatically better) than seq functions under some circumstances by performing eager computation, eliminating intermediate collections, and allowing parallelism.
In fact, as far as I can tell, parallelization is the main reason Rich Hickey, Lambda Whisperer, added reducers to the language. The goal of executing collection transformations in parallel completely determined how reducers are designed, and it even explains why we use the term reducers: reduce operations can be trivially parallelized (you’ll learn how by the end), and every collection transformation can be expressed in terms of reduce (there’s a big juicy math paper on that).
Thus, the reducers library is built around using reduce to transform collections, and it’s designed this way to make parallelization possible. Both eager computation and the elimination of intermediate collections are secondary - but useful - consequences of this design. (By the way: "elimination of intermediate collections" is a freaking mouthful, so I’m going to start using EIC).
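To make that last claim a bit more concrete, here’s a tiny illustration (my own, not taken from that paper) of one collection transformation, map, expressed with nothing but reduce:

;; map expressed purely in terms of reduce: each step conj's the
;; transformed element onto the accumulated result vector.
(defn map-via-reduce [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

(map-via-reduce inc [1 2 3])
; => [2 3 4]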
Let’s look at these three strategies more closely. After that, I’ll share rules for how to ensure that your code makes use of each of them, without going into much detail about why they work as they do. Last, I’ll also give some examples of reducers at work, showing how their performance compares to alternatives.
The Strategies
Understanding the reducers library’s three performance strategies - eager computation, EIC, and parallelism - will give you the rationale behind reducers, and that will help you understand their implementation later on. Plus, these ideas are pretty groovy in their own right. I know you’re eager to learn about eager computation, so let’s go!
Eager Computation
Eager computation stands in contrast to Clojure’s core sequence functions, which produce and consume lazy sequences. You may remember that lazy seqs don’t compute their members until you try to access them. (For a refresher on lazy sequences, check out the lazy seqs section in CFTBAT.) This is often a good performance strategy because it saves your program from doing unnecessary computations, but in cases where you know you have to realize (compute the elements of) the entire seq, laziness actually introduces some unnecessary overhead. Reducers perform eager computation: they always compute every member of a collection, and that can improve performance slightly.
Because reducers are eager, you shouldn’t use them with infinite sequences unless you want something useless to happen. The reducer would try to realize the entire sequence, which isn’t possible, and consume all available memory in the process.
This isn’t to say that reducers can’t work on lazy seqs. They totally can! They just fail if the lazy seq is infinite.
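Here’s a small illustration of that (my own example, not from the original text): a finite lazy seq reduces just fine, while an infinite one would never return.

;; Fine: the lazy seq is finite, so the eager reduction can realize
;; every element and finish.
(r/reduce + (r/map inc (take 5 (iterate inc 0))))
; => 15

;; Don't do this: the source seq is infinite, so the eager reduction
;; would churn forever, consuming memory as it goes.
;; (r/reduce + (r/map inc (iterate inc 0)))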
Eliminating Intermediate Collections
Reducers don’t produce intermediate collections. An intermediate collection is a collection produced by one sequence transforming function and passed as an argument to another sequence transforming function. Let’s look at a kind of stupid example:
(->> (list "Shake" "Bake") ; ➊
(map #(str % " it off")) ; ➋
(map clojure.string/lower-case) ; ➌
(into [])) ; ➍
This code just takes a list of words ➊ and performs two map operations. First, it appends " it off" to each string ➋, then it lowercases each one ➌. Finally, it puts the result in a vector ➍. Silly code that would never really exist. What we care about here is that the map at ➋ constructs a new collection (a lazy seq, as it happens), and so does the map at ➌. Each of these intermediate collections takes resources to construct and traverse. How inefficient! Haters are going to hate on this inefficiency, but luckily you can shake it off with reducers. (Note to self: more Taylor Swift references!)
Looking at this code you can deduce that it would be more efficient to just combine the two string functions into one:
(->> (list "Shake" "Bake")
     (map (comp clojure.string/lower-case #(str % " it off")))
     (into []))
This eliminates one of your intermediate collections, and makes your code faster. In parallel programming terminology, you’d say that you fused two elemental functions into one. An elemental function is just a normal function, like lower-case, but we give it the qualifier elemental to communicate that we’re talking about it in the context of collection transformation; it’s applied to each element of the collection. Fusion is the composition of elemental functions to avoid producing intermediate collections.
It would be cumbersome to have to rewrite your code to achieve this kind of fusion, though. Ideally, you want your collection transformations to compose into one fused function without any manual intervention on your part. You want to be able to write idiomatic Clojure code (like the following) and have it "just work":
(->> (range 1000)
     (r/filter even?)
     (r/map inc)
     (into []))
And hey, guess what! Reducers just work! They’re actually designed to compose into a single fused elemental function when you chain them like in the example above, without any manual labor on your part. In the above example, it’s as if you’re telling Clojure, "Take a seq of the numbers 0 through 999. For every element of that seq, apply a function that does three things: keeps the element only if it’s even, increments it (assuming it was kept), and then places the result in a vector."
Note: Transducer Performance
Transducers also eliminate intermediate collections. In order to use them, you have to write code that looks slightly different from bread-and-butter seq functions or reducers.
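For instance, a transducer version of the example above might look something like this (my own sketch, using only standard clojure.core functions):

;; comp builds a single fused transformation from two transducers, and
;; into applies it without producing intermediate collections.
(into []
      (comp (filter even?)
            (map inc))
      (range 1000))
; => [1 3 5 ... 999]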
In a few minutes, you’ll see some numbers that show just how much EIC can improve performance. But first, let’s look at the final performance strategy that reducers enable: parallelism.
Parallelism
Parallelism is the simultaneous execution of tasks on two or more processors.
Task is a nebulous word, so let’s define it. I use task to refer to function calls, but from the perspective of the processor instead of the programmer; execution instead of semantics. That’s still nebulous, so let’s look at a concrete example.
We might as well start working on iFacePalm. After all, that’s what I’m paying you for, right? Here’s some Clojure code we could write that would get us started:
(defn lifeline-and-years-lived
  "Given a human subject, return a vector of the subject's lifeline
  ratio and the years they lived"
  [{:keys [palm-stats life-history] :as subject}]
  [(:lifeline-ratio palm-stats) (:years-lived life-history)])
This function extracts a person’s lifeline ratio (the length of their lifeline compared to their hand size) and age at death, and puts the two in a vector. Here’s how you’d use it:
(lifeline-and-years-lived {:palm-stats   {:lifeline-ratio 0.5}
                           :life-history {:years-lived 75}})
From the programmer’s perspective, we’d call this a function call. We care about the return value and how to relate that to other functions according to business needs. We care about the meaning of the function call.
From the execution perspective, we’d call it a task. It’s just some work that needs to get done, and we don’t care about its meaning or how it relates to business needs.
So, if I said to you, "Let’s speed this program up by putting this task on another thread," you’d know that I’m not talking about changing the meaning of the program. I’m just talking about some work that needs to get done, and how to shuffle the work around for better performance.
Now let’s say we wrote something like this:
(map lifeline-and-years-lived subjects)
In the same way that this is one function call, map, which results in many applications of lifeline-and-years-lived, it’s one task that results in the execution of many sub-tasks. You can treat the larger task as a single, black-box task, just as you can treat the function as a black box.
So, parallelism is all about executing tasks simultaneously on multiple processors. Given the right conditions (explained in the next section), reducers will transparently subdivide their "transform a collection" task into subtasks of "transform this partitioned portion of the collection." These subtasks execute in parallel, and the results are recombined, also transparently. So, for example, if you have this code:
(r/fold + (vec (range 10000000)))
Then Clojure will divide your vector of ten million numbers into subvectors of 512 elements, run (reduce + subvector) on those subvectors in parallel, and combine the results.
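To make the pieces of that process visible, here’s a quick illustration of mine (not from the original text) using r/fold’s longer arity, which takes the partition size and the combining function explicitly; as far as I can tell, 512 and + are exactly what the shorter call above uses by default:

;; Same computation with the partition size (512) and the combining
;; function (+) spelled out. Each 512-element chunk is reduced with +,
;; and the per-chunk results are combined with + as well.
(r/fold 512 + + (vec (range 10000000)))
; => 49999995000000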
This divide/parallelize/re-combine process relies on the fork/join framework. It’s too soon to cover that, but do not be sad, gentle reader, for it is covered in Part Two.
Now that you know about the performance strategies employed by reducers, let’s look at how to actually use them.
How to Use Reducers
You’ve seen how, for the most part, you can use reducers just like seq functions. For example, this produces the same result as the seq counterpart:
(require '[clojure.core.reducers :as r])
(->> [1 2 3 4]
     (r/filter even?)
     (r/map inc)
     (r/reduce +))
; => 8
There are a couple rules to keep in mind, though. First, if you want to make use of parallelism, you have to use the r/fold function with a foldable collection. r/fold is a parallel implementation of reduce.
I need to explain parallelism a bit more before explaining what foldable means, but for now the relevant fact is that vectors and maps are the only foldable collections. So if you wrote code like this, you might expect it to run in parallel, but it wouldn’t:
(require '[clojure.core.reducers :as r])
(->> '(1 2 3 4)
     (r/filter even?)
     (r/map inc)
     (r/fold +))
; => 8
Lists aren’t foldable, so reducers can’t operate on them in parallel. Instead, the code falls back to a serial reduce.
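If you’re starting from a list (or some other unfoldable seq) and want the parallel path, one option (my suggestion, not something from the original text) is to pour it into a vector first:

;; Converting to a vector makes the collection foldable. (With only
;; four elements the fold still runs serially, but the same shape
;; gives you parallelism on large vectors.)
(->> (vec '(1 2 3 4))
     (r/filter even?)
     (r/map inc)
     (r/fold +))
; => 8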
The second rule is that reducers don’t produce collections. It’s awesome that r/map and r/filter compose without creating intermediate collections, but the catch is that they behave a little differently from seq functions. Here’s one way you could get tripped up:
(first (r/map inc [1 2 3]))
; => java.lang.IllegalArgumentException: Don't know how to create ISeq from: clojure.core.reducers$folder$reify__17186
This happens because r/map and the other reducers don’t actually return a new collection. Instead, they return a reducible. A reducible is like a recipe for how to produce a new collection, along with a reference to a source collection. In the above example, the recipe is "produce a new collection by incrementing each element of the source collection", and the source collection is the vector [1 2 3].
Another way to put it: a reducible is an elemental function, along with the collection whose elements you want to apply the function to.
And yet another way to put it: It’s as if you were one of Santa’s worker elves (or some other mythical creature of labor), and the foreman handed you a piece of paper with your instructions for the day: "Go to the plastic shed out back, get all the plastic lumps and turn them into toy whatevers to feed to the insatiable, gaping maw of consumerism."
If I then walked up to you and said, "Give me the first of that piece of paper," you would look at me in confusion, because the request wouldn’t make any sense. The piece of paper is not a collection that you can take the first of. Similarly, when you tell Clojure (first (r/map inc [1 2 3])), it expresses its confusion with an exception because the request doesn’t make sense.
You might initially think you’re telling Clojure "give me the first element of the collection returned by `r/map`", but you’re actually saying "give me the first element of the reducible returned by `r/map`". A reducible is a recipe for how to transform a given collection, but it’s not a collection itself, so Clojure can’t fulfill your request.
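What you can do with a reducible is hand it to something that knows how to carry the recipe out. For example (a quick illustration of mine), reducing it directly works fine:

;; r/reduce knows how to execute the recipe against the source vector.
(r/reduce + 0 (r/map inc [1 2 3]))
; => 9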
This design decision to have functions like r/map return a reducible instead of a collection is what allows reducers to seamlessly compose, building up a single fused elemental function. I’ll explain exactly how this happens later in the book; for now I’ll rely on the power of metaphor so that you’ll have a strong intuitive understanding of how it works.
But you need to get collections somehow. If you want to use reducer functions to produce another collection, you have to explicitly realize the result collection by calling r/foldcat or clojure.core/into. r/foldcat calls r/fold internally, and into calls reduce. Here are two examples:
(->> [1 2 3 4]
     (r/map inc)
     (into [])
     (first))
; => 2

(->> [1 2 3 4]
     (r/map inc)
     (r/foldcat)
     (first))
; => 2
You should use into when you want to explicitly specify the collection type. In the example above, you’re realizing the collection as a vector. into is serial.
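For instance (my example, not the author’s), you can realize the same kind of reducible as a sorted set instead of a vector:

;; into lets you choose the target collection; here a sorted set
;; deduplicates and orders the realized elements.
(->> [3 1 2 1]
     (r/map inc)
     (into (sorted-set)))
; => #{2 3 4}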
r/foldcat executes parallelly (is that a word?) and returns a collection that acts pretty much like a vector. The collection is foldable, so you can use it with reducers in further parallel computations. It’s also seqable, like a vector, so you can call functions like first on it.
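As a quick sketch of that (mine, not from the original text), the output of one r/foldcat can feed straight into another parallel fold:

;; The collection r/foldcat returns is itself foldable, so it can be
;; the input to a further parallel reduction.
(->> (vec (range 1000))
     (r/map inc)
     (r/foldcat)      ; realize a foldable, vector-like collection
     (r/map #(* % %))
     (r/fold +))
; => 333833500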
To summarize the rules:
- If you want to parallelize your reduction, use r/fold and a foldable collection (a vector or map).
- If you want your transformations to return a collection, use into or r/foldcat. into executes serially and lets you specify the collection type, while r/foldcat is parallel and returns a vector-like collection.
Soooo now that you’ve spent all this time learning about reducers' performance strategies and how to use them, let’s wrap everything together by comparing reducer performance to seq functions. If I’ve done my job right, you won’t be surprised by the differences. If I haven’t done my job right, then, well… sorry?
A Peek at Performance
Reducer performance depends on two variables:
- Whether you’re producing intermediate collections
- Whether you’re using r/fold with foldable collections
To see the performance impact of these variables, we’re going to look at similar computations performed using r/fold (for parallelism and EIC), r/reduce (for EIC alone), and clojure.core/reduce as a baseline. We’ll also perform the computations on foldable collections (vectors) and on unfoldable collections (lazy seqs).
You can find the code for this section in the code folder of your download under src/ifacepalm/using_reducers.clj (paid version only). After doing the performance measurements for a while, I wanted to make the process a bit less tedious, so I wrote the macro times and its helper functions pretty-time and runs-and-pauses to abstract the process of running a snippet multiple times, timing each run, averaging those time values, and usefully printing the result. This lets you do something like this:
(let [x (vec (doall (range 1000000)))]
  (times (r/fold + x)
         (r/reduce + x)
         (reduce + x)))
"   6.62ms  (r/fold + x)"
"  20.94ms  (r/reduce + x)"
"  16.05ms  (reduce + x)"
In this example, you’re performing the same computation using three different methods, and seeing how long each method takes.
The code for times is in the ifacepalm.time-helpers namespace, but I’m not going to go over it here for fear that you would (justifiably) shake your fist at me from behind your monitor for veering off topic.
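If you just want a rough sense of the idea, here’s a much-simplified sketch of a timing macro (my own, and nothing like as nice as the real times, which also averages multiple runs):

;; A minimal timing macro: evaluates each expression once and prints
;; the elapsed milliseconds next to the source form.
(defmacro times-sketch [& exprs]
  `(do
     ~@(for [e exprs]
         `(let [start# (System/nanoTime)
                ret#   ~e
                ms#    (/ (- (System/nanoTime) start#) 1e6)]
            (println (format "%10.2fms  %s" ms# ~(pr-str e)))
            ret#))))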
Here are our first comparisons:
;;------- Simple operations
(def snums (range 10000000))
(def snumsv (vec snums))

(defn t1 [x]
  (times (r/fold + x)
         (r/reduce + x)
         (reduce + x)))

(defn t2 [x]
  (times (->> x (r/map inc) (r/fold +))
         (->> x (r/map inc) (r/reduce +))
         (->> x (map inc) (reduce +))))

(defn t3 [x]
  (times (->> x (r/filter even?) (r/map inc) (r/fold +))
         (->> x (r/filter even?) (r/map inc) (r/reduce +))
         (->> x (filter even?) (map inc) (reduce +))))
First, we define two collections of the numbers 0 through 9,999,999. snums is an unfoldable seq and snumsv is a foldable vector.
Next, we define three functions, t1, t2, and t3. Each function takes a collection (we’ll use snums and snumsv), transforms it using r/fold, r/reduce, and reduce, and prints out how much time each transformation takes. Enough with the yik yak, let’s actually use them:
(t1 snums)
; => " 127.28ms (r/fold + x)"
; => " 131.67ms (r/reduce + x)"
; => " 119.67ms (reduce + x)"
(t1 snumsv)
; => " 59.15ms (r/fold + x)"
; => " 164.32ms (r/reduce + x)"
; => " 148.15ms (reduce + x)"
When we call t1 with snums, the three reduction functions have roughly the same performance. Put another way: we don’t get any performance benefits from EIC or parallelism. This makes sense because we’re not doing a map, filter, or some other function that semantically returns a new collection. We’re also using the unfoldable snums, and in that case r/fold quietly degrades to r/reduce.
When we call t1 with snumsv, though, we see a significant speedup from r/fold. My computer has four cores, so ideally r/fold would take 25% as much time as the serial versions, but it’s rare to meet that ideal because parallelization has overhead costs.
Let’s see what happens when we have an intermediate transformation (map):
(t2 snums)
; => " 166.43ms (->> x (r/map inc) (r/fold +))"
; => " 169.32ms (->> x (r/map inc) (r/reduce +))"
; => " 322.66ms (->> x (map inc) (reduce +))"
Dayumn! The reducer versions take nearly fifty percent less time! There’s no difference between r/fold and r/reduce, though, because we’re not using a foldable collection. Here’s what happens when you change that:
(t2 snumsv)
" 74.45ms (->> x (r/map inc) (r/fold +))"
" 214.04ms (->> x (r/map inc) (r/reduce +))"
" 374.91ms (->> x (map inc) (reduce +))"
Here, r/fold is five times faster than reduce, thanks to EIC and parallelism. (Interestingly, it seems serial reduce takes a bit longer with vectors.)
What happens if you have two intermediate collections? This happens:
(t3 snums)
"  184.07ms  (->> x (r/filter even?) (r/map inc) (r/fold +))"
"  181.03ms  (->> x (r/filter even?) (r/map inc) (r/reduce +))"
"  431.53ms  (->> x (filter even?) (map inc) (reduce +))"
(t3 snumsv)
"   71.96ms  (->> x (r/filter even?) (r/map inc) (r/fold +))"
"  207.47ms  (->> x (r/filter even?) (r/map inc) (r/reduce +))"
"  478.25ms  (->> x (filter even?) (map inc) (reduce +))"
When you call r/fold with a vector, it’s almost seven times faster. Not only that, but look at the times for the r/reduce calls in (t2 snumsv) and (t3 snumsv): 214ms and 207ms respectively. Adding the intermediate transformation had virtually no impact on performance. (It actually took less time in this case, but I chalk that up to random noise.) Compare that to the reduce calls: 374ms and 478ms. Huge performance impact!
This should give you a good idea of the kinds of performance gains that reducers can give you. And the best part is that your code will perform even better if you run it on a machine with more cores, without any extra work. Neat!
And thus concludes the introductory tutorial on reducers. Hopefully I’ve given you everything you need to start using them effectively. Next comes the really fun part: learning about parallelism concepts so that you’ll have a complete understanding of what the reducers library is doing.