The "Potter" Coding Dojo (in Clojure)

At the end of May our remote-first company met up for one of our thrice-yearly week-long meetups. One of the objectives is to socialise and do group activities that build psychological safety so that we feel safe to take risks and feel vulnerable in front of each other. This means we can communicate more confidently & efficiently when we are back in our remote offices.

This time we met up in Majorca¹ and sessions included ice-breakers (people-bingo; building a tower with marshmallow, tape and spaghetti; etc), building a Technology Radar, Coding Dojo, How To Master Git, Biomechanics For Desk People, and lightning talks—to mention a few. (We also celebrated a product launch and—yes—managed to find time for some bathing, paddling—and even a couple morning runs.)

It is the Coding Dojo I want to talk about in this post. Or; not actually the session itself, but my efforts to solve one of the problems featured in it. It was briefly stated but surprisingly deep and I have found myself thinking about it a lot since.

A Warning
The Problem
Towards a (simplistic) solution
Finding a solution at all for large purchases
Actually passing tests for edge cases
Towards a moderately fast, correct & robust solution
Epilogue

A Warning

If you are considering doing this Coding Dojo yourself, you may want to postpone reading this until you do because it provides not one but two different solutions that passes all the suggested test cases.

The Problem

(Paraphrased by yours truly; the original is over at codingdojo.)

There is a bookshop that has only five distinct books; Let's call them 1–5
The shop has unlimited quantities of each book
Each book costs £8
You can get a discount for buying several distinct books as a unit
There is no discount for buying two (or more) copies of the same book as a unit
A purchase can contain multiple units, thus you can get multiple discounts for a single purchase

These are the discounts:

distinct books	discount %
2 distinct books	5
3 distinct books	10
4 distinct books	20
5 distinct books	25

Your mission: write a function to calculate the price of a purchase, such that the customer always pays the minimum available price.

Example: Buying the books 1, 2 & 3 can be put through the till in several different ways:

#1	Individually (no discounts)	$£ 8 \times 3 = £ 24$
#2	Discount for two of the books¹	$£ 8 + 2 \times £ 8 \times (1 - 0.05) = £ 23.2$
#3	Discount for three books	$3 \times £ 8 \times (1 - 0.1) = £ 21.6$

In this case, your function should pick option #3 as it is the cheapest.

Towards a (simplistic) solution

I immediately thought that this was a combinatorial problem and scoured Clojure's collections library for suitable combinatorial operators. I didn't find what I wanted in the standard library, but clojure.math.combinatorics appeared to have what I needed.

clojure.math.combinatorics/partitions return all the different ways to partition the input into subsets, which sounds like it might work. I felt I was ready to start coding.

Let's start easy, with an import statement for clojure.math.combinatorics, a def for our book price, and a map of available discounts:

(ns potter.core
  (:require [clojure.math.combinatorics :as combo]))

(def book-price 8)

(def discount {2 0.95
               3 0.9
               4 0.8
               5 0.75})

Then let's try to price a single partition. Because combo/partitions returns all possible partitions, some have the same book twice in it. This makes no sense with our discount structure. However, we can just price superfluous books as if they were bought on their own, and the price attempts should be discarded by more optimal sets of partitions.

(defn price-partition
  [books]
  (let [unique (count (distinct books))
        mult (or (discount unique) 1)]
    (* book-price
       (+ (* unique mult)
          (* (- (count books) unique))))))

We're ready to price a sequence of partitions, each with their own discount applied, to make up a whole purchase.

(defn price-partition-seq
  [partitions]
  (->> partitions
       (map price-partition)
       (reduce +)))

Now, all we need to do is make our top-level price function that takes our input and returns the best price we can give. It is a brute force solution that takes the input and first calculates all possible ways to partition that into subsets, then calculates the price for each of those subsets, sorts the list of prices and grabs the lowest one. Simples!

(defn price
  [books]
  (->> books
       combo/partitions
       (map price-partition-seq)
       sort
       first))

Looks great! Let's try it out against some of the provided test cases²:

(testing "no discounts"
  (are [total books]
      (= total (price books))
    0 []
    8 [1]
    16 [2 2]
    24 [3 3 3]
    32 [4 4 4 4]
    40 [5 5 5 5 5]))

Alright!

(testing "Simple discounts"
  (are [total books]
      (= total (price books))
    (* 16 95/100) [1 2]
    (* 16 95/100) [1 3]
    (* 16 95/100) [1 4]
    (* 16 95/100) [1 5]
    (* 8 3 9/10) [1 3 5]
    (* 8 4 8/10) [1 2 3 5]
    (* 8 5 75/100) [1 2 3 4 5]))

Winning!

(testing "Multiple discounts"
  (are [total books]
      (= total (price books))
    (+ 8 (* 2 8 95/100)) [1 1 2]
    (* 2 (* 2 8 95/100)) [1 1 2 2]
    (+ (* 8 4 8/10) (* 8 2 95/100)) [1 1 2 3 3 4]
    (+ 8 (* 5 8 75/100)) [1 2 2 3 4 5]))

Still got it!

(testing "Edge cases"
  (are [total books]
      (= total (price books))
    (* 2 (* 8 4 8/10)) [1 1 2 2 3 3 4 5]
    (+ (* 3 (* 8 5 75/100)) (* 2 8 4 8/10)) [1 1 1 1 1
                                             2 2 2 2 2
                                             3 3 3 3
                                             4 4 4 4 4
                                             5 5 5 5]))

BOOOOOO! Failed!

That last edge case has 23 books in it, and finding all the possible partitions is a function whose time consumption grows rapidly with the number of books. For 13 to 15 books the time taken for this function roughly triples for each book added, so 23 books will take rather a long time. Witness:

potter.core> (time
              (count
               (combo/partitions
                [1 1 1
                 2 2 2
                 3 3 3
                 4 4
                 5 5])))
"Elapsed time: 1487.052629 msecs"
;; => 200549

potter.core> (time
              (count
               (combo/partitions
                [1 1 1
                 2 2 2
                 3 3 3
                 4 4 4
                 5 5])))
"Elapsed time: 4089.57496 msecs"
;; => 573003

potter.core> (time
              (count
               (combo/partitions
                [1 1 1
                 2 2 2
                 3 3 3
                 4 4 4
                 5 5 5])))
"Elapsed time: 12744.500346 msecs"
;; => 1688360

It's time to try a different approach.

Finding a solution at all for large purchases

Although the simple solution works for small numbers of books, it is impractical for larger stacks of books. I felt that it might be possible to break the problem down somewhat, or at least approximate the solution for larger problems. I renamed my price function to best-price and added a new fast-price function³:

(defn fast-price
  [books]
  (loop [counts (->> books frequencies vals sort)
         total 0]
    (if (seq counts)
      (let [c (first counts)
            n (count counts)
            p (* c (price-partition (take n (iterate inc 1))))]
        (recur (->> counts
                    (map #(- % c))
                    (remove zero?))
               (+ total p)))
      total)))

Then it was a case of finding a suitable threshold to switch from one to the other. I checked the wind direction, phases of the moon, read my tea leaves and goat entrails and came up with:

(defn price
  [books]
  (if (< (count books) 10)
    (best-price books)
    (fast-price books)))

Job done! Well, sort-of. This solution is fast and finds a price, but it does not find the best price for the final edge case.

Actually passing tests for edge cases

My fast solution eagerly tries to build as big partitions as possible from the remaining books. So if you have the following: 1 1 2 2 3 3 4 5 it will price that as:

$£ 8 \times 5 \times (1 - 0.25) + £ 8 \times 3 \times (1 - 0.9) = £ 51.60$

However, it's actually cheaper to price that as two partitions of four books:

$2 \times £ 8 \times 4 \times (1 - 0.2) = £ 51.20$

After a bit of trial-and-error in the Clojure REPL I felt confident that this was the only such edge case, and decided to just build my solution around that. I deleted most of the code I had, along the way micro-optimised⁴ fast price lookup for a unit of books. This map directly gave the price for a stack of 1-5 unique books, having the discount already added to it.

(def fast-price-lookup
  "Pre-calculated prices (with discount)
  for a stack of up to 5 distinct books."
  {1 book-price
   2 (* 2 book-price 95/100)
   3 (* 3 book-price 9/10)
   4 (* 4 book-price 8/10)
   5 (* 5 book-price 75/100)})

I then did a variety of my previous fast-price function. I separated all the books into piles of distinct books, then built the biggest sets of unique books I could (by taking one from each pile) and priced them as a unit unless I reached a state that I recognised I could more beneficially price as $4 + 4$ than $5 + 3$ books. Here's the code for that:

(defn price
  [books]

  ;; Separate the books into piles of
  ;; individual books
  (loop [book-piles (->> books
                         frequencies
                         vals
                         sort)
         total 0]

    ;; Any more piles of books left?
    (if (seq book-piles)

      ;; Do we hit the special case
      ;; where two four-book stacks are
      ;; cheaper than two stacks of
      ;; three and five unique books
      ;; each?
      (if (= '(1 1 2 2 2) book-piles)

        ;; Return current total plus the
        ;; cost of two stacks of four
        ;; unique books.
        (+ total (* 2 (fast-price-lookup 4)))

        ;; Take one book from each
        ;; remaining pile & add the cost
        ;; of this stack of books to the
        ;; running total.
        (recur (->> book-piles
                    (map dec)
                    (remove zero?))
               (+ total
                  (fast-price-lookup
                   (count book-piles)))))

      ;; No more piles of books left;
      ;; return total.
      total)))

There's a lot to like about this. It's dirt simple, and fast. I can price millions of books with this, no problems.

I was very, very pleased with myself until Michelle, a colleague, pointed out that it is highly sensitive to changes in the discount amounts. She even went as far as calling it cheating which made me feel like taking a few deep breaths into a paper bag. But she is right! Change the discount amount given for four books and the pricing function may start to exhibit weird behaviour.

Towards a moderately fast, correct & robust solution

Michelle tried to show me how to solve the problem using Maths but kept on being distracted from reaching a solution by my questions about her notation⁵. We did manage to satisfy ourselves that given a huge amount of books we can break the problem into smaller chunks, of size at most $N \times N$ where $N$ is the biggest number of books we offer a lump discount for. The problem is that in our case $N = 5$ and $5 \times 5 = 25$ is more books than we have in the edge case that we already failed to handle.

This was as far as I got at this work trip, but I've been thinking about the problem (too much) since. Is it possible to make a solution that is correct, fast enough to handle all the provided edge cases, and robust against changes in the discounts? It feels like it should be.

I think there will still be a combinatorial element to the solution. My hope is to add some domain-informed restrictions such that we can pass all the suggested test cases in a reasonable time span. I believe it should be possible. I can think of some restrictions:

Don't generate partitions with more than N items, where N is max number of books we offer a discount for (i.e. 5)
Don't generate partitions with duplicates

One approach I thought of was to generate a set of different shapes of sets that fit those constraints, and try to fit the books we have into these sets.

Experimenting in a Clojure REPL

I still think that combo/partitions can help me find all the different shapes or partitions, with a bit of creative management of its input, so let's swap to a REPL and try just that.

potter.core> (combo/partitions [1 1 1])
;; => (([1 1 1]) ([1 1] [1]) ([1] [1] [1]))

Great! That looks like just the ticket. Let's print each set of partitions on a separate line, and count all the different solutions for, say, 6 books.

potter.core> (count
              (map prn
                   (combo/partitions
                    (repeat 6 1))))
([1 1 1 1 1 1])
([1 1 1 1 1] [1])
([1 1 1 1] [1 1])
([1 1 1 1] [1] [1])
([1 1 1] [1 1 1])
([1 1 1] [1 1] [1])
([1 1 1] [1] [1] [1])
([1 1] [1 1] [1 1])
([1 1] [1 1] [1] [1])
([1 1] [1] [1] [1] [1])
([1] [1] [1] [1] [1] [1])
;; => 11

Ah, looking good, except we have to get rid of any solutions with subsets of more than N elements. Remove takes care of that:

potter.core> (count
              (map prn
                   (remove #(> (count (first %)) 5)
                           (combo/partitions
                            (repeat 6 1)))))
([1 1 1 1 1] [1])
([1 1 1 1] [1 1])
([1 1 1 1] [1] [1])
([1 1 1] [1 1 1])
([1 1 1] [1 1] [1])
([1 1 1] [1] [1] [1])
([1 1] [1 1] [1 1])
([1 1] [1 1] [1] [1])
([1 1] [1] [1] [1] [1])
([1] [1] [1] [1] [1] [1])
;; => 10

Success! And now I realised that we can rank those patterns by their price, so that when we try to fit our actual set of books we can stop as soon as we find our first match—because any later matches we find must be more expensive. Let's just check that we can call this with a value bigger than 23:

potter.core> (time
              (count
               (remove #(> (count (first %)) 5)
                       (combo/partitions
                        (repeat 25 1)))))
"Elapsed time: 54.559845 msecs"
;; => 377

Alright! I think that will suffice. Now let's get to work on this solution.

Days later

Oh boy. I just came back from a very, very deep rabbit-hole. Rather than trying to detail the process, let's just skip straight to a tour of the result…

A tour of my final solution

Let's start with the basics. We need the clojure.math.combinatorics package, so let's import that. And let's define our book price too. You'll notice that this time I'm back to defining the discounts more simply, because I imagine that's the most frequent changes one would make.

(ns potter.core
  (:require
   [clojure.math.combinatorics :as combo]))

(def book-price 8)

(def discounts
  {2 5/100
   3 10/100
   4 20/100
   5 25/100})

We know we'll price partitions of books, so let's make a function to price each partition, and one to price a collection of partitions. These don't need to be terribly efficient, because we won't be calling them very often.

(defn- price-partition
  "Price a partition of N distinct books."
  [n]
  (let [discount (or (discounts n) 0)
        multiplier (- 1 discount)]
    (* book-price n multiplier)))

(defn- sum-price-partitions
  "Calculate the sum of a sequence of
  book partitions."
  [parts]
  (->> parts
       (map price-partition)
       (reduce +)))

Now we need a way to create all the possible ways to partition our number of books into parts. There is never a point in considering any parts larger than the max number of books we offer a discount for, so let's find that first. partitions returns a nested sequence of integers, where each integer is a count of books.

(defn- max-partition-size
  "Given a map of discounts picks
  the max partition size to consider."
  [discounts]
  (->> discounts keys sort last))

(defn- partitions
  "Produce a sequence of possible
  partitions representing N number of
  books, constrained by a max size for
  each partition."
  [n max-part-size]
  (->> (repeat n 1)
       combo/partitions
       (map #(map count %))
       (remove #(> (first %) max-part-size))))

We don't need to know the actual books to price them (because all books cost the same). Since we now have all the possible partitions of books we can calculate the price of all those partitions, and rank them so that the "best" partitions go first. We now have a sorted list of prices that we will pay, mapped to a set of partitions we have to separate the books in to pay that price. Though, we don't know which of those prices we'll end up paying yet.

(defn- sort-partitions-by-price
  "Zip sequences of prices & partitions
  together, and sort by price so the
  cheapest sequence of partitions comes
  first."
  [prices parts]
  (->> (map vector prices parts)
       (sort-by first)))

I'm going to borrow an element from my previous solution and separate the entire purchase into stacks of distinct books. This is represented as a vector of integers, where each integer is the count of books in that stack. We then need a function to pick a selection (partition) of books from these stacks and return the new stack. We also need a function to find all the different ways to pick N books from M stacks of books. Here are both of those.

(defn- pick-books
  "Pick books from the given stacks
  according to indices given; return
  remaining stacks of books."
  [stacks indices]
  ;; update-in *really* doesn't like seqs,
  ;; hence we ensure stacks is a vector here
  (loop [stacks (vec stacks)
         [x & xs] indices]
    (if-not x
      (remove zero? stacks)
      (recur
       (update-in stacks [x] dec) xs))))


(defn- pick-combinations
  "All the unique ways to pick N
  books from a set of stacks."
  [stacks n]
  (combo/combinations
   (range (count stacks)) n))

We are now arriving at the difficult bit… We need a function to check if it's possible to map our desired purchase of books to a particular set of partitions of books. Or put another way, given a seq of partition sizes can we pick all of them (depleting the set of books) such that each partition contains distinct books? This is essentially a depth-first search.

(defn- picks-completely?
  "Is it possible to pick the given
  partitions from the stacks of books,
  such that all the stacks are used up?"
  [parts stacks]
  (loop [stacks stacks
         potential-picks (pick-combinations
                          stacks (first parts))
         remaining-parts (rest parts)
         backtrack-stack []]

    ;; Have we reached a dead end?
    (if (empty? potential-picks)

      ;; Can we backtrack to try
      ;; a different path?
      (if (empty? backtrack-stack)
        false
        (let [prev (peek backtrack-stack)
              stacks (nth prev 0)
              potential-picks (nth prev 1)
              parts (nth prev 2)]
          (recur stacks
                 (rest potential-picks)
                 parts
                 (pop backtrack-stack))))

      (let [remaining-stacks
            (pick-books stacks (first potential-picks))]
        ;; Have we depleted our stacks of books?
        (if (empty? remaining-stacks)
          true
          (recur remaining-stacks
                 (pick-combinations
                  remaining-stacks
                  (first remaining-parts))
                 (rest remaining-parts)
                 (conj backtrack-stack
                       [stacks
                        potential-picks
                        remaining-parts])))))))

OK, that was hard & took me many hours to get right. (Plus at least one to clean up to a point where I would consider showing it to anyone else.) However, now we have all the pieces and it's relatively easy to put it all together in our public price function:

(defn price
  "Calculates the best price you can get
  for a collection of books, by splitting
  it into different partitions and getting
  the optimal discount achievable."
  [books]
  (if (empty? books)
    0
    (let [n (count books)
          max-part-size (max-partition-size
                         discounts)
          parts (partitions n max-part-size)
          prices (map sum-price-partitions parts)
          price-parts (sort-partitions-by-price
                       prices parts)
          stacks (-> books frequencies vals)]
      (loop [[[price parts] & rest] price-parts]
        (if (picks-completely? parts stacks)
          price
          (recur rest))))))

Basically we're just looping over all our candidate set of partitions, cheapest first, and stopping as soon as we find a partition we can use. The last candidate partition has every book in a partition of its own, which must match, so there's no special cases to handle there.

Potential Improvements

This is fast enough, correct enough, and robust enough that I don't feel like spending any more time on it. However, if I were to make it work for even bigger inputs (it currently takes 16 seconds to price 53 books on my machine, which is rather longer than I'd like) I think the two areas of improvement I would consider are:

Finding a better way to calculate suitable partitions than calculating all partitions and throwing away the unsuitable ones, and without having to create all those intermediate vectors that we end up throwing away.
The pick-combinations function is called over and over again with the same arguments, so it might benefit from memoization.

Epilogue

If you want to get into more details and play with this code yourself you might find it easier to check out my coding-dojo repo rather than piece it together from this blog post.

I hope I'm done with this problem now. It's been praying on my mind for a month, hence I decided to try "writing it out of my system"—and the result is this article. I hope you're happy, whoever you are who came up with this exercise :-)

There are several configurations this discount could apply, e.g. ((1 + 2) (3)), ((1) (2 + 3)), ((1 + 3) (2)).

Transcribed into Clojure from http://codingdojo.org/kata/Potter/

I make this sound easy, but it took me hours which probably would have been better of spent sleeping. (It was the middle of the night, after all.)

⁴

Unnecessarily, no doubt.

⁵

I am for some reason unable to remember how to read mathematical notation so have to re-learn it every time I encounter it.

Table of Contents