At the end of May our remote-first company met up for one of our thrice-yearly week-long meetups. One of the objectives is to socialise and do group activities that build psychological safety so that we feel safe to take risks and feel vulnerable in front of each other. This means we can communicate more confidently & efficiently when we are back in our remote offices.
This time we met up in Majorca1 and sessions included ice-breakers (people-bingo; building a tower with marshmallow, tape and spaghetti; etc), building a Technology Radar, Coding Dojo, How To Master Git, Biomechanics For Desk People, and lightning talks—to mention a few. (We also celebrated a product launch and—yes—managed to find time for some bathing, paddling—and even a couple morning runs.)
It is the Coding Dojo I want to talk about in this post. Or; not actually the session itself, but my efforts to solve one of the problems featured in it. It was briefly stated but surprisingly deep and I have found myself thinking about it a lot since.
Table of Contents
A Warning
If you are considering doing this Coding Dojo yourself, you may want to postpone reading this until you do because it provides not one but two different solutions that passes all the suggested test cases.
The Problem
(Paraphrased by yours truly; the original is over at codingdojo.)
- There is a bookshop that has only five distinct books; Let's call them 1–5
- The shop has unlimited quantities of each book
- Each book costs £8
- You can get a discount for buying several distinct books as a unit
- There is no discount for buying two (or more) copies of the same book as a unit
- A purchase can contain multiple units, thus you can get multiple discounts for a single purchase
These are the discounts:
distinct books | discount % |
---|---|
2 distinct books | 5 |
3 distinct books | 10 |
4 distinct books | 20 |
5 distinct books | 25 |
Your mission: write a function to calculate the price of a purchase, such that the customer always pays the minimum available price.
Example: Buying the books 1, 2 & 3 can be put through the till in several different ways:
#1 | Individually (no discounts) | |
#2 | Discount for two of the books1 | |
#3 | Discount for three books |
In this case, your function should pick option #3 as it is the cheapest.
Towards a (simplistic) solution
I immediately thought that this was a combinatorial problem and scoured
Clojure's collections library for suitable combinatorial operators. I
didn't find what I wanted in the standard library, but
clojure.math.combinatorics
appeared to have what I needed.
clojure.math.combinatorics/partitions
return all the different ways to
partition the input into subsets, which sounds like it might work. I
felt I was ready to start coding.
Let's start easy, with an import statement for
clojure.math.combinatorics
, a def
for our book price, and a map of
available discounts:
(ns potter.core (:require [clojure.math.combinatorics :as combo])) (def book-price 8) (def discount {2 0.95 3 0.9 4 0.8 5 0.75})
Then let's try to price a single partition. Because combo/partitions
returns all possible partitions, some have the same book twice in it.
This makes no sense with our discount structure. However, we can just
price superfluous books as if they were bought on their own, and the
price attempts should be discarded by more optimal sets of partitions.
(defn price-partition [books] (let [unique (count (distinct books)) mult (or (discount unique) 1)] (* book-price (+ (* unique mult) (* (- (count books) unique))))))
We're ready to price a sequence of partitions, each with their own discount applied, to make up a whole purchase.
(defn price-partition-seq [partitions] (->> partitions (map price-partition) (reduce +)))
Now, all we need to do is make our top-level price function that takes our input and returns the best price we can give. It is a brute force solution that takes the input and first calculates all possible ways to partition that into subsets, then calculates the price for each of those subsets, sorts the list of prices and grabs the lowest one. Simples!
(defn price [books] (->> books combo/partitions (map price-partition-seq) sort first))
Looks great! Let's try it out against some of the provided test cases2:
(testing "no discounts" (are [total books] (= total (price books)) 0 [] 8 [1] 16 [2 2] 24 [3 3 3] 32 [4 4 4 4] 40 [5 5 5 5 5]))
Alright!
(testing "Simple discounts" (are [total books] (= total (price books)) (* 16 95/100) [1 2] (* 16 95/100) [1 3] (* 16 95/100) [1 4] (* 16 95/100) [1 5] (* 8 3 9/10) [1 3 5] (* 8 4 8/10) [1 2 3 5] (* 8 5 75/100) [1 2 3 4 5]))
Winning!
(testing "Multiple discounts" (are [total books] (= total (price books)) (+ 8 (* 2 8 95/100)) [1 1 2] (* 2 (* 2 8 95/100)) [1 1 2 2] (+ (* 8 4 8/10) (* 8 2 95/100)) [1 1 2 3 3 4] (+ 8 (* 5 8 75/100)) [1 2 2 3 4 5]))
Still got it!
(testing "Edge cases" (are [total books] (= total (price books)) (* 2 (* 8 4 8/10)) [1 1 2 2 3 3 4 5] (+ (* 3 (* 8 5 75/100)) (* 2 8 4 8/10)) [1 1 1 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 5 5 5 5]))
BOOOOOO! Failed!
That last edge case has 23 books in it, and finding all the possible partitions is a function whose time consumption grows rapidly with the number of books. For 13 to 15 books the time taken for this function roughly triples for each book added, so 23 books will take rather a long time. Witness:
potter.core> (time (count (combo/partitions [1 1 1 2 2 2 3 3 3 4 4 5 5]))) "Elapsed time: 1487.052629 msecs" ;; => 200549 potter.core> (time (count (combo/partitions [1 1 1 2 2 2 3 3 3 4 4 4 5 5]))) "Elapsed time: 4089.57496 msecs" ;; => 573003 potter.core> (time (count (combo/partitions [1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]))) "Elapsed time: 12744.500346 msecs" ;; => 1688360
It's time to try a different approach.
Finding a solution at all for large purchases
Although the simple solution works for small numbers of books, it is
impractical for larger stacks of books. I felt that it might be
possible to break the problem down somewhat, or at least approximate
the solution for larger problems. I renamed my price
function to
best-price
and added a new fast-price
function3:
(defn fast-price [books] (loop [counts (->> books frequencies vals sort) total 0] (if (seq counts) (let [c (first counts) n (count counts) p (* c (price-partition (take n (iterate inc 1))))] (recur (->> counts (map #(- % c)) (remove zero?)) (+ total p))) total)))
Then it was a case of finding a suitable threshold to switch from one to the other. I checked the wind direction, phases of the moon, read my tea leaves and goat entrails and came up with:
(defn price [books] (if (< (count books) 10) (best-price books) (fast-price books)))
Job done! Well, sort-of. This solution is fast and finds a price, but it does not find the best price for the final edge case.
Actually passing tests for edge cases
My fast solution eagerly tries to build as big partitions as possible from the remaining books. So if you have the following: 1 1 2 2 3 3 4 5 it will price that as:
However, it's actually cheaper to price that as two partitions of four books:
After a bit of trial-and-error in the Clojure REPL I felt confident that this was the only such edge case, and decided to just build my solution around that. I deleted most of the code I had, along the way micro-optimised4 fast price lookup for a unit of books. This map directly gave the price for a stack of 1-5 unique books, having the discount already added to it.
(def fast-price-lookup "Pre-calculated prices (with discount) for a stack of up to 5 distinct books." {1 book-price 2 (* 2 book-price 95/100) 3 (* 3 book-price 9/10) 4 (* 4 book-price 8/10) 5 (* 5 book-price 75/100)})
I then did a variety of my previous fast-price
function. I separated
all the books into piles of distinct books, then built the biggest
sets of unique books I could (by taking one from each pile) and priced
them as a unit unless I reached a state that I recognised I could
more beneficially price as
(defn price [books] ;; Separate the books into piles of ;; individual books (loop [book-piles (->> books frequencies vals sort) total 0] ;; Any more piles of books left? (if (seq book-piles) ;; Do we hit the special case ;; where two four-book stacks are ;; cheaper than two stacks of ;; three and five unique books ;; each? (if (= '(1 1 2 2 2) book-piles) ;; Return current total plus the ;; cost of two stacks of four ;; unique books. (+ total (* 2 (fast-price-lookup 4))) ;; Take one book from each ;; remaining pile & add the cost ;; of this stack of books to the ;; running total. (recur (->> book-piles (map dec) (remove zero?)) (+ total (fast-price-lookup (count book-piles))))) ;; No more piles of books left; ;; return total. total)))
There's a lot to like about this. It's dirt simple, and fast. I can price millions of books with this, no problems.
I was very, very pleased with myself until Michelle, a colleague, pointed out that it is highly sensitive to changes in the discount amounts. She even went as far as calling it cheating which made me feel like taking a few deep breaths into a paper bag. But she is right! Change the discount amount given for four books and the pricing function may start to exhibit weird behaviour.
Towards a moderately fast, correct & robust solution
Michelle tried to show me how to solve the problem using Maths but
kept on being distracted from reaching a solution by my questions
about her notation5. We did manage to satisfy ourselves that
given a huge amount of books we can break the problem into smaller
chunks, of size at most
This was as far as I got at this work trip, but I've been thinking about the problem (too much) since. Is it possible to make a solution that is correct, fast enough to handle all the provided edge cases, and robust against changes in the discounts? It feels like it should be.
I think there will still be a combinatorial element to the solution. My hope is to add some domain-informed restrictions such that we can pass all the suggested test cases in a reasonable time span. I believe it should be possible. I can think of some restrictions:
- Don't generate partitions with more than N items, where N is max number of books we offer a discount for (i.e. 5)
- Don't generate partitions with duplicates
One approach I thought of was to generate a set of different shapes of sets that fit those constraints, and try to fit the books we have into these sets.
Experimenting in a Clojure REPL
I still think that combo/partitions
can help me find all the different
shapes or partitions, with a bit of creative management of its input,
so let's swap to a REPL and try just that.
potter.core> (combo/partitions [1 1 1]) ;; => (([1 1 1]) ([1 1] [1]) ([1] [1] [1]))
Great! That looks like just the ticket. Let's print each set of partitions on a separate line, and count all the different solutions for, say, 6 books.
potter.core> (count (map prn (combo/partitions (repeat 6 1)))) ([1 1 1 1 1 1]) ([1 1 1 1 1] [1]) ([1 1 1 1] [1 1]) ([1 1 1 1] [1] [1]) ([1 1 1] [1 1 1]) ([1 1 1] [1 1] [1]) ([1 1 1] [1] [1] [1]) ([1 1] [1 1] [1 1]) ([1 1] [1 1] [1] [1]) ([1 1] [1] [1] [1] [1]) ([1] [1] [1] [1] [1] [1]) ;; => 11
Ah, looking good, except we have to get rid of any solutions with subsets of more than N elements. Remove takes care of that:
potter.core> (count (map prn (remove #(> (count (first %)) 5) (combo/partitions (repeat 6 1))))) ([1 1 1 1 1] [1]) ([1 1 1 1] [1 1]) ([1 1 1 1] [1] [1]) ([1 1 1] [1 1 1]) ([1 1 1] [1 1] [1]) ([1 1 1] [1] [1] [1]) ([1 1] [1 1] [1 1]) ([1 1] [1 1] [1] [1]) ([1 1] [1] [1] [1] [1]) ([1] [1] [1] [1] [1] [1]) ;; => 10
Success! And now I realised that we can rank those patterns by their price, so that when we try to fit our actual set of books we can stop as soon as we find our first match—because any later matches we find must be more expensive. Let's just check that we can call this with a value bigger than 23:
potter.core> (time (count (remove #(> (count (first %)) 5) (combo/partitions (repeat 25 1))))) "Elapsed time: 54.559845 msecs" ;; => 377
Alright! I think that will suffice. Now let's get to work on this solution.
Days later
Oh boy. I just came back from a very, very deep rabbit-hole. Rather than trying to detail the process, let's just skip straight to a tour of the result…
A tour of my final solution
Let's start with the basics. We need the clojure.math.combinatorics
package, so let's import that. And let's define our book price too.
You'll notice that this time I'm back to defining the discounts more
simply, because I imagine that's the most frequent changes one would
make.
(ns potter.core (:require [clojure.math.combinatorics :as combo])) (def book-price 8) (def discounts {2 5/100 3 10/100 4 20/100 5 25/100})
We know we'll price partitions of books, so let's make a function to price each partition, and one to price a collection of partitions. These don't need to be terribly efficient, because we won't be calling them very often.
(defn- price-partition "Price a partition of N distinct books." [n] (let [discount (or (discounts n) 0) multiplier (- 1 discount)] (* book-price n multiplier))) (defn- sum-price-partitions "Calculate the sum of a sequence of book partitions." [parts] (->> parts (map price-partition) (reduce +)))
Now we need a way to create all the possible ways to partition our
number of books into parts. There is never a point in considering any
parts larger than the max number of books we offer a discount for, so
let's find that first. partitions
returns a nested sequence of
integers, where each integer is a count of books.
(defn- max-partition-size "Given a map of discounts picks the max partition size to consider." [discounts] (->> discounts keys sort last)) (defn- partitions "Produce a sequence of possible partitions representing N number of books, constrained by a max size for each partition." [n max-part-size] (->> (repeat n 1) combo/partitions (map #(map count %)) (remove #(> (first %) max-part-size))))
We don't need to know the actual books to price them (because all books cost the same). Since we now have all the possible partitions of books we can calculate the price of all those partitions, and rank them so that the "best" partitions go first. We now have a sorted list of prices that we will pay, mapped to a set of partitions we have to separate the books in to pay that price. Though, we don't know which of those prices we'll end up paying yet.
(defn- sort-partitions-by-price "Zip sequences of prices & partitions together, and sort by price so the cheapest sequence of partitions comes first." [prices parts] (->> (map vector prices parts) (sort-by first)))
I'm going to borrow an element from my previous solution and separate the entire purchase into stacks of distinct books. This is represented as a vector of integers, where each integer is the count of books in that stack. We then need a function to pick a selection (partition) of books from these stacks and return the new stack. We also need a function to find all the different ways to pick N books from M stacks of books. Here are both of those.
(defn- pick-books "Pick books from the given stacks according to indices given; return remaining stacks of books." [stacks indices] ;; update-in *really* doesn't like seqs, ;; hence we ensure stacks is a vector here (loop [stacks (vec stacks) [x & xs] indices] (if-not x (remove zero? stacks) (recur (update-in stacks [x] dec) xs)))) (defn- pick-combinations "All the unique ways to pick N books from a set of stacks." [stacks n] (combo/combinations (range (count stacks)) n))
We are now arriving at the difficult bit… We need a function to check if it's possible to map our desired purchase of books to a particular set of partitions of books. Or put another way, given a seq of partition sizes can we pick all of them (depleting the set of books) such that each partition contains distinct books? This is essentially a depth-first search.
(defn- picks-completely? "Is it possible to pick the given partitions from the stacks of books, such that all the stacks are used up?" [parts stacks] (loop [stacks stacks potential-picks (pick-combinations stacks (first parts)) remaining-parts (rest parts) backtrack-stack []] ;; Have we reached a dead end? (if (empty? potential-picks) ;; Can we backtrack to try ;; a different path? (if (empty? backtrack-stack) false (let [prev (peek backtrack-stack) stacks (nth prev 0) potential-picks (nth prev 1) parts (nth prev 2)] (recur stacks (rest potential-picks) parts (pop backtrack-stack)))) (let [remaining-stacks (pick-books stacks (first potential-picks))] ;; Have we depleted our stacks of books? (if (empty? remaining-stacks) true (recur remaining-stacks (pick-combinations remaining-stacks (first remaining-parts)) (rest remaining-parts) (conj backtrack-stack [stacks potential-picks remaining-parts])))))))
OK, that was hard & took me many hours to get right. (Plus at least
one to clean up to a point where I would consider showing it to anyone
else.) However, now we have all the pieces and it's relatively easy to
put it all together in our public price
function:
(defn price "Calculates the best price you can get for a collection of books, by splitting it into different partitions and getting the optimal discount achievable." [books] (if (empty? books) 0 (let [n (count books) max-part-size (max-partition-size discounts) parts (partitions n max-part-size) prices (map sum-price-partitions parts) price-parts (sort-partitions-by-price prices parts) stacks (-> books frequencies vals)] (loop [[[price parts] & rest] price-parts] (if (picks-completely? parts stacks) price (recur rest))))))
Basically we're just looping over all our candidate set of partitions, cheapest first, and stopping as soon as we find a partition we can use. The last candidate partition has every book in a partition of its own, which must match, so there's no special cases to handle there.
Potential Improvements
This is fast enough, correct enough, and robust enough that I don't feel like spending any more time on it. However, if I were to make it work for even bigger inputs (it currently takes 16 seconds to price 53 books on my machine, which is rather longer than I'd like) I think the two areas of improvement I would consider are:
- Finding a better way to calculate suitable partitions than calculating all partitions and throwing away the unsuitable ones, and without having to create all those intermediate vectors that we end up throwing away.
- The
pick-combinations
function is called over and over again with the same arguments, so it might benefit from memoization.
Epilogue
If you want to get into more details and play with this code yourself you might find it easier to check out my coding-dojo repo rather than piece it together from this blog post.
I hope I'm done with this problem now. It's been praying on my mind for a month, hence I decided to try "writing it out of my system"—and the result is this article. I hope you're happy, whoever you are who came up with this exercise :-)
There are several configurations this discount could apply, e.g. ((1 + 2) (3)), ((1) (2 + 3)), ((1 + 3) (2)).
Transcribed into Clojure from http://codingdojo.org/kata/Potter/
I make this sound easy, but it took me hours which probably would have been better of spent sleeping. (It was the middle of the night, after all.)
Unnecessarily, no doubt.
I am for some reason unable to remember how to read mathematical notation so have to re-learn it every time I encounter it.