- sorted_containers on Github
- sorted_containers on RubyGems
- sorted_containers Documentation

I converted Grant Jenks’s Python library Sorted Containers to Ruby. If you are interested in the details of this data structure, I recommend reading his website. His documentation is much more detailed, and I used it as a reference for this implementation.

The library provides a fast sorted array, sorted set, and sorted hash implemented in pure Ruby with no dependencies.

SortedArray, SortedSet, and SortedHash are meant to be drop-in replacements for Array, Set, and Hash but with the extra property that the elements can be accessed in sorted order.

I compare the performance of SortedContainers to SortedSet, a C-extension red-black tree implementation. You can see the benchmarks below. The performance is comparable for add and delete, and much better for iteration, initialization, and lookup.

Some methods from Array, Set, and Hash have not been implemented in SortedContainers. I want to complete these before version 1.0.0. Feel free to open an issue if you would like a method added or add a pull request if you would like to contribute.

Feedback is welcome. I hope you find this library useful.

Modern computers are good at shifting arrays. For that reason, it’s often faster to keep an array sorted than to use the usual tree-based data structures.

For example, if you have the array `[1,2,4,5]` and want to insert the element `3`, you can shift `4, 5` to the right and insert `3` in the correct position. This is an `O(n)` operation, but in practice it’s fast.
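In plain Ruby (nothing specific to this library), the shift-and-insert step might look like this:

```
# Insert a value into an already-sorted array, keeping it sorted.
# bsearch_index finds the leftmost element >= value in O(log n);
# Array#insert then shifts the tail one slot to the right.
def sorted_insert(arr, value)
  index = arr.bsearch_index { |x| x >= value } || arr.size
  arr.insert(index, value)
end

sorted_insert([1, 2, 4, 5], 3) # => [1, 2, 3, 4, 5]
```

The binary search finds the position in `O(log n)`; it is the shifting that costs `O(n)`.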

You also save memory by not having to store pointers to child nodes, and you benefit from the cache locality of arrays. When you iterate over a sorted array, you are more likely to access elements that are close together in memory.

But we can do better if we have a lot of elements. We can break up the array so fewer elements have to be moved when a new element is inserted. For example, if you have the array `[[1,2,4],[5,6,7]]` and you want to insert the element `3`, you can insert `3` into the first subarray to get `[[1,2,3,4],[5,6,7]]`, and only the element `4` has to be shifted.
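The same move on the two-level layout can be sketched like this (a toy illustration of the idea, not the gem’s internals):

```
# Pick the subarray whose last element is >= value (or the final one),
# then do an ordinary sorted insert inside that subarray alone.
def nested_insert(lists, value)
  row   = lists.bsearch_index { |sub| sub.last >= value } || lists.size - 1
  index = lists[row].bsearch_index { |x| x >= value } || lists[row].size
  lists[row].insert(index, value)
  lists
end

nested_insert([[1, 2, 4], [5, 6, 7]], 3) # => [[1, 2, 3, 4], [5, 6, 7]]
```

Only elements within one subarray ever move, so the cost of an insert is bounded by the subarray size rather than the total size.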

This often outperforms the more common tree-based data structures like red-black trees with `O(log n)` insertions, deletions, and lookups. We sacrifice theoretical time complexity for practical performance.

The size of the subarrays is a trade-off. You can control how big the subarrays get by setting the `load_factor`. The default is `DEFAULT_LOAD_FACTOR = 1000`. A subarray is split when its size reaches `2*load_factor`. There is no perfect value: the ideal one depends on your use case and may require some experimentation.
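The split rule can be sketched like this (my own simplified version, with `LOAD_FACTOR` shrunk so the example stays small):

```
LOAD_FACTOR = 4 # the gem defaults to 1000; tiny here for illustration

# After an insert, split any subarray that has reached 2 * LOAD_FACTOR
# elements, so no single insert ever shifts a very long tail.
def maybe_split!(lists, row)
  sub = lists[row]
  return lists if sub.size < 2 * LOAD_FACTOR
  upper = sub.slice!(LOAD_FACTOR..) # cut off the upper half in place
  lists.insert(row + 1, upper)
  lists
end

maybe_split!([[1, 2, 3, 4, 5, 6, 7, 8]], 0) # => [[1, 2, 3, 4], [5, 6, 7, 8]]
```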

SortedSet and SortedHash are implemented using a SortedArray to keep track of the order, plus a standard Set and Hash for quick lookups.
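As a rough sketch of that design (a toy class for illustration, not the gem’s actual implementation):

```
# A toy sorted set: a Hash answers include? in O(1), while a sorted
# Array keeps the elements available in order for iteration.
class TinySortedSet
  def initialize
    @members = {} # element => true, for fast membership tests
    @sorted  = [] # the same elements, kept in sorted order
  end

  def add(value)
    return self if @members.key?(value)
    @members[value] = true
    index = @sorted.bsearch_index { |x| x >= value } || @sorted.size
    @sorted.insert(index, value)
    self
  end

  def include?(value)
    @members.key?(value)
  end

  def to_a
    @sorted.dup
  end
end

TinySortedSet.new.add(5).add(1).add(3).add(1).to_a # => [1, 3, 5]
```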

- Items must be comparable. If you try to insert an element that is not comparable, you will get an error.
- The order of the items must not change after they are inserted, or the container will be corrupted.

SortedSet is a C-extension red-black tree implementation. I used it as a baseline to compare the performance of SortedContainers.

Every test was run 5 times and the average was taken.

You can see that SortedContainers has comparable performance for add and delete, and much better performance for iteration, initialization, and include.

I did my best to make the tests as fair as possible, but it’s possible that I made a mistake. It’s also difficult to predict real-world performance from these tests. If you have any suggestions for improvement, please let me know by opening an issue. The code for the benchmarks can be found in the github repository.

- MacBook Pro (16-inch, 2019)
- 2.6 GHz 6-Core Intel Core i7, 16 GB 2667 MHz DDR4
- Ruby 3.2.2
- SortedContainers 0.1.0
- SortedSet 1.0.3

Feedback is welcome. Please open an issue on Github if you have any suggestions or find any bugs.

If you like Python libraries converted to Ruby, you might also like my conversion of `heapq` to Ruby, called heapify.

tl;dr: Complete implementation is at the bottom.

Red-Black trees are notorious for being nightmares of pointer manipulation. Instructors will show the theory, but won’t torture their students by making them implement one. Interviewers will avoid asking about it; they probably couldn’t do it themselves.

You should be vaguely familiar with how you might balance a tree. The details, however, are probably unnecessary for the purposes of an interview. – Gayle McDowell, *Cracking the Coding Interview*

If you’re proficient in a functional language, you owe it to yourself to implement a Red-Black tree. You’ll be one of the few people that can code a Red-Black tree on a whiteboard.

It will make you realize why people are so excited about the whole *functional programming* thing.

A Red-Black tree is a balanced binary search tree. Every node is colored red or black. Three rules hold:

- No red node has a red child.
- Every path from the root to an empty node contains the same number of black nodes.
- An empty node is always black.

Draw a tree with these rules. Notice it’s always relatively-balanced. Try to draw one as unbalanced as possible. You won’t get far.

You can prove the maximum depth of a node is at most 2⌊log(n+1)⌋.
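The standard argument, sketched: the black-path rule bounds the node count from below, and the no-red-red rule ties the height to the black height.

```
% A subtree with black height bh contains at least 2^{bh} - 1 nodes
% (induction on the tree). With n nodes at the root:
n \ge 2^{bh} - 1
% No red node has a red child, so at least half the nodes on any
% root-to-leaf path are black: bh \ge h/2. Combining the two bounds:
n \ge 2^{h/2} - 1 \quad\Longrightarrow\quad h \le 2\log_2(n + 1)
```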

Let’s implement a *set* with a Red-Black tree. At minimum we’ll need a `member` function and an `insert` function.

A tree can be empty, or it can be a node with two subtrees, a color, and an element.

```
data Tree a = Empty -- Empty does not need a color, it's always black.
            | T Color (Tree a) a (Tree a)

data Color = R
           | B
```

The `member` function searches for an element. It’s a binary search.

```
member :: Ord a => Tree a -> a -> Bool
member Empty _ = False
member (T _ left e right) x
  | x == e = True
  | x < e  = member left x
  | x > e  = member right x
```

The `insert` function uses the function `build`, which is a constructor that makes sure the node is balanced.

```
insert :: Ord a => a -> Tree a -> Tree a
insert x s = let T _ a y b = ins s
             in T B a y b
  where
    ins Empty = T R Empty x Empty
    ins s'@(T color a' y' b')
      | x < y'    = build color (ins a') y' b'
      | x > y'    = build color a' y' (ins b')
      | otherwise = s'
```

There are four cases when `build` needs to adjust a node. Each detects a black parent with a red child that itself has a red child, and shifts the nodes around to fix it. The solution is the same in every case (notice the right-hand sides of `build` are all the same).

```
build :: Color -> Tree a -> a -> Tree a -> Tree a
build B (T R (T R a x b) y c) z d = T R (T B a x b) y (T B c z d)
build B (T R a x (T R b y c)) z d = T R (T B a x b) y (T B c z d)
build B a x (T R (T R b y c) z d) = T R (T B a x b) y (T B c z d)
build B a x (T R b y (T R c z d)) = T R (T B a x b) y (T B c z d)
build color left x right = T color left x right
```

That’s it. You have a Red-Black tree.

If you want to learn more, read *Purely Functional Data Structures* by Chris Okasaki. I stole most of my implementation from this book. The `build` diagram is also from the book.

```
module RedBlackSet
  ( empty
  , member
  , insert
  ) where

data Tree a = Empty
            | T Color (Tree a) a (Tree a)

data Color = R
           | B

empty :: Ord a => Tree a
empty = Empty

member :: Ord a => Tree a -> a -> Bool
member Empty _ = False
member (T _ left e right) x
  | x == e = True
  | x < e  = member left x
  | x > e  = member right x

insert :: Ord a => a -> Tree a -> Tree a
insert x s = let T _ a y b = ins s
             in T B a y b
  where
    ins Empty = T R Empty x Empty
    ins s'@(T color a' y' b')
      | x < y'    = build color (ins a') y' b'
      | x > y'    = build color a' y' (ins b')
      | otherwise = s'

build :: Color -> Tree a -> a -> Tree a -> Tree a
build B (T R (T R a x b) y c) z d = T R (T B a x b) y (T B c z d)
build B (T R a x (T R b y c)) z d = T R (T B a x b) y (T B c z d)
build B a x (T R (T R b y c) z d) = T R (T B a x b) y (T B c z d)
build B a x (T R b y (T R c z d)) = T R (T B a x b) y (T B c z d)
build color left x right = T color left x right
```

On the front page of Haskell.org, you will see this implementation of the sieve of Eratosthenes:

```
primes = sieve [2..]
  where sieve (p:xs) =
          p : sieve [x | x <- xs, x `mod` p /= 0]
```

When you see this for the first time it’s amazing. But it’s not the sieve of Eratosthenes.

The problem with the algorithm is the way it crosses off numbers. In the true sieve of Eratosthenes, when we find a prime number *p*, we start at *p*² and from there cross off multiples of *p*. For example, when calculating the prime numbers less than 100, when we find 7 we start at 49 and cross off 56, 63, 70, 77, 84, 91, and 98. That’s 8 operations. When the false algorithm finds 7, it checks every number from 8 to 100: that’s 92 operations!

Melissa E. O’Neill gives us a real functional, lazy implementation of the algorithm in her paper, **The Genuine Sieve of Eratosthenes**.

For every number, we check whether it’s a multiple of a prime seen so far. We don’t have to check all the primes. We store the primes in a priority queue, indexed on the smallest multiple of each that we have seen. We only compare the current number to the smallest index in the queue. If it equals our current number, we know our number must be a composite. We then increment the prime multiple to the next multiple of the prime and insert it back into the queue. (We also have to adjust the queue because some numbers are inserted more than once: 12 will be in the queue twice because 2² + 2 + 2 + 2 + 2 = 12 and 3² + 3 = 12. Notice 12 is also crossed off twice in the animation.)

We store multiples of primes as infinite lists. Laziness is key.

My interpretation of the algorithm uses Data.Set as a priority queue, because the functions `insert` and `findMin` are *O(log n)*.

```
import qualified Data.Set as PQ

primes :: [Integer]
primes = 2 : sieve [3,5..]
  where
    sieve (x:xs) = x : sieve' xs (insertprime x xs PQ.empty)

    sieve' (x:xs) table
      | nextComposite == x = sieve' xs (adjust x table)
      | otherwise          = x : sieve' xs (insertprime x xs table)
      where
        (nextComposite, _) = PQ.findMin table

    adjust x table
      | n == x    = adjust x (PQ.insert (n', ns) newPQ)
      | otherwise = table
      where
        Just ((n, n':ns), newPQ) = PQ.minView table

    insertprime p xs = PQ.insert (p*p, map (*p) xs)
```

The difference in time it takes each algorithm to calculate the 10,000th prime number on my machine is huge:

**False sieve:**

```
real 0m7.913s
user 0m7.886s
sys 0m0.016s
```

**O’Neill’s algorithm:**

```
real 0m0.248s
user 0m0.241s
sys 0m0.004s
```

The false sieve takes almost **8 seconds**! Compare this to the real sieve, which takes about **0.24 seconds**.