Weekly Swift articles, podcasts and tips by John Sundell.

Picking the right data structure in Swift

Published on 01 Sep 2019
Basics article available: Time Complexity

Deciding which data structure to use to represent a given collection of values can often be trickier than it seems. Since each kind of data structure is optimized for a certain number of use cases, finding the right match for each set of data can often have a big impact on how efficient our code ends up becoming.

The Swift standard library ships with three main data structures — Array, Dictionary and Set — that each comes with a different set of optimizations, pros and cons. This week, let’s take a look at some of those characteristics, and also how we sometimes might need to venture outside the realm of the standard library to find the right data structure for our needs.

The linearity of arrays

Array is arguably one of the most commonly used data structures in Swift, and for good reasons. It keeps its elements in sequence, it’s easy to iterate through in a predictable fashion, and it can store any kind of value — from structs, to class instances, to other collections.

For example, here we’re using an array to store a collection of shapes that are placed on a Canvas within a drawing app. Then, when asked to render our canvas into an image, we simply iterate through our array in order to draw each element using a DrawingContext — like this:

struct Canvas {
    var shapes: [Shape]

    func render() -> Image {
        let context = DrawingContext()
        shapes.forEach(context.draw)
        return context.makeImage()
    }
}

When it comes to linearly drawing all of our shapes, like we do above, using an array is a perfect fit. Not only do arrays store their elements in a very efficient manner, they also have a guaranteed order of iteration, which gives us a predictable drawing order without having to do any extra work.

However, just like all other data structures, arrays also have downsides. In our case, we’ll start encountering one such downside when we want to start removing shapes from our canvas. Since array elements are stored by index, we will always need to look up which index that a given shape is associated with before we can remove it:

extension Canvas {
    mutating func remove(_ shape: Shape) {
        guard let index = shapes.firstIndex(of: shape) else {
            return
        }

        shapes.remove(at: index)
    }
}

At first the above code might not seem that problematic, but it’s very likely to become a performance bottleneck for any canvas that contains a large number of shapes — since firstIndex is linear (O(N)) in terms of time complexity.

While we could work around that limitation wherever we’re using our Canvas type — for example by always referring to shapes by index, rather than by value or ID — doing so would make our code both more complex and more fragile, since we’d always need to make sure that our indexes won’t become stale whenever the canvas we’re working with is mutated.

The speed of sets

Instead, let’s see if we can optimize Canvas itself by changing its underlying data structure. Looking at the above problem, one of our initial ideas might be to use a Set instead of an Array. Like we took a look at in “The power of sets in Swift”, one of the big advantages that sets have over arrays is that both inserts and removals can always be performed in constant (O(1)) time, since members are stored by hash value, rather than by index.

Updating Canvas to use a set instead would make it look like this:

struct Canvas {
    var shapes: Set<Shape>

    func render() -> Image {
        let context = DrawingContext()
        shapes.forEach(context.draw)
        return context.makeImage()
    }

    mutating func remove(_ shape: Shape) {
        shapes.remove(shape)
    }
}

Again, the above code might look right, and it even compiles without a hitch. However, while we’ve solved our removal problem, we’ve also lost our stable drawing order — since, unlike arrays, sets do not give us a guaranteed order of iteration — which is a dealbreaker in this situation, as we’d start drawing the user’s shapes in a seemingly random order.

Indexing indexes

Let’s keep experimenting. Next, let’s see if we can optimize Canvas by introducing a Dictionary that lets us look up any shape’s index based on its ID. We’ll start by making our array of shapes private to be able to take control over how elements are inserted — using a new add method — and each time a new shape is added we also add its index to our dictionary:

struct Canvas {
    private var shapes = [Shape]()
    private var indexes = [Shape.ID : Int]()

    func render() -> Image {
        let context = DrawingContext()
        shapes.forEach(context.draw)
        return context.makeImage()
    }

    mutating func add(_ shape: Shape) {
        let index = shapes.count
        indexes[shape.id] = index
        shapes.append(shape)
    }
}

Since we now always know which index that a given shape is stored at, we can quickly perform removals in constant time, just like when we were using a set:

extension Canvas {
    mutating func remove(_ shape: Shape) {
        guard let index = indexes[shape.id] else {
            return
        }

        shapes.remove(at: index)
        indexes[shape.id] = nil
    }
}

However, our new Canvas implementation has a bug that’s quite severe. Each time we remove a shape, we’re in fact invalidating all of the indexes that are higher than the one we just removed — since each of those elements will move one step towards the beginning of the array. While we could fix that problem by adjusting those indexes after each removal, that’d again put us back in O(N) territory, which is what we’ve been trying to avoid since the very beginning.

Our last implementation does have merits though. In general, using a combination of two data structures can be a great idea in situations like this — since we’re often able to use the strengths of one data structure to compensate for the other’s weaknesses, and vice versa.

So let’s try again with another combination, but this time, let’s start by reviewing what our actual requirements are:

Looking at the above requirements, it turns out that while we need a stable iteration order, we don’t actually need indexes — which would make a linked list the perfect fit for our use case.

Linked lists consist of nodes, where each node contains a reference (or link) to the next node in the list, meaning that it can be iterated over in a predictable fashion — without requiring any index updates when an element is removed. However, the Swift standard library does not (yet) contain a linked list type, so if we want to use one — we’ll first have to build it.

Building a linked list

Let’s start by declaring a List struct, which will keep track of the first and last nodes within our list. We’ll make both of those properties read-only outside of our type, to be able to ensure data consistency:

struct List<Value> {
    private(set) var firstNode: Node?
    private(set) var lastNode: Node?
}

Next, let’s create our Node type — which we’ll make a class, since we want to be able to refer to nodes by reference, rather than by value. Our list will be doubly linked, meaning that each node will contain a reference both to its next neighbor, as well as to its previous one. Each node will also store a Value — like this:

extension List {
    class Node {
        var value: Value
        fileprivate(set) weak var previous: Node?
        fileprivate(set) var next: Node?

        init(value: Value) {
            self.value = value
        }
    }
}

The reason we make the above previous property weak is to avoid retain cycles, which would occur if we kept strong references in both directions. To learn more about avoiding retain cycles, check out the “Memory Management” Basics article.

That’s actually all the code that we’ll need in terms of enabling our linked list to store values. But that’s just the first part of the puzzle, just like any other collection, we also want to be able to iterate over it and mutate its contents. Let’s start with iterations which, thanks to Swift’s very protocol-oriented design, can easily be implemented by conforming to Sequence and implementing the makeIterator method:

extension List: Sequence {
    func makeIterator() -> AnyIterator<Value> {
        var node = firstNode

        return AnyIterator {
            // Iterate through all of our nodes by continuously
            // moving to the next one and extract its value:
            let value = node?.value
            node = node?.next
            return value
        }
    }
}

Since our above iteration is so simple, we use the standard library’s AnyIterator to avoid having to implement a custom iterator type — which for more advanced use cases can be done by conforming to IteratorProtocol.

Next, let’s add APIs for mutating our linked list — starting with insertions. We’ll extend List with an append method, which adds a new node for the inserted value, and then returns that node — like this:

extension List {
    @discardableResult
    mutating func append(_ value: Value) -> Node {
        let node = Node(value: value)
        node.previous = lastNode

        lastNode?.next = node
        lastNode = node

        if firstNode == nil {
            firstNode = node
        }

        return node
    }
}

Above we use the @discardableResult attribute, which tells the compiler not to generate any warnings in case the result of calling our method wasn’t used — since we might not always be interested in the actual node that was created.

Since linked lists aren’t based on indexes, but rather on maintaining a chain of values through references, implementing removals is just a matter of updating the removed nodes’s next and previous neighbors to now point at each other instead:

extension List {
    mutating func remove(_ node: Node) {
        node.previous?.next = node.next
        node.next?.previous = node.previous

        // Using "triple-equals" we can compare two class
        // instances by identity, rather than by value:
        if firstNode === node {
            firstNode = node.next
        }

        if lastNode === node {
            lastNode = node.previous
        }
    }
}

With the above in place, the initial version of our List is complete, and we’re ready to take it for a spin. Let’s update Canvas to use our new list — as well as a dictionary that lets us quickly look up which node that corresponds to a given shape ID — as its new combination of data structures:

struct Canvas {
    private var shapes = List<Shape>()
    private var nodes = [Shape.ID : List<Shape>.Node]()

    func render() -> Image {
        let context = DrawingContext()
        shapes.forEach(context.draw)
        return context.makeImage()
    }

    mutating func add(_ shape: Shape) {
        nodes[shape.id] = shapes.append(shape)
    }

    mutating func remove(_ shape: Shape) {
        guard let node = nodes.removeValue(forKey: shape.id) else {
            return
        }

        shapes.remove(node)
    }
}

We now have both fast insertions and removals, as well as a predictable iteration order, without having to add any additional complexity at the call site — pretty cool! And, since we made our new List a completely generic type, we can now reuse it whenever we again need to store index-less values in a linear fashion.

Conclusion

Even though data structures are so fundamental that they can be found in all sorts of programming languages, deciding which one to use in any given situation can still require a fair amount of thinking, testing and experimentation — especially if we want our code to remain efficient as it’s used with a growing set of data.

It’s also highly likely that the right data structure for any given situation might change over time as our requirements evolve, and sometimes using a combination of multiple data structures — rather than just one — might be the way to achieve the performance characteristics that we need.

We’ll continue exploring the world of data structures in upcoming articles — especially by taking a look at those that are still not implemented in the standard library. Like with so many other things, expanding our thinking beyond Swift is sometimes what’s required in order to pick the right data structure within each situation.

Feel free to either find me on Twitter, or email me, if you have any questions, comments or feedback.

Thanks for reading! 🚀