Articles, podcasts and news about Swift development, by John Sundell.

The power of sets in Swift

Published on 04 Mar 2018
Discover page available: The Standard Library

Although Set is one of those core data structures that you see in almost every programming language, it sometimes get a bit overlooked as our default choice for storing collections of non-keyed objects tends to be to use an Array.

This week, let's take a look at a few different examples of when using a set can lead to more predictable performance & simpler code, as well as how to use some of Swift's Set type's lesser known - yet very powerful - features. Let's dive in!

Constant time

One of the key characteristics of a Set is that it stores its members based on hash value rather than by index (like Array does). In other words, it sacrifices guaranteed member order to gain constant (O(1)) lookup time. The simplest use case for this is when you're storing a potentially large collection of unordered values, like when keeping track of all the IDs of a user's friends:

struct User {
    var friendIDs = Set<UUID>()
}

By using a set here rather than, for instance, an array - we can avoid any performance bottlenecks that we could easily encounter for users with a large amount of friends. To check whether a given user is friends with another user, we'd use the contains API - which for an array could require each element to be inspected, while always being a quick hash value lookup for a set, regardless of the size of the collection.

Indexing

So sets are great for when you don't care about the order of a collection, but they can also be very useful if we need to optimize certain code paths even when an ordered collection (like an array) is used.

Let's say we're building an app that lets the user rate movies, and we want to include a list that shows all the movies that the user has rated, in order. To be able to do that, we need to store the ratings in a way that preserves the order, like this:

class RatingsManager {
    private typealias Rating = (score: Int, movieID: UUID)

    private var ratings = [Rating]()
}

However, a very common operation in our app is to check whether the user has rated a given movie. We often need to perform this kind of lookup in performance-critical code paths (like when scrolling a table view), so we wouldn't want to rely on an O(n) operation like iterating through the ratings array:

extension RatingsManager {
    func containsRating(for movie: Movie) -> Bool {
        for rating in ratings {
            if rating.movieID == movie.id {
                return true
            }
        }

        return false
    }
}

Even though we can't use a set as our main storage for ratings (since that would destroy the order), we can use one as an index. Indexing is a very common technique when designing databases, as it lets us fast track certain operations based on storing additional metadata as keys. The cool thing is that we can use the same kind of technique to give our containsRating API constant time complexity.

We'll start by storing the ID of each movie that has been rated in a set:

class RatingsManager {
    typealias Rating = (score: Int, movieID: UUID)

    private var ratings = [Rating]()
    private var movieIDs = Set<UUID>()
}

Now, whenever we insert a rating, we also store the ID of the movie it's for in our movieIDs set:

extension RatingsManager {
    func insertRating(_ score: Int, for movie: Movie) {
        ratings.append((score, movie.id))
        movieIDs.insert(movie.id)
    }
}

Which in turn lets us simplify (both in terms of complexity and amount of code!) our API for checking whether a movie has already been rated:

extension RatingsManager {
    func containsRating(for movie: Movie) -> Bool {
        return movieIDs.contains(movie.id)
    }
}

With the above, we kind of have the best of both worlds - ordered data with a fast, constant time API for checking if a given value is a member. 👍

Note: We also need to remember to remove the indexed movie ID whenever we remove a rating. Like always, the best way to assure that this is done is to cover our RatingsManager with unit tests. 😉

Comparing datasets

Swift's Set implementation also offers some really nice & convenient APIs for comparing two datasets. For example, going back to our example of storing the IDs of a user's friends, let's say that we want to be able to check if a given user has any friends in common with another user. The most obvious way to implement this might be to use a for loop, like this:

extension User {
    func hasFriendsInCommon(with otherUser: User) -> Bool {
        for id in friendIDs {
            if otherUser.friendIDs.contains(id) {
                return true
            }
        }

        return false
    }
}

The good news is that we can reduce the above to a single line of code using the isDisjoint API, which checks if a given set doesn't share any members with another set, like this:

extension User {
    func hasFriendsInCommon(with otherUser: User) -> Bool {
        return !friendIDs.isDisjoint(with: otherUser.friendIDs)
    }
}

Similarly, we can also easily check if a given user has all of their friends in common with another user, by checking if that user's friendIDs set is a subset of the other user's set of friend IDs:

extension User {
    func hasAllFriendsInCommon(with otherUser: User) -> Bool {
        return friendIDs.isSubset(of: otherUser.friendIDs)
    }
}

Finally, we can also form an intersection - a set of all common members of two sets - to get a list of all friends that two users have in common, like this:

extension User {
    func idsOfFriendsInCommon(with otherUser: User) -> Set<UUID> {
        return friendIDs.intersection(otherUser.friendIDs)
    }
}

As you can see, the way sets store their members based on uniqueness gives us some really powerful (and fast!) capabilities that we can use in many interesting ways 👍. There's also APIs like superset, union and more. Check out Set's official documentation for a full list.

Guarding using a set

Let's take a look at one final example. Sometimes you want to perform a certain operation only once for a given object (for example, in order to do any form of preloading or preparation), even if that object later ends up being reused. For example, let's say we have a protocol that enables us to define multiple operations for loading some form of content for our app. It requires each operation to define a prepare() method that only gets executed once, and a perform() method for actually performing the operation, like this:

protocol ContentOperation: AnyObject {
    /// Prepare/preload the operation. Only executed once.
    func prepare()

    /// Perform the operation using a content loader.
    func perform(using loader: ContentLoader,
                 then handler: @escaping () -> Void)
}

To perform an operation, it's then added to a ContentManager, like this:

class ContentManager {
    func perform(_ operation: ContentOperation,
                 then handler: @escaping () -> Void) {
        operation.perform(using: loader, then: handler)
    }
}

So how can we easily keep track of when to call the prepare() method on an operation (given that we only want to do it once per instance)? We're going to apply a similar solution as to when we used a set as an index to enable quick lookups, only this time we're going to take advantage of the fact that a set always returns whether or not a value was actually inserted whenever we add a new member to it.

We'll add a set that keeps track of the identifiers of all objects that have been prepared, and then we'll use a guard statement to return in case an ID wasn't actually inserted, like this:

class ContentManager {
    private var preparedOperationIDs = Set<ObjectIdentifier>()

    func perform(_ operation: ContentOperation,
                 then handler: @escaping () -> Void) {
        prepareIfNeeded(operation)
        operation.perform(using: loader, then: handler)
    }

    private func prepareIfNeeded(_ operation: ContentOperation) {
        let id = ObjectIdentifier(operation)

        guard preparedOperationIDs.insert(id).inserted else {
            return
        }

        operation.prepare()
    }
}

Above, we use each operation's ObjectIdentifier (which is a unique identifier for a given instance of any Swift class) to keep track of it. For more information about that, check out "Identifying objects in Swift".

The result is that we can easily guarantee that prepare() will only be called once per instance, without having to add any form of state requirement to the ContentOperation protocol 🎉.

Conclusion

Sets, arrays, dictionaries, and other types of data structures, all have their pros & cons. They make different tradeoffs and are optimized for different use cases. So while Array will probably be the best tool for the job in many situations, sometimes using a Set instead can allow us to easily speed up our code, or make certain kinds of operations a lot simpler.

My personal approach is that whenever the order of a certain set of values isn't important, I start out with a set as the default (rather than always initially using an array). That way I can get fast inserts, removals and membership tests out of the box, without having to worry about performance bottlenecks if my datasets grow.

Another cool thing about sets is that you can also switch on them! Check out "The power of switch statements in Swift" for an example of that.

What do you think? Do you often use sets in your Swift code or is it something you'll start trying out? Let me know on Twitter @johnsundell.

Thanks for reading! 🚀