Dec 20, 2020

The Lock-Free Gauntlet: Implementing a Concurrent Linked List in C++

So, you think writing a linked list is easy? Try doing it without locks.

I recently set myself a slightly masochistic challenge: implement a thread-safe linked list without using std::mutex. That means diving headfirst into the world of std::atomic, Compare-And-Swap (CAS) loops, and the absolute nightmare that is C++ memory ordering.

Here is the code, the pain, and the glory.

The Atomic Core

The heart of any lock-free structure is the Node. In a standard list, next is just a pointer. In a lock-free list, it’s a battlefield.

template <typename T>
struct Node {
    T data;
    std::atomic<Node<T>*> next;
};

We wrap next in std::atomic because multiple threads will be racing to update it simultaneously. If two threads try to insert a node after Head at the exact same nanosecond, one must succeed and the other must retry. This is where the CAS loop comes in.

The CAS Loop: Spin to Win

The “Compare-And-Swap” (CAS) operation is the primitive that makes lock-free programming possible. In C++, it’s atomic_compare_exchange_weak.

Here is my implementation of push:

void push(T data) {
    auto node = new Node<T>{};
    node->data = data;
    auto next = head.load(std::memory_order_relaxed);

    // Compare and Swap to atomically insert.
    do {
        node->next = next;
    } while (!std::atomic_compare_exchange_weak_explicit(
        &head, 
        &next, 
        node, 
        std::memory_order_release, 
        std::memory_order_relaxed));
}

Breaking It Down

Load the Current Head: We grab a snapshot of head.
Speculative Link: We point our new node->next to this snapshot.
The Gamble: We ask the CPU: “Is head still pointing to what I think it is? If yes, point it to node. If no, tell me what head is now.”
Loop: If we failed (because another thread snuck in and changed head), we update our explicit next variable and try again.

It’s a spinlock, but for a single instruction. Efficient? Yes. Painful to debug? Absolutely.

A Note on Memory Ordering

You’ll notice std::memory_order_release and std::memory_order_relaxed.

Relaxed: “I don’t care about order, just give me the value.” We use this for loading the initial head because if we are wrong, the CAS will fail anyway.
Release: “Make sure all my previous writes (like node->data = data) are visible to the thread that acquires this pointer.”

If you mess this up, a consumer thread might see the new Node pointer but read garbage data from node->data. This is why lock-free programming is reserved for those who enjoy reading the C++ standard at 3 AM.

The Verdict

writing a lock-free linked list is a rite of passage. It forces you to understand the hardware underneath your code. It’s faster than a mutex in highly contended scenarios, but the cognitive overhead is massive.

Is it worth it? For a high-frequency trading system, yes. For your average CRUD app? Just use a std::mutex and go home.

Check out the full source code in my Systems Playground.