Aug 22, 2022

Deep Dive: High-Performance Logging in C++

Introduction

Logging is boring. Until it crashes your trading engine.

To be honest, in high-frequency trading, logging is usually the first thing to kill your latency. You can’t afford to format strings, allocate memory, or wait for I/O on the critical path. If you do, you might as well be trading via carrier pigeon.

In this post, we’ll peel back the layers of qlog and look at the “vptr trick,” compile-time strings, and why we split messages to fit into cache lines.

The “vptr” Trick: Type Erasure without `std::function`

The core problem in asynchronous logging is simple: How do you pass arbitrary arguments (integers, doubles, strings) to a background thread without formatting them on the critical path?

The standard answer is std::function or std::any. The fast answer is… cheating. Well, legal cheating.

qlog uses a clever combination of templates and C++ inheritance, effectively using the virtual table pointer (vptr) as a dynamic dispatcher.

The Mechanism

Capture: When you call log(args...), the arguments are captured into a std::tuple inside a templated class FormattedMessage<Args...>.
In-Place Construction: This FormattedMessage is constructed in-place directly inside the lock-free queue’s buffer. No heap allocation. No malloc.
Inheritance: FormattedMessage<Args...> inherits from a base struct Message.

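#include <ostream>
#include <tuple>
#include <utility>

struct Message {
    virtual ~Message() = default;  // lets the consumer destroy a message through the base pointer
    virtual void write(std::ostream &os) const = 0;
};

template <typename... Args>
class FormattedMessage : public Message {
    std::tuple<Args...> data;
public:
    explicit FormattedMessage(Args... args) : data(std::move(args)...) {}

    // This code runs on the CONSUMER thread!
    void write(std::ostream &os) const override {
        // One possible unpacking (not necessarily qlog's): stream each tuple element in order.
        std::apply([&os](const auto &...v) { ((os << v), ...); }, data);
    }
};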

Why this works

Because FormattedMessage overrides a virtual function, every instance carries a vptr (virtual table pointer). That vptr points to the vtable for that exact combination of types, and the vtable holds the address of the matching write() implementation.

When the background consumer thread reads from the queue, it gets a pointer to the base Message. It simply calls:

msg->write(file_stream);

The vptr takes care of the rest. We effectively offload the “knowledge” of formatting to the consumer thread via the vptr. The producer thread just copies data and moves on. It’s elegant, standard-compliant, and fast.
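To make the round trip concrete, here is a minimal sketch of the producer constructing a message in place and the consumer dispatching through the base pointer. It is not qlog's code: Slot, emplace_message, consume, and demo_round_trip are hypothetical names, a single 64-byte slot stands in for the real queue buffer, and the space-separated output format is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Base interface: the consumer only ever sees this type.
struct Message {
    virtual ~Message() = default;
    virtual void write(std::ostream &os) const = 0;
};

template <typename... Args>
class FormattedMessage : public Message {
    std::tuple<Args...> data;
public:
    explicit FormattedMessage(Args... args) : data(std::move(args)...) {}
    void write(std::ostream &os) const override {
        // Assumed format: every argument followed by a single space.
        std::apply([&os](const auto &...v) { ((os << v << ' '), ...); }, data);
    }
};

// Hypothetical fixed-size slot standing in for one entry of the queue buffer.
struct alignas(64) Slot {
    std::byte storage[64];
};

// Producer side: construct the message directly in the slot (placement new, no heap).
template <typename... Args>
Message *emplace_message(Slot &slot, Args... args) {
    using Msg = FormattedMessage<Args...>;
    static_assert(sizeof(Msg) <= sizeof(Slot), "message does not fit in one slot");
    static_assert(alignof(Msg) <= alignof(Slot), "slot is under-aligned");
    return ::new (static_cast<void *>(slot.storage)) Msg(std::move(args)...);
}

// Consumer side: no knowledge of Args; the vtable selects the right write().
inline void consume(Message *msg, std::ostream &os) {
    assert(msg != nullptr);
    msg->write(os);
    msg->~Message();  // virtual destructor ends the object's lifetime in place
}

// Round trip through one slot, returning what the consumer would have written.
inline std::string demo_round_trip() {
    Slot slot;
    Message *msg = emplace_message(slot, 42, 1.5, 'x');
    std::ostringstream out;
    consume(msg, out);
    return out.str();
}
```

Note that the producer never touches a stream: all it does is move three values into the slot. The formatting cost, plus one virtual call, lands entirely on the consumer.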

Compile-Time Strings (CNTTP)

Strings are slow. Copying them is slower.

To avoid this, qlog enforces that the first token of every log line is a Compile-Time String. This serves as a “key” for the log line (e.g., “ORDER_SENT”).

Using C++20’s support for class types as non-type template parameters (CNTTP), the string literal becomes a template argument, so each key is a distinct type rather than a runtime value.

// Defined as a type!
struct INFO : StringLiteralToCT<"INF"> {};

When you log, this string is not copied. It is embedded in the type system. The write() function simply prints the static string literal associated with the type. Zero string copying on the critical path.

Cache-Friendly Lock-Free Queue

The queue implementation is where things get obsessive.

1. Splitting Large Messages

A naive implementation might reserve 256 bytes for every message. If a message is small, you waste memory. If it’s large, you crash.

qlog takes a different approach: Message Splitting. The queue slot size is fixed (e.g., 64 bytes) to match a typical cache line.

If a message fits in 64 bytes, it takes one slot.
If it’s larger, the metaprogramming machinery (msgtool) automatically splits the arguments into multiple FormattedMessage chunks.

This keeps wasted space in the ring buffer small, and as long as the buffer itself starts on a cache-line boundary, every slot access is cache-line aligned.
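The splitting decision can be made entirely at compile time. The sketch below is not msgtool; it is a simplified greedy count under stated assumptions: 64-byte slots, one pointer-sized header per chunk for the vptr, and no padding between arguments. The names kSlotBytes, kHeaderBytes, chunks_needed, and chunks_for are made up for illustration.

```cpp
#include <array>
#include <cstddef>

inline constexpr std::size_t kSlotBytes = 64;                // assumed slot size
inline constexpr std::size_t kHeaderBytes = sizeof(void *);  // room for each chunk's vptr

// Greedy count: walk the argument sizes in order and open a new chunk whenever
// the next argument would overflow the current slot. Padding is ignored, so a
// real implementation should check sizeof() of the actual chunk type instead.
template <std::size_t N>
constexpr std::size_t chunks_needed(const std::array<std::size_t, N> &sizes) {
    std::size_t chunks = 1;
    std::size_t used = kHeaderBytes;
    for (std::size_t s : sizes) {
        if (used + s > kSlotBytes) {
            ++chunks;
            used = kHeaderBytes;
        }
        used += s;
    }
    return chunks;
}

// Number of slots a log call with these argument types would occupy.
template <typename... Args>
constexpr std::size_t chunks_for() {
    static_assert(((sizeof(Args) + kHeaderBytes <= kSlotBytes) && ...),
                  "a single argument is too large for one slot");
    return chunks_needed(std::array<std::size_t, sizeof...(Args)>{sizeof(Args)...});
}

// Small messages take one slot; eight doubles plus a header spill into a second.
static_assert(chunks_for<int, double, char>() == 1);
static_assert(chunks_for<double, double, double, double,
                         double, double, double, double>() == 2);
```

Because the count is a constant expression, the producer pays nothing at run time to decide how many slots to claim.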

2. Multi-Queue Single-Consumer (MPSC)

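For thread isolation, qlog uses a Multi-Queue Single-Consumer architecture. Overall there are many producers and one consumer, but each individual queue has exactly one producer and one consumer (SPSC).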

Producers: Each critical thread writes to its own dedicated lock-free queue. No contention.
Consumer: A single background thread polls all queues.
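As an illustration of that layout, here is a textbook single-producer/single-consumer ring plus one polling pass over several of them. It is a sketch rather than qlog's queue: SpscQueue, Queue, and poll_all are hypothetical names, and the payload is a plain int instead of a message slot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal SPSC ring. Indices grow monotonically and are masked on access.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
    std::array<T, Capacity> slots_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // next index to read (consumer-owned)
    alignas(64) std::atomic<std::size_t> tail_{0};  // next index to write (producer-owned)
public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T &value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == Capacity) return false;
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    // Consumer thread only. Returns nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T value = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // hand the slot back
        return value;
    }
};

using Queue = SpscQueue<int, 8>;

// One polling pass of the single consumer: drain every producer's queue in turn.
inline std::size_t poll_all(const std::vector<Queue *> &queues, std::vector<int> &sink) {
    std::size_t drained = 0;
    for (Queue *q : queues) {
        assert(q != nullptr);
        while (auto item = q->try_pop()) {
            sink.push_back(*item);
            ++drained;
        }
    }
    return drained;
}
```

Since each ring has one writer and one reader, no compare-and-swap loop is needed: an acquire load and a release store per operation are enough. Keeping head_ and tail_ on separate cache lines avoids false sharing between the two threads.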

3. Thread Pinning

For this to work, the consumer thread must not be starved. On Linux, pin the logger thread to a dedicated core, either from the shell with taskset or in code with pthread_setaffinity_np. If you don’t, the OS scheduler will eventually ruin your day.
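In code, pinning takes only a few lines. The sketch below is Linux/glibc-specific and uses hypothetical helper names (pin_current_thread, first_allowed_cpu). Which core to pick is a deployment decision, so the helper takes the CPU index as a parameter.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to one CPU. Returns 0 on success, an error number otherwise.
inline int pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// First CPU the calling thread is currently allowed to run on, or -1 on failure.
inline int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}
```

The logger thread would call pin_current_thread once at startup. Pinning only restricts where the logger runs; it does not by itself keep other threads off that core.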

Safety: Synchronous Fallback

What happens if the queue is full?

In HFT, losing a log (like an execution report) is unacceptable. But blocking the trading thread is also unacceptable.

qlog implements a Safety Policy. If the lock-free queue is full (which implies the consumer is lagging or you are logging too much), the logger switches strategy:

Alert: It logs an [ALOG_ERR] Buffer Overflow error.
Fallback: It bypasses the queue and writes synchronously to a backup logger (usually a direct file write).

This stalls the critical thread (latency penalty), but guarantees data integrity. It’s a deliberate trade-off: Correctness > Latency in error scenarios.
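The policy boils down to a single branch on the result of the push. Here is a simplified sketch, with a bounded single-threaded BoundedQueue standing in for the lock-free queue and a std::ostream standing in for the backup logger. Both stand-ins, the names log_line, LogPath, and demo_overflow, and the exact layout of the error line are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Stand-in for the lock-free queue: bounded and single-threaded, illustration only.
class BoundedQueue {
    std::deque<std::string> items_;
    std::size_t capacity_;
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    bool try_push(std::string line) {
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(line));
        return true;
    }
    std::size_t size() const { return items_.size(); }
};

enum class LogPath { Async, SyncFallback };

// Fast path: enqueue. Slow path: report the overflow, then write synchronously.
inline LogPath log_line(BoundedQueue &queue, std::ostream &backup, const std::string &line) {
    if (queue.try_push(line)) return LogPath::Async;
    backup << "[ALOG_ERR] Buffer Overflow\n" << line << '\n';
    backup.flush();  // the caller stalls here: the accepted latency penalty
    return LogPath::SyncFallback;
}

// Usage: the second line overflows a one-entry queue and lands in the backup stream.
inline std::string demo_overflow() {
    BoundedQueue queue(1);
    std::ostringstream backup;
    log_line(queue, backup, "ORDER_SENT 1");
    log_line(queue, backup, "ORDER_SENT 2");
    return backup.str();
}
```

Returning which path was taken lets the caller count fallbacks, which is a cheap way to notice that the consumer is falling behind.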

Conclusion

qlog demonstrates that modern C++ allows us to build abstractions that are both type-safe and close to zero-cost on the critical path.

By combining the vptr trick, CNTTP, and a cache-aware split-message queue, we achieve a logging engine that is fast enough for the most demanding trading systems. It’s not perfect, but it’s a hell of a lot better than printf.