Performance Profiling C++ Applications in Production

Practical techniques for identifying bottlenecks in C++ systems using perf, Valgrind, and custom instrumentation, while keeping the overhead on production throughput low.

Production profiling in C++ requires tools that add minimal overhead while providing actionable insights. Here’s the toolkit I rely on.

Using perf for CPU Profiling

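perf, the Linux sampling profiler, is the go-to tool for CPU-bound analysis. It periodically samples call stacks instead of instrumenting every function, so it can attach to a live process: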

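# Record for 30 seconds with call-graph info.
# -g unwinds via frame pointers by default: build with
# -fno-omit-frame-pointer, or use --call-graph dwarf instead.
# pidof -s returns a single PID even if several instances are running.
perf record -g -p $(pidof -s myapp) -- sleep 30

# Generate a flame graph (stackcollapse-perf.pl and flamegraph.pl
# come from Brendan Gregg's FlameGraph repository)
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg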

Memory Profiling with Valgrind

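For memory leak detection, run Valgrind's Memcheck tool (valgrind --leak-check=full ./myapp) in staging rather than production, since programs under Memcheck run roughly 20 to 30 times slower. Detection is the fallback; owning resources through RAII types such as std::unique_ptr keeps them from leaking in the first place: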

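#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins so the example compiles; substitute your real types.
struct Config {};

class Connection {
public:
    explicit Connection(const Config&) {}
};

class ConnectionPool {
public:
    void addConnection() {
        // Each Connection is owned by a unique_ptr, so it is freed
        // automatically when the pool is destroyed; no manual delete.
        connections_.push_back(
            std::make_unique<Connection>(config_)
        );
    }

    std::size_t activeCount() const {
        return connections_.size();
    }

private:
    Config config_;
    std::vector<std::unique_ptr<Connection>> connections_;
};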
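Memcheck is far too slow to leave running in production. A complementary check that costs one relaxed atomic operation per construction and destruction is a live-instance counter. The sketch below is an assumption, not part of the original pool: LiveCounter and checkPoolReleasesConnections are hypothetical names, Config and Connection are placeholders, and the counter could just as well be exported as a metrics gauge instead of asserted in a test.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical CRTP helper: counts how many objects of type T are alive.
template <typename T>
class LiveCounter {
public:
    LiveCounter() { live_.fetch_add(1, std::memory_order_relaxed); }
    LiveCounter(const LiveCounter&) { live_.fetch_add(1, std::memory_order_relaxed); }
    LiveCounter& operator=(const LiveCounter&) = default;
    ~LiveCounter() { live_.fetch_sub(1, std::memory_order_relaxed); }

    static long live() { return live_.load(std::memory_order_relaxed); }

private:
    static inline std::atomic<long> live_{0};  // C++17 inline variable
};

struct Config {};  // placeholder for the pool's real configuration

// Placeholder Connection; inheriting LiveCounter is the only change needed.
class Connection : public LiveCounter<Connection> {
public:
    explicit Connection(const Config&) {}
};

// Same ownership model as the ConnectionPool above.
class ConnectionPool {
public:
    void addConnection() {
        connections_.push_back(std::make_unique<Connection>(config_));
    }
    std::size_t activeCount() const { return connections_.size(); }

private:
    Config config_;
    std::vector<std::unique_ptr<Connection>> connections_;
};

// Leak check: once the pool is destroyed, none of its Connections may remain.
inline void checkPoolReleasesConnections() {
    const long before = Connection::live();
    {
        ConnectionPool pool;
        for (int i = 0; i < 3; ++i) pool.addConnection();
        assert(Connection::live() == before + 3);
    }
    assert(Connection::live() == before);
}
```

If the count keeps climbing under steady production load, that is the moment to reproduce the workload in staging under Memcheck.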

Custom Instrumentation

For production, build lightweight instrumentation directly into your code:

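#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical no-op metrics sink so the example compiles; replace it
// with your metrics library's recording call.
namespace metrics {
inline void record(const std::string& /*name*/, std::int64_t /*micros*/) {}
}  // namespace metrics

class ScopedTimer {
public:
    // steady_clock is monotonic; high_resolution_clock may be an alias
    // for system_clock, which jumps when the wall clock is adjusted.
    using Clock = std::chrono::steady_clock;

    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(Clock::now()) {}

    // Non-copyable: a copy would record the same scope twice.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    // Records the elapsed time when the enclosing scope exits.
    ~ScopedTimer() {
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - start_);
        metrics::record(name_, duration.count());
    }

private:
    std::string name_;
    Clock::time_point start_;
};

struct Request {};  // hypothetical placeholder for your request type

// Usage
void processRequest(const Request& req) {
    ScopedTimer timer("process_request");
    // ... hot path code
}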
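ScopedTimer hands each measurement to metrics::record, whose implementation depends on your metrics stack. One possible shape, sketched under the assumption that you aggregate in-process and export periodically rather than sending every sample to a backend, keeps a per-metric count, total, and maximum in relaxed atomics. MetricStats, MetricsRegistry, and metrics::registry are hypothetical names, not a particular library's API.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical per-metric aggregate; updates are lock-free relaxed atomics.
struct MetricStats {
    std::atomic<std::uint64_t> count{0};
    std::atomic<std::uint64_t> total_us{0};
    std::atomic<std::uint64_t> max_us{0};

    void add(std::uint64_t micros) {
        count.fetch_add(1, std::memory_order_relaxed);
        total_us.fetch_add(micros, std::memory_order_relaxed);
        // CAS loop: only raise max_us when this sample is larger.
        std::uint64_t prev = max_us.load(std::memory_order_relaxed);
        while (micros > prev &&
               !max_us.compare_exchange_weak(prev, micros,
                                             std::memory_order_relaxed)) {
        }
    }
};

// Hypothetical registry keyed by metric name.
class MetricsRegistry {
public:
    MetricStats& get(const std::string& name) {
        std::lock_guard<std::mutex> lock(mu_);
        // unordered_map is node-based, so this reference survives rehashing.
        return stats_[name];
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, MetricStats> stats_;
};

namespace metrics {
inline MetricsRegistry& registry() {
    static MetricsRegistry instance;  // initialization is thread-safe since C++11
    return instance;
}

// Matches the call in ScopedTimer's destructor. The name lookup takes a
// mutex on every call; on a very hot path, resolve the MetricStats& once
// and call add() on it directly.
inline void record(const std::string& name, std::int64_t micros) {
    registry().get(name).add(static_cast<std::uint64_t>(micros));
}
}  // namespace metrics
```

A background exporter can read count, total_us, and max_us periodically; the mean is simply total_us / count when count is non-zero.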

Key principle: profile with production traffic patterns, not synthetic benchmarks. Micro-benchmarks lie: they usually run with warm caches and uniform inputs that real traffic rarely matches.
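Profiling real traffic means leaving instrumentation on in production, so its cost has to stay bounded. One option, sketched here as an assumption rather than the author's setup, is to time only every Nth call. shouldSample and SampledTimer are hypothetical names; the sink is any callable taking the elapsed microseconds.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical sampling decision: true on every Nth call made by this
// thread. The counter is thread_local (shared by all call sites on the
// thread), so no atomics or cross-core cache traffic are needed.
inline bool shouldSample(std::uint32_t every_n) {
    thread_local std::uint32_t counter = 0;
    return every_n <= 1 || ++counter % every_n == 0;
}

// Hypothetical RAII timer that only reads the clock for sampled calls.
template <typename Sink>
class SampledTimer {
public:
    using Clock = std::chrono::steady_clock;

    SampledTimer(std::uint32_t every_n, Sink sink)
        : sampled_(shouldSample(every_n)), sink_(std::move(sink)) {
        if (sampled_) start_ = Clock::now();  // unsampled calls skip the clock read
    }

    SampledTimer(const SampledTimer&) = delete;
    SampledTimer& operator=(const SampledTimer&) = delete;

    ~SampledTimer() {
        if (sampled_) {
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                Clock::now() - start_);
            sink_(static_cast<std::int64_t>(elapsed.count()));
        }
    }

private:
    bool sampled_;
    Sink sink_;
    Clock::time_point start_{};
};
```

In a request handler this might read SampledTimer timer(100, [](std::int64_t us) { metrics::record("process_request", us); });. The trade-off is that rare slow outliers are also captured only about once per N occurrences, so lower N when hunting tail latency.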