Performance Profiling C++ Applications in Production
Practical techniques for identifying bottlenecks in C++ systems using perf, Valgrind, and custom instrumentation - without impacting production throughput.
Production profiling in C++ requires tools that add minimal overhead while providing actionable insights. Here’s the toolkit I rely on.
Using perf for CPU Profiling
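perf is the go-to for CPU-bound analysis. It samples the running process rather than instrumenting it, so overhead stays low. For usable call graphs, build with -fno-omit-frame-pointer or record with --call-graph dwarf: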
# Record for 30 seconds with call-graph info
perf record -g -p $(pidof myapp) -- sleep 30
# Generate a flame graph (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph repository)
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
Memory Profiling with Valgrind
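For memory leak detection in staging environments, run the binary under Valgrind's Memcheck (for example, valgrind --leak-check=full ./myapp). Memcheck slows execution by an order of magnitude or more, so keep it out of production. Ownership types such as std::unique_ptr remove the most common leaks before they ever show up in the report: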
#include <cstddef>
#include <memory>
#include <vector>

// Connection and Config are application types defined elsewhere.
class ConnectionPool {
public:
    void addConnection() {
        // unique_ptr owns the Connection and frees it when the pool is destroyed
        connections_.push_back(
            std::make_unique<Connection>(config_)
        );
    }

    std::size_t activeCount() const {
        return connections_.size();
    }

private:
    Config config_;
    std::vector<std::unique_ptr<Connection>> connections_;
};
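To see why the unique_ptr version comes back clean under a leak checker, here is a minimal self-contained sketch. Config and Connection are hypothetical stand-ins (the real types are not shown here), the live counter and the liveAfterPoolDestroyed helper exist only for illustration, and the code assumes C++17 for the inline static member.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins: the real Config and Connection are not shown in the article.
struct Config {};

struct Connection {
    static inline int live = 0;  // number of Connection objects currently alive
    explicit Connection(const Config&) { ++live; }
    ~Connection() { --live; }
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
};

class ConnectionPool {
public:
    void addConnection() {
        connections_.push_back(std::make_unique<Connection>(config_));
    }
    std::size_t activeCount() const { return connections_.size(); }

private:
    Config config_;
    std::vector<std::unique_ptr<Connection>> connections_;
};

// Builds a pool with n connections, lets it go out of scope, and returns
// how many Connection objects are still alive afterwards.
int liveAfterPoolDestroyed(int n) {
    {
        ConnectionPool pool;
        for (int i = 0; i < n; ++i) {
            pool.addConnection();
        }
        assert(pool.activeCount() == static_cast<std::size_t>(n));
        assert(Connection::live == n);
    }  // pool is destroyed here; each unique_ptr deletes its Connection
    return Connection::live;
}
```

If the vector held raw Connection* pointers allocated with new, the same helper would return n instead of 0, and Memcheck would report those blocks as definitely lost.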
Custom Instrumentation
For production, build lightweight instrumentation directly into your code:
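#include <chrono>
#include <string>

// Records the lifetime of the enclosing scope, in microseconds.
// steady_clock is monotonic, so wall-clock adjustments cannot skew the result.
// metrics::record is the application's own metrics sink (not shown).
class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        auto duration = std::chrono::duration_cast<
            std::chrono::microseconds>(end - start_);
        metrics::record(name_, duration.count());
    }

    // Non-copyable: a copy would record the same scope twice.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Usage
void processRequest(const Request& req) {
    ScopedTimer timer("process_request");
    // ... hot path code
}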
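The timer depends on a metrics::record sink that is not shown. As a rough sketch of what such a sink could look like, the version below aggregates count, total, and maximum per name behind a mutex. Everything in this metrics namespace (Stats, registry, registryMutex, snapshot) is an assumption made for illustration, not the actual backend. The timer here uses std::chrono::steady_clock because it is monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

namespace metrics {

// Hypothetical per-metric aggregate; a real sink might keep histograms instead.
struct Stats {
    std::uint64_t count = 0;
    std::int64_t total_us = 0;
    std::int64_t max_us = 0;
};

inline std::mutex& registryMutex() {
    static std::mutex m;
    return m;
}

inline std::unordered_map<std::string, Stats>& registry() {
    static std::unordered_map<std::string, Stats> table;
    return table;
}

// Folds one sample into the aggregate for `name`.
inline void record(const std::string& name, std::int64_t micros) {
    assert(micros >= 0);  // durations from a monotonic clock are never negative
    std::lock_guard<std::mutex> lock(registryMutex());
    Stats& s = registry()[name];
    ++s.count;
    s.total_us += micros;
    if (micros > s.max_us) {
        s.max_us = micros;
    }
}

// Returns a copy of the aggregate, or a zeroed Stats if nothing was recorded.
inline Stats snapshot(const std::string& name) {
    std::lock_guard<std::mutex> lock(registryMutex());
    auto it = registry().find(name);
    return it == registry().end() ? Stats{} : it->second;
}

}  // namespace metrics

// Same shape as the article's timer, reporting to the sketch sink above.
class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start_);
        metrics::record(name_, elapsed.count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

A single global mutex is the simplest thing that works, but it can become a contention point on a very hot path. Per-thread buffers flushed periodically, or sampling only a fraction of calls, are common ways to cut that cost.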
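Key principle: Profile with production traffic patterns, not synthetic benchmarks. Micro-benchmarks lie: they run with warm caches, predictable branches, and no contention, so they rarely reflect how the code behaves under real load.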