Extending Linux Perf Stat with LD_PRELOAD
For my current work on monitoring, I need to use Linux perf stat. Perf tools read and display the hardware counters, either for the whole execution of a process, or while attached to it:
perf stat --pid $(pidof firefox)
^C
Performance counter stats for process id '4257':

    13.860180 task-clock (msec)
    79 context-switches
    16 cpu-migrations
    11 page-faults
    18,397,934 cycles
    13,964,242 stalled-cycles-frontend
    9,787,703 stalled-cycles-backend
    8,320,570 instructions
    1,743,632 branches
    93,080 branch-misses

    1.942768382 seconds time elapsed
That's great, I can attach perf to my process, run it for a while and stop it. But if I want to start, and stop, and start again, I can't. And that's what I want to do, from inside gdb.py.
Signals would be great for that: whenever I send a signal, perf would dump the counter values to stderr and continue. But that's not implemented...
Studying perf-stat source code
So let's see in perf-stat source code what we can find for that.
static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void print_aggr(char *prefix)
These look like good candidates, but their symbols are not exported...
(gdb) print abs_printout
No symbol "abs_printout" in current context.
(gdb) print print_aggr
No symbol "print_aggr" in current context.
Next candidate?
static void process_interval(void)

(gdb) p process_interval
$3 = {void (void)} 0x441660 <process_interval>
Oh, great, and it takes no arguments, which makes it even easier! Let's try it:
(gdb) call process_interval()
3858. 571793.500217 task-clock
3858 1,880,243 context-switches
3858 115,610 cpu-migrations
3858 8,639,477 page-faults
3858 1,430,093,310,944 cycles
3858 1,037,644,029,921 stalled-cycles-frontend
3858 756,612,594,751 stalled-cycles-backend
3858 820,483,606,606 instructions
3858 176,772,490,245 branches
3858 5,454,270,151 branch-misses
Exactly what we were looking for!
Triggering process_interval() with a signal
Next, we need to be able to trigger this function remotely, and without modifying perf-stat source code. (The easy option would be to patch perf-stat, but then our tool would be harder to distribute).
That can be done with the help of the LD_PRELOAD trick: we preload a bit of code into the perf-stat address space, and during the application's initialization, we register the signal handler:
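#include <signal.h>
#include <stdio.h>

/* PROCESS_INTERNAL is the address of perf's process_interval(), defined further down. */
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

void my_handler(int signum) {
  if (signum != SIGUSR2) {
    return;
  }
  printf("Received SIGUSR2!\n");
  process_internal();
}

/* Run init() when the shared object is loaded. */
void init(void) __attribute__((constructor));

void init(void) {
  printf("Received init!\n");
  signal(SIGUSR2, my_handler);
}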
Here the (GCC-specific) function attribute constructor tells the dynamic loader to run the function when the library is loaded. In this constructor, we just register the signal handler, and in the signal handler we call process_internal().
But how do we get the address of process_internal? That's where it is not really clean: we get it from a lookup of the binary's symbol addresses:
nm -a /usr/bin/perf | grep process_interval
0000000000441660 t process_interval
and we inject it manually in the code:
#define PROCESS_INTERNAL 0x0000000000441660
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;
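One caveat with a hard-coded address: it only matches the running process if perf is loaded at the addresses nm prints. If your perf binary happens to be built as a position-independent executable (an assumption about your system, not something checked here), nm prints offsets and the load base has to be added. A minimal sketch of that adjustment, using glibc's dl_iterate_phdr(); the helper names first_object_base and runtime_address are made up for illustration:

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Callback for dl_iterate_phdr(): the first object reported is the main
 * program, so record its load base and stop iterating. */
static int first_object_base(struct dl_phdr_info *info, size_t size, void *data)
{
    (void) size;
    *(uintptr_t *) data = (uintptr_t) info->dlpi_addr;
    return 1; /* a non-zero return value stops the iteration */
}

/* Hypothetical helper: turn an address printed by nm into a runtime address.
 * For a non-PIE binary the base is 0 and the address comes back unchanged. */
static uintptr_t runtime_address(uintptr_t nm_address)
{
    uintptr_t base = 0;

    dl_iterate_phdr(first_object_base, &base);
    return base + nm_address;
}
```

Since the preloaded code runs inside the perf process, calling runtime_address(PROCESS_INTERNAL) from the constructor would give the address to store in the function pointer.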
A better way would be to pass it as an environment variable:
#define PROCESS_INTERNAL ascii_to_long($PROCESS_INTERNAL_ADDR)
and in the shell:
export PROCESS_INTERNAL_ADDR=0x$(nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1)
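On the C side, the environment variable has to be read at load time, since a global initializer cannot call a function. Here is a minimal sketch of how that could look, not necessarily the code I ended up with: parse_address, on_sigusr2, preload_init and process_interval_fn are names chosen for this example, and the variable must be exported so that perf actually inherits it.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Function pointer filled in at load time; NULL means "no usable address". */
static void (*process_interval_fn)(void) = NULL;

/* Hypothetical helper: parse a hexadecimal address such as "0x441660".
 * Returns 0 when the text is missing or malformed. */
static uintptr_t parse_address(const char *text)
{
    char *end = NULL;
    unsigned long long value;

    if (text == NULL || *text == '\0')
        return 0;
    errno = 0;
    value = strtoull(text, &end, 16); /* base 16 accepts an optional 0x prefix */
    if (errno != 0 || end == text || *end != '\0')
        return 0;
    return (uintptr_t) value;
}

/* Signal handler: call into perf only if an address was configured. */
static void on_sigusr2(int signum)
{
    if (signum == SIGUSR2 && process_interval_fn != NULL)
        process_interval_fn();
}

/* Runs when the preloaded library is loaded into the perf process. */
__attribute__((constructor))
static void preload_init(void)
{
    uintptr_t addr = parse_address(getenv("PROCESS_INTERNAL_ADDR"));

    if (addr == 0) {
        fprintf(stderr, "preload: PROCESS_INTERNAL_ADDR unset or invalid\n");
        return;
    }
    process_interval_fn = (void (*)(void)) addr;
    signal(SIGUSR2, on_sigusr2);
}
```

Something like gcc -shared -fPIC -o preload.so preload.c should be enough to build it. When the variable is missing or malformed, the handler is simply not installed, which is safer than jumping to address 0.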
With that in place, it works pretty well:
LD_PRELOAD=/path/to/preload.so perf stat -x, -p $PID_TO_PROFILE
kill -USR2 $(pidof perf)
Note that SIGUSR1 doesn't work, presumably because it's used by perf-stat. And don't forget the -x, argument if you want to parse perf-stat output.
Also, keep in mind that this is not a robust implementation (of course :), as there might be concurrency problems, for instance if the signal is received while perf is updating the counters.
If it ain't broken don't fix it; I'll wait for problems before going any further!