(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

Extending Linux Perf Stat with LD_PRELOAD

For my current work on monitoring, I need to use Linux perf stat. The perf tools read and display the hardware counters, either for the whole execution of a process, or while attached to it:

perf stat --pid $(pidof firefox)
^C
 Performance counter stats for process id '4257':

     13.860180      task-clock (msec)
            79      context-switches           
            16      cpu-migrations
            11      page-faults             
    18,397,934      cycles
    13,964,242      stalled-cycles-frontend 
     9,787,703      stalled-cycles-backend
     8,320,570      instructions
     1,743,632      branches
        93,080      branch-misses

   1.942768382 seconds time elapsed

That's great: I can attach perf to my process, run it for a while, and stop it. But I can't start, stop, and start again, and that's exactly what I want to do from inside gdb.py.

Signals would be great for that: whenever I send a signal, perf would dump the counter values to stderr and continue. But that's not implemented ...

Studying perf-stat source code

So let's see what the perf-stat source code has to offer.

static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void print_aggr(char *prefix)

These look like good candidates, but their symbols are not available in the binary ...

(gdb) print abs_printout
No symbol "abs_printout" in current context.
(gdb) print print_aggr
No symbol "print_aggr" in current context.

Next candidate?

static void process_interval(void)
(gdb) p process_interval
$3 = {void (void)} 0x441660 <process_interval>

Oh, great, and with no arguments, that's even easier ! Let's try it:

(gdb) call process_interval()
  3858.     571793.500217      task-clock
  3858          1,880,243      context-switches
  3858            115,610      cpu-migrations           
  3858          8,639,477      page-faults              
  3858  1,430,093,310,944      cycles
  3858  1,037,644,029,921      stalled-cycles-frontend
  3858    756,612,594,751      stalled-cycles-backend 
  3858    820,483,606,606      instructions
  3858    176,772,490,245      branches
  3858      5,454,270,151      branch-misses

Exactly what we were looking for !

Triggering process_interval() with a signal

Next, we need to be able to trigger this function remotely, and without modifying perf-stat source code. (The easy option would be to patch perf-stat, but then our tool would be harder to distribute).

That can be done with the LD_PRELOAD trick: we preload a bit of code into perf-stat's address space, and during application initialization we register a signal handler:

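#include <signal.h>
#include <stdio.h>

/* PROCESS_INTERNAL is the address of perf's process_interval(); see below. */
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

/* Dump the counters each time SIGUSR2 is received. */
void my_handler(int signum) {
    if (signum != SIGUSR2) {
        return;
    }

    printf("Received SIGUSR2!\n");
    process_internal();
}

/* GCC-specific: run init() when the shared object is loaded. */
void init(void) __attribute__((constructor));

void init(void) {
    printf("Received init!\n");
    signal(SIGUSR2, my_handler);
}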

Here the (GCC-specific) function attribute constructor tells the compiler to arrange for the function to run when the shared object is loaded, before main(). In this constructor, we just register the signal handler, and in the signal handler we call process_internal().
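The constructor-plus-handler mechanism can be tried outside perf. The following is a minimal sketch, not the code used above: fake_process_interval, target, on_sigusr2 and install_handler are hypothetical names, with the stand-in function replacing the real perf address. It uses sigaction() rather than signal() so that interrupted system calls are restarted, and write() rather than printf() because the latter is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for perf's process_interval(); it only counts calls. */
static volatile sig_atomic_t calls = 0;
static void fake_process_interval(void) { calls++; }

/* Same shape as the pointer in the post, aimed at the stand-in. */
static void (*target)(void) = fake_process_interval;

static void on_sigusr2(int signum) {
    static const char msg[] = "Received SIGUSR2!\n";
    ssize_t n;

    if (signum != SIGUSR2)
        return;
    /* write(2) is async-signal-safe, unlike printf(3). */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) n;
    target();
}

/* Runs when the object is loaded, before main(). */
__attribute__((constructor))
static void install_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, NULL);
}
```

Built as a shared object and preloaded into any program, sending SIGUSR2 to that program should then run the stand-in once per signal.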

But how do we get the address of process_interval, which process_internal must point to? This is where it is not really clean: we get it from a lookup of the binary's symbol addresses:

nm -a /usr/bin/perf | grep process_interval
0000000000441660 t process_interval

and we inject it manually in the code:

#define PROCESS_INTERNAL 0x0000000000441660
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;
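Since the address belongs to one particular build of perf, the lookup has to be redone for each binary. As a sketch of what that lookup involves, here is a small C function that parses one line of nm output and accepts it only if the symbol name matches exactly and the symbol is a code (text) symbol. The name parse_nm_line is hypothetical; a plain grep would also match any longer symbol that merely contains process_interval.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of `nm` output, e.g. "0000000000441660 t process_interval".
 * Returns 1 and stores the address in *addr when the line names `wanted`
 * as a text (code) symbol, 0 otherwise.
 */
static int parse_nm_line(const char *line, const char *wanted,
                         unsigned long long *addr) {
    unsigned long long value;
    char type;
    char name[256];

    /* Three fields: hex address, one-letter symbol type, symbol name. */
    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return 0;
    /* 't' is a local (static) text symbol, 'T' a global one. */
    if (type != 't' && type != 'T')
        return 0;
    if (strcmp(name, wanted) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Note that the value nm reports is usable as-is only for a non-PIE binary, which the 0x441660 address here suggests; for a PIE build it would be an offset from the load address.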

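A better way would be to pass it as an environment variable and read it in init(), since a file-scope initializer has to be a compile-time constant:

const char *addr = getenv("PROCESS_INTERNAL_ADDR");   /* needs <stdlib.h> */
if (addr) process_internal = (void *) strtoul(addr, NULL, 16);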

and in the shell:

export PROCESS_INTERNAL_ADDR=0x$(nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1)

and that works pretty well !

LD_PRELOAD=/path/to/preload.so perf stat -x, -p $PID_TO_PROFILE

kill -USR2 $(pidof perf)
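The environment-variable variant needs a little parsing on the C side. Here is a minimal sketch with some validation, so that a missing or malformed value yields NULL instead of a jump to a random address. The names interval_fn, parse_address and target_from_env are hypothetical and not part of the original preload code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*interval_fn)(void);

/*
 * Convert a hex string such as "0x441660" into an address.
 * Returns 0 when the text is missing, malformed, or out of range.
 */
static uintptr_t parse_address(const char *text) {
    char *end;
    unsigned long long value;

    if (text == NULL || *text == '\0')
        return 0;
    errno = 0;
    value = strtoull(text, &end, 16);   /* base 16 accepts an optional "0x" */
    if (errno != 0 || *end != '\0' || value > UINTPTR_MAX)
        return 0;
    return (uintptr_t) value;
}

/* Read the address from PROCESS_INTERNAL_ADDR; NULL if unset or invalid. */
static interval_fn target_from_env(void) {
    return (interval_fn) parse_address(getenv("PROCESS_INTERNAL_ADDR"));
}
```

In init(), one would assign process_internal = target_from_env(); and skip registering the signal handler when the result is NULL.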

Note that SIGUSR1 doesn't work, probably because it's used by perf-stat. And don't forget the -x, argument if you want to parse perf-stat output.

Also, keep in mind that this is not a robust implementation (of course :), as there might be concurrency problems, for instance if the signal is received while perf is updating the counters.
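For comparison, the usual way to avoid that kind of race is to do almost nothing in the handler and defer the real work to a point where the data is known to be consistent. The sketch below shows the pattern in isolation. It is not something the preload trick can do by itself, because it needs a hook in perf's main loop, which means patching perf. All names here (dump_requested, request_dump, service_dump_request, fake_dump) are hypothetical.

```c
#include <signal.h>

/* Set by the handler, consumed later from normal (non-signal) context. */
static volatile sig_atomic_t dump_requested = 0;

/* The handler does the minimum: it records the request and returns. */
static void request_dump(int signum) {
    if (signum == SIGUSR2)
        dump_requested = 1;
}

/*
 * Called from a point where the counters are known to be consistent.
 * Runs `dump` once if a request is pending; returns 1 if it ran, else 0.
 */
static int service_dump_request(void (*dump)(void)) {
    if (!dump_requested)
        return 0;
    dump_requested = 0;
    dump();
    return 1;
}

/* Hypothetical stand-in for the real dump routine; it only counts calls. */
static int dumps = 0;
static void fake_dump(void) { dumps++; }
```

With this shape, several signals arriving between two service points collapse into a single dump, which is usually acceptable for a "print the counters now" request.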

If it ain't broken don't fix it; I'll wait for problems before going any further!