(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

Probe effect: but it works with GDB!

The probe effect is an unintended alteration in system behavior caused by measuring that system (according to Wikipedia).

I had such a problem yesterday while debugging a tool based on libevent:

  1. in a multi-threaded application, the main thread receives the quit order,
  2. it transmits the information to the helper thread,
    • the helper thread quits
  3. the main thread waits for the helper thread to finish
  4. and terminates the process.

But that didn't work: the helper thread never quit, and the main one waited forever.

MWE: Minimal Working Example

Here is a slightly simplified reconstruction of this bug. Don't pay attention to the epoll configuration, I don't know how to use it and didn't learn!

#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAXEVENTS 64
#define TIMEOUT -1

void main () {
  struct epoll_event event;
  struct epoll_event *events;
  int fd = 0;                     /* stdin */
  int efd = epoll_create1 (0);

  printf("OH, HAI!\n");
  printf("BTW pid = %d\n", getpid());

  /* I never learnt how to use epoll...  */
  event.events = EPOLLIN | EPOLLET;
  event.data.fd = fd;
  epoll_ctl (efd, EPOLL_CTL_ADD, fd, &event);
  events = calloc (MAXEVENTS, sizeof event);

  /* Wait forever for an event.  */
  epoll_wait (efd, events, MAXEVENTS, TIMEOUT);

  printf("KTHXBYE\n");
}

Then run it:

$ ./a.out
OH, HAI!
BTW pid = 4900

Yes, it blocks, as expected. Press Ctrl-C to quit.

The probe effect

Now run it again, and attach GDB to it:

$ gdb --pid $THE_PID
GNU gdb (GDB) 7.9.50.20150505-cvs
...
0x00007fdb33ac7703 in __epoll_wait_nocancel () from /usr/lib/libc.so.6
(gdb) continue
KTHXBYE
[Inferior 1 (process 6957) exited with code 010]

What!? The application now prints KTHXBYE and exits instead of blocking forever! That's the probe effect!

The probe effect ... may not be directly related to the probe!

There is another way to make the application finish properly, with a stop/continue combo (Ctrl-Z sends SIGTSTP, fg sends SIGCONT):

$ ./a.out
OH, HAI!
BTW pid = 7997
^Z
[1]  + 7997 suspended  ./a.out
$ fg
[1]  + 7997 continued  ./a.out
KTHXBYE

So that probe effect is not actually due to the probe, but to ...

Explanation of this probe effect

... to the fact that GDB (and signals) interrupt system calls!

That's (obviously) in the epoll_wait man page:

ERRORS

EINTR : The call was interrupted by a signal handler before either (1) any of the requested events occurred or (2) the timeout expired; see signal(7).

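The man page speaks of a signal handler, but signal(7) adds that on Linux a few blocking calls, epoll_wait among them, fail with EINTR even without any handler when the process is stopped and then resumed with SIGCONT.

In my libevent application, the thread was blocked in this epoll_wait syscall, which was interrupted by the debugger/signals. It then checked the exit flag and terminated as expected.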

(It was a configuration problem, not a bug in libevent. The function that should notify the epoll listener was not correctly configured.)
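For illustration, here is roughly what such a notification looks like when done by hand, without libevent. This is only a sketch under my own assumptions: the struct, the function names and the choice of an eventfd as wake-up channel are hypothetical, and libevent has its own mechanism for this.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical state shared by the two threads.  */
struct helper
{
  int efd;        /* epoll instance the helper thread blocks on */
  int wakeup_fd;  /* eventfd the main thread writes to */
};

/* Helper thread: sleeps in epoll_wait until the wake-up fd is readable.  */
static void *helper_main (void *arg)
{
  struct helper *h = arg;
  struct epoll_event ev;

  for (;;)
    {
      int n = epoll_wait (h->efd, &ev, 1, -1);

      if (n == 1 && ev.data.fd == h->wakeup_fd)
        break;                    /* quit order received */
      if (n == -1 && errno != EINTR)
        break;                    /* real error: give up */
      /* EINTR: not a quit order, go back to waiting.  */
    }
  return NULL;
}

/* Start the helper, send it the quit order and wait for it.
   Returns 0 on success, -1 on error.  */
int run_and_stop_helper (void)
{
  struct helper h;
  struct epoll_event ev;
  pthread_t tid;
  uint64_t one = 1;
  int status = -1;

  h.efd = epoll_create1 (0);
  h.wakeup_fd = eventfd (0, 0);
  if (h.efd == -1 || h.wakeup_fd == -1)
    goto out;

  /* The step that must not be forgotten: register the wake-up fd.  */
  memset (&ev, 0, sizeof ev);
  ev.events = EPOLLIN;
  ev.data.fd = h.wakeup_fd;
  if (epoll_ctl (h.efd, EPOLL_CTL_ADD, h.wakeup_fd, &ev) == -1)
    goto out;

  if (pthread_create (&tid, NULL, helper_main, &h) != 0)
    goto out;

  /* Quit order: make wakeup_fd readable, which unblocks epoll_wait.  */
  if (write (h.wakeup_fd, &one, sizeof one) == (ssize_t) sizeof one)
    status = 0;
  else
    pthread_cancel (tid);         /* epoll_wait is a cancellation point */

  pthread_join (tid, NULL);

out:
  if (h.efd != -1)
    close (h.efd);
  if (h.wakeup_fd != -1)
    close (h.wakeup_fd);
  return status;
}
```

If the epoll_ctl registration is left out, the write still succeeds but the helper never wakes up and the join hangs, which is the kind of symptom I was seeing. Build with -pthread.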

TL;DR

Debuggers (and signals) can and will interrupt system calls. That is normal behavior for system calls, and your code should take it into account. But it may also introduce "probe effects" when you try to study your code with a debugger!
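When an interruption should simply be ignored, the usual idiom is to restart the call. A minimal sketch, assuming a wrapper of my own naming (epoll_wait_retry is not part of any library):

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: same contract as epoll_wait, except that an
   interrupted call (EINTR) is restarted instead of being reported.
   Caveat: with a finite timeout, each restart waits for the full
   timeout again.  */
int epoll_wait_retry (int efd, struct epoll_event *events,
                      int maxevents, int timeout)
{
  int n;

  do
    n = epoll_wait (efd, events, maxevents, timeout);
  while (n == -1 && errno == EINTR);

  return n;
}
```

Note that this is the opposite of what saved my application under GDB: a loop like this one only makes sense if something else (a flag check, a wake-up file descriptor) is responsible for ending the wait.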