GDB scheduler locking, function calls and multi-threading

Tuesday, February 07, 2017 - 2 comments

GDB can call functions in the program's address-space. That can be useful in different sitations, for debugging or just informal code testing. Let's say I want to allocate some memory:

(gdb) p/x (char *)malloc(1024)
$3 = 0x60c960

I can do what ever I want with this buffer:

(gdb) set $b = $1
(gdb) set $b[0] = 48
(gdb) set $b[1] = 120
(gdb) set $b[2] = 57
(gdb) set $b[3] = 55
(gdb) set $b[4] = 50
(gdb) set $b[5] = 0
(gdb) p $b
$3 = 0x60c960 "0x972"
(gdb) call puts($b)
0x972

... but that's not the point of this post. What I wanted to highlight here is want happens in a multithreaded execution, if I do something similar:

(gdb) break 350
(gdb) run
[... many threads are created here ...]
[Switching to Thread ... (LWP 20645)]
Breakpoint 1, .... at ...:350
(gdb) p malloc(1024)
[Switching to Thread ... (LWP 20592)]
Breakpoint 1, .... at ...:350
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(malloc) will be abandoned.
When the function is done executing, GDB will silently stop.

The problem here is that another thread hits the breakpoint during the execution of malloc. This is the default behavior of GDB, and it may be surprising when you're not used to it ...

But in addition to being suprising, it is problematic because the return value of malloc is lost! And the same can happend with next, step, finish, etc.

So if you want to force a single-thread behavior (that is, everytime you're sure that there should't be any deadlock during the function call), set GDB's scheduler-locking.

With the scheduler-locking enabled, GDB only let the current thread run. The execution of the other one is blocked in Linux scheduler.

(In practise, this is implemented in ptrace API, to which GDB either passes the process PID or only the thread TID.)

(gdb) set scheduler-locking on
(gdb) p malloc(1024)
...
(gdb) set scheduler-locking off

Don't forget to turn it back off again afterwards, or your thread will certainly block soon because of someone else's mutex!

In Python, I have a simple wrapper that does that for me:

def set_parameter(name, value):
    class temp:
        def __init__(self):
            self.old = None

        def __enter__(self):
            self.old = gdb.parameter(name)

            if self.old is True: self.old = "on"
            if self.old is False: self.old = "off"
            # otherwise don't change

            gdb.execute("set {} {}".format(name, value), to_string=True)

        def __exit__(self, type, value, traceback):
            gdb.execute("set {} {}".format(name, self.old), to_string=True)
    return temp()

I use it this way:

with set_parameter("scheduler-locking", True):
  buffer = gdb.parse_and_eval("malloc")(1024)

Extending Linux Perf Stat with LD_PRELOAD

Friday, March 25, 2016 - No comments

For my current work on monitoring, I need to use Linux perf stat. Perf tools read and dislay the harware counters, either for the wall execution of a process, or while attaced to it:

perf stat --pid $(pidof firefox)
^C
 Performance counter stats for process id '4257':

     13.860180      task-clock (msec)
            79      context-switches           
            16      cpu-migrations
            11      page-faults             
    18,397,934      cycles
    13,964,242      stalled-cycles-frontend 
     9,787,703      stalled-cycles-backend
     8,320,570      instructions
     1,743,632      branches
        93,080      branch-misses

   1.942768382 seconds time elapsed

That's great, I can attach perf to my process, run it for a while and stop it. But if I want to start, and stop, and start again, I can't. And that's what I want to do, from inside gdb.py.

Signals would be great for that, when ever I send a signal, perf dumps the counter values to stderr, and continues. But that's not implemented ...

Studying `perf-stat` source code

So let's see in perf-stat source code what we can find for that.

static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void print_aggr(char *prefix)

these look like good candidates, but their symbols are not exported ...

(gdb) print abs_printout
No symbol "abs_printout" in current context.
(gdb) print print_aggr
No symbol "print_aggr" in current context.

Next candidate?

static void process_interval(void)
(gdb) p process_interval
$3 = {void (void)} 0x441660 <process_interval>

Oh, great, and with no arguments, that's even easier ! Let's try it:

(gdb) call process_interval()
  3858.     571793.500217      task-clock
  3858          1,880,243      context-switches
  3858            115,610      cpu-migrations           
  3858          8,639,477      page-faults              
  3858  1,430,093,310,944      cycles
  3858  1,037,644,029,921      stalled-cycles-frontend
  3858    756,612,594,751      stalled-cycles-backend 
  3858    820,483,606,606      instructions
  3858    176,772,490,245      branches
  3858      5,454,270,151      branch-misses

Exactly what we were looking for !

Triggering `process_interval()` with a signal

Next, we need to be able to trigger this function remotely, and without modifying perf-stat source code. (The easy option would be to patch perf-stat, but then our tool would be harder to distribute).

That can be done with the help of Linux LD_PRELOAD trick: we preload a bit of code inside perf-stat address space, and during the application initialization, we register the signal handler:

void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

void my_handler(int signum) {
    if (signum != SIGUSR2) {
       return;
    }

    printf("Received SIGUSR2!\n");
    process_internal();
}

void init(void) __attribute__((constructor));

void init(void){
  printf("Received init!\n");
  signal(SIGUSR2, my_handler);
}

Here the (GCC specific) function attribute constructor tells the linker to execute the function when it's loaded. In this constructor, we just register the signal handler, and in the signal handler we call process_internal().

But how do we get the address of process_internal? that's where it is not really really clean: it get it from a lookup of the binary's symbol addresses:

nm -a /usr/bin/perf | grep process_interval
0000000000441660 t process_interval

and we inject it manually in the code:

#define PROCESS_INTERNAL 0x0000000000441660
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

A better way would be to pass it as an environment variable:

#define PROCESS_INTERNAL ascii_to_long($PROCESS_INTERNAL_ADDR)

and in the shell:

PROCESS_INTERNAL_ADDR=0x$(nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1)

and that works pretty well !

LD_PRELOAD=/path/to/preload.so perf stat -x, -p $PID_TO_PROFILE

kill -USR2 $(pidof perf)

Note that SIGUSR1 does't work, certainly because it's used by perf-stat. And don't forget the -x, argument if you want to parse perf-stat output.

Also, keep in mind that this is not a robust implementation (of course :), as there might be concurrency problems, for instance if the signal is received while perf is updating the counters.

If it ain't broken don't fix it; I'll wait for problems before going any further!

"Unrelated" problems: it only works with my unused variable!

Monday, November 30, 2015 - No comments

A situation that occurred recently to a colleague of mine:

I don't understand what's the problem, I never use that variable, but if I comment it out, my program crashes! If I let it, it runs fine!

Of course, when you're used C programming and know a bit of memory layout, you already know that "it runs fine" is subjective and that there is buffer overflow somewhere in the code.

Nonetheless, I think the situation is interesting to study, just to remember that this can lead to tricky incomprehensible behaviors.

The problem

Consider this small C code:

 #include "stdio.h"

 #define SIZE 4

 //#define DO_NOT_CRASH

 int *i_ptr;

 #ifdef DO_NOT_CRASH
 int *not_used;
 #endif
 int array[SIZE];

 int main() {
   int i;
   i_ptr = &i;

   for (i = 0; i <= SIZE; i++) {
     array[i] = -1;
   }

   printf("*i_ptr   is ...");
   fflush(stdout);
   printf(" %d\n", *i_ptr);
 }

Run it:

gcc test.c -g -O0 && ./a.out \
echo ==============================; \
gcc test.c -g -O0 -DDO_NOT_CRASH && ./a.out

&i_ptr   is     0x7ffeac4708d0
*i_ptr   is ...[1]    8127 segmentation fault (core dumped)  ./a.out
==============================
&i_ptr   is     0x7ffd5056f508
*i_ptr   is ... 5

(I must admit that it was harder than expected to reproduce the bug. I first put all the variable on the stack (contrary to my colleague), but did not manage to have a clean and buggy behavior! Certainly memory alignment constraints that I don't understand well.)

Surprise! (or not)

Surprise, an unused variable can trigger a segmentation fault!

Last year I presented the definitions of a bug, according to Andreas Zeller. In this definition, he makes the distinction between

a defect (an invalid piece of code),
an infection (the execution of this defect, leading to an invalid memory area)
the propagation of that infection (the augmentation of the invalid memory are size)
and the failure, the externally observable error.

Here we've got an illustration of the purpose of these definition: with -DDO_NOT_CRASH, but program doesn't bug, but we now it's bugged. Yep, totally clear :-)

So what we really have is a program with a defect, whose memory space gets infected, but the infection does not propagate enough to lead to a failure.

How to detect it: with Valgrind

 valgrind ./a.out
 ==17545== Memcheck, a memory error detector
 ==17545== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
 ==17545== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
 ==17545== Command: ./a.out
 ==17545== 
 *i_ptr   is ... 5
 ==17545== 
 ==17545== HEAP SUMMARY:
 ==17545==     in use at exit: 0 bytes in 0 blocks
 ==17545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
 ==17545== 
 ==17545== All heap blocks were freed -- no leaks are possible
 ==17545== 
 ==17545== For counts of detected and suppressed errors, rerun with: -v
 ==17545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nop! I expected to see something with valgrind, but apparently it's not illegal enough! (or I missed an option ...?)

How to detect it: with GDB

That that we understand the situation, we know that there is a buffer overflow, but we need to find the infection point!

(gdb) start
(gdb) watch not_used 
Hardware watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware watchpoint 2: not_used

Old value = (int *) 0x0
New value = (int *) 0xffffffff
main () at overflow-long.c:18
18    for (i = 0; i <= SIZE; i++) {
19      array[i] = -1;
(gdb) print i
$1 = 4

There we are, for i=4, array[i] overflows into not_used. Our defect contaminates a memory area that is never read, so it never propagated to a failure.

Unexpected behavior of GDB watchpoint

At the beginning of the execution, the value of not_used is 0. In the overflow of the for loop, I set it to -1, so the watchpoint is triggered.

But in the first code I wrote, I set it to 0, and the watchpoint was ... not triggered. That's a bit unexpected to me, a write is a write, so I wanted the watchpoint to trigger!

So, just to confirm, I tried with rwatch, to set a read watchpoint ... and it worked!

 (gdb) rwatch not_used 
 Hardware read watchpoint 2: not_used
 (gdb) cont
 Continuing.
 Hardware read watchpoint 2: not_used

 Value = (int *) 0x0
 main () at overflow-long.c:18
 18   for (i = 0; i <= SIZE; i++) {
 (gdb)

That's also surprising to me, as my code is not supposed to read anything at this address!

Just to make it stranger, using rwatch with 0 <- -1 and watch with 0 <- 0 (the reverse of what works) doesn't work! (For the record, it's always the mov instruction that triggers my watchpoints).

GDB and Frame-Filters: a bug and a quick fix

Tuesday, November 10, 2015 - No comments

With frame filters and decorators, GDB lets you rewrite the output of the where command. That's quite convenient, except that it doesn't work well in one situation (that I could not clearly understand ...):

(gdb) where no-filters # <--- notice the option here
#0  do_spin () at .../gcc-5.2.0/libgomp/config/linux/wait.h:55
#1  do_wait () at ...gcc-5.2.0//libgomp/config/linux/wait.h:64
#2  gomp_team_barrier_wait_end (...) at .../libgomp/config/linux/bar.c:112
#3  0x00007ffff7bd8966 in GOMP_barrier () at gomp_preload.c:49
#4  0x0000000000400a19 in main._omp_fn.0 () at parallel-demo.c:10
#5  0x00007ffff7bd89e4 in GOMP_parallel_trampoline (...) at gomp_preload.c:62
#6  0x00007ffff79c442e in gomp_thread_start () at .../libgomp/team.c:118
#7  0x00007ffff7bd8ce8 in pthread_create_trampoline () at pthread_preload.c:33
#8  0x00007ffff779f4a4 in start_thread () from /usr/lib/libpthread.so.0
#9  0x00007ffff74dd13d in clone () from /usr/lib/libc.so.6

becomes:

(gdb) where
#0  gomp_team_barrier_wait_end () at .../libgomp/config/linux/wait.h:55
#1  gomp_team_barrier_wait_end () at .../libgomp/config/linux/wait.h:64
#2  #pragma omp barrier () at parallel-demo.c:10
#4  #parallel zone #1 of main () at parallel-demo.c:10

Many frames are gone, that's my cleanup, some function names have been changed, that's my OpenMP work ... but the function name of frame #0 and #1 are inconsistent. It should not read gomp_team_barrier_wait_end but rather do_spin and do_wait, respectively.

I don't know what's special about these functions, they're inlined, but that's not enough to explain and recreate the problem ...

Anyway, I found that the inconsistency boils down to two lines:

(gdb) frame 0
(gdb) pi print(gdb.selected_frame().function())
gomp_team_barrier_wait_end # wrong
(gdb) pi print(gdb.selected_frame().name())    
do_spin # right

So to solve my problem, I add a frame decorator that picks up the frame name instead of its function symbol:

class BugFixFrame(gdb.frames.FrameDecorator):
    def function(self): 
        return self.inferior_frame().name() 

class BugFixFrameFilter:
    def __init__(self):
        self.enabled = True
        self.priority = 99999

    def filter(self, frames):
        for frame in frames:
            yield BugFixFrame(frame)

gdb.frame_filters["Bug fix frame filter"] = BugFixFrameFilter()

and I now have my clean and correct callstack:

(gdb) where                                                          
#0  do_spin (val=0, addr=0x602104) at .../libgomp/config/linux/wait.h:55
#1  do_wait (val=0, addr=0x602104) at .../libgomp/config/linux/wait.h:64
#2  #pragma omp barrier () at parallel-demo.c:10
#4  #parallel zone #1 of main () at parallel-demo.c:10

I've submitted a bug report as PR/19225.

Looking up Source-Code Lines from GDB/Python (and OpenMP complications)

Monday, November 02, 2015 - No comments

In GDB Python bindings, there is currently no direct way to translate a function symbol into its source file and corresponding lines. But that's possible with gdb command-line, and some more jungling:

(gdb) disassemble [function_name|*address]
Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>: push   %rbp
   0x00000000004024a1 <+1>: mov    %rsp,%rbp
   0x00000000004024a4 <+4>: sub    $0x20,%rsp
   0x00000000004024a8 <+8>: mov    %edi,-0x8(%rbp)
   0x00000000004024ab <+11>:    mov    %rsi,-0x10(%rbp)
   0x00000000004024af <+15>:    mov    -0x8(%rbp),%edi
   0x00000000004024b2 <+18>:    mov    %edi,-0x14(%rbp)
   0x00000000004024b5 <+21>:    mov    -0x10(%rbp),%rsi
   0x00000000004024b9 <+25>:    mov    (%rsi),%rsi
   0x00000000004024bc <+28>:    mov    (%rsi),%rdi
   0x00000000004024bf <+31>:    callq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:    mov    -0x4(%rbp),%eax
   0x00000000004024c7 <+39>:    add    $0x20,%rsp
   0x00000000004024cb <+43>:    pop    %rbp
   0x00000000004024cc <+44>:    retq   
End of assembler dump.

With disassemble, we know (gdb tells us) where a function begins and ends ... in memory. In theory, we just have to parse the second and penultimate lines of gdb.execute("disassemble {addr}"). But in practise, compilers may reorganize (for optimization) the binary instructions, so it's safer to iterate through all of them. Then, gdb.find_pc_line(pc) tells use the source-code line matching that PC. There we are:

 def get_function_fname_and_lines(fct_symb):
     fct_addr = long(fct_symb.value().address)
     disa = gdb.execute("disassemble {}".format(fct_addr), to_string=True)

     filename = fct_symb.symtab.filename
     from_line = fct_symb.line
     to_line = 0
     for disa_line in disa.split("\n"):
         if "Dump of assembler code" in disa_line:
             continue # skip first line
         if "End of assembler dump." in disa_line:
             break # we're at the end
         try:
             # parse the PC value
             # => 0x004009c1 <+32>: jmpq   0x401464 <main._omp_fn.0+2755>
             pc = int(disa_line.replace("=>", "").split()[0], 16)
         except:
             log.warning("Could not parse disassembly line ...")
             log.warning(disa_line)
             continue

         sal = gdb.find_pc_line(pc)
         if not sal:
             continue # hum, nothing known that that PC


         # check for consistency that PC is in the right file
         if not sal.symtab.filename == fct_symb.symtab.filename:
             log.info("not the right file, inlined ?")
             continue

         # if function symbol doesn't specify its line
         if fct_symb.line == 0:
             if from_line == 0 or sal.line < from_line:
                 from_line = sal.line

         # PCs may not be in order
         if sal.line > to_line:
             to_line = sal.line

     return filename, from_line, to_line

which gives:

(gdb) print(get_function_fname_and_lines(gdb.lookup_symbol("main")[0]))
('minimal_omp_threads.c', 26, 76)

OpenMP complications

I wrote that function as part of my work on OpenMP (OMP) debugging. In OMP, compilers do "outlining", that is, the reverse of inlining:

#pragma omp task
    foo1(&i, &j, &k);

becomes with GCC/GOMP:

main._omp_fn.3 (...) {
    foo1 (...);
}

GOMP_task (main._omp_fn.3, ...);

Everything is okay here, my code works well. But with Intel OpenMP and LLVM/Clang, they didn't implement outlining the same way: instead of naming the outlined functions something like <parent>._omp_fn.<id>, they name them ... .omp_microtask.! Thanks guys, now gdb.lookup_symbol(".omp_microtask.") always returns the same symbol (certainly the first one), and so does my source-code lookup function.

We do have the address of the function

(Pdb) print fn
0x402340 <.omp_ptask.>

but gdb.lookup_symbol cannot do the lookup by address ...

So let's get back to GDB/Python documentation and see how we can fix that:

Function: gdb.lookup_symbol (name [, block [, domain]])

This function searches for a symbol by name. The search scope can be restricted to the parameters defined in the optional domain and block arguments.

That block argument looks good (that's more or less the equivalent of a C scope). But where show I get it from?

I remember that [gdb.Frame](https://sourceware.org/gdb/current/onlinedocs/gdb/Frames-In-Python.html#Frames-In-Python) has a block attribute:

(gdb) pi gdb.lookup_symbol(".omp_microtask.", gdb.selected_frame().block())[0]
<gdb.Symbol object at 0x7fc96e0883c8>
(gdb) pi get_function_fname_and_lines(...)
('minimal_omp_threads.c', 38, 39)

but that doesn't work as I wanted (that is, from the task allocator function), because we are in the scope of the task allocator function, which is here equivalent to the global one. The lookup always resolves to the first task ...

So, how to get the right block? Let's get back to the documentation, maybe the block page ...

Function: gdb.block_for_pc (pc)

Return the innermost gdb.Block containing the given pc value. If the block cannot be found for the pc value specified, the function will return None.

Interesting ! Furthermore:

Variable: Block.function

The name of the block represented as a gdb.Symbol. If the block is not named, then this attribute holds None. This attribute is not writable.

For ordinary function blocks, the superblock is the static block. However, you should note that it is possible for a function block to have a superblock that is not the static block – for instance this happens for an inlined function.

Indeed:

(Pdb) gdb.block_for_pc (0x402340).function
<gdb.Symbol object at 0x7f824e346300> (.omp_ptask.)

so the final code for Intel OpenMP looks like that:

fct_addr = ... # "0x402340"
fct_symb = gdb.block_for_pc(int(fct_addr, 16)).function
my_gdb.get_function_fname_and_lines(fct_symb)

and that works well :-)

Debugging with GDB: a real life example

Wednesday, September 09, 2015 - No comments

Since quite a while now, I'm a big fan of GDB, the debugger of the GNU project. I did my Master project (User level DB: a debugging API for user-level thread libraries , PDF) and PhD thesis (Programming-Model Centric Debugging for multicore embedded systems) on the topic of source-level interactive debugging, so I'm getting quite confident in how to use GDB. I find its command-line interface (CLI) easy to use, intuitive, powerful, etc., but I know that's not a commonly shared opinion :-) So I'd like to describe, with this 'real-life' bug, how I use it, and at the same time how I did this debugging investigation.

The Minimal Application

I wrote for this article a minimal working example (MWE) that mimics the behavior of the buggy application. Don't forget that at the time I write this post, I already know where the bug is, so I'll go straight to the point, and skip some of the dead-ends of the initial investigation. That will save us time :-) Also note that I did not write the original code myself, and I don't know anything about libevent. I just got my hand on it, and wanted to get rid of this bug.

You can get the source code from this GIST, with the fix or without (if you want to try to solve it yourself :-). Compile it with:

g++ bug.cpp -std=c++11 -levent -levent_pthreads -lpthread -g -O2

You'll need libevent debugging information if you want to reproduce the investigation.

Its (expected) behavior is quite simple:

Configure and enable libevent.
Create an second thread dispatching the events. We'll call it the dispatcher thread.
Wait WAIT_TIME seconds to mockup the application execution.
Decide it's time to leave and tell the other thread/libevent to quit the event dispatching loop.
Wait for the other thread to report its end of activity, join it and quit.

The Visible Error

Step 5. is the one with the visible error: the main thread loops forever, waiting for the dispatcher thread:

PID is 19425
Setting thread usage a bit late
Sleeping for 2 seconds ...
Running the dispatcher ...
And killing the application.
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
^C

It won't terminate, so we call it with a ^C.

Understand the Situation: The Downwards Investigation

In the first part of the investigation, we'll try to understand why the application is blocked. We'll dive into the code to figure out what drives the execution into that bug.

Let's run the application with GDB.

$ gdb a.aout
(gdb) run
PID is 24468
Setting thread usage a bit late
[New Thread 0x7ffff6b37700 (LWP 24472)]
Running the dispatcher ...
Sleeping for 2 seconds ...
And killing the application.
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
...
^C

The execution is now interrupted in the middle of the infinite loop, so let's check the state of the threads.

(gdb) thread 1
$ where n
#0  0x... in nanosleep () from /usr/lib/libpthread.so.0
#1  0x... in sleep_for<long, std::ratio<1l, 1000l> > (...) 
#2  main () at bug.cpp:82

(gdb) frame 2
(gdb) l # for list
81      while (true) {
82        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
83        if (not m_active_thread.load()) {
84          break;
85        }
86        std::cout << "Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)" << std::endl;

Right, thread 1 is inside the waiting loop, that was expected.

(gdb) thread 2
(gdb )where 4
#0  0x... in epoll_wait () from /usr/lib/libc.so.6
#1  0x... in epoll_dispatch (...) at epoll.c:407
#2  0x... in event_base_loop (...) at event.c:1633
#3  0x... in server_thread () at bug.cpp:21
(More stack frames follow...)

Hum, thread 2 is still blocked in epoll_wait, that's why it doesn't return from event_base_loop and unlocks the main thread from the waiting loop:

void server_thread(){
  std::cout << "Running the dispatcher ..." << std::endl;
  event_base_dispatch(base); // still in this function
  event_base_free(base);

  m_active_thread.store(false);
  std::cout << "Dead" << std::endl;
}

Clue 1: the dispatcher thread is blocked in epoll_wait system call.

Side remark: I won't go into details into the probe effect that can happen here, check my previous post.

New lead: why the dispatcher thread doesn't quit the dispatching loop?

A quick search on the Internet tells us that event_base_loopbreak should break that loop.

Hypothese to verify: Is event_base_loopbreak actually called?

(gdb) break event_base_loopbreak
(gdb) run
...
Breakpoint 1, event_base_loopbreak (event_base=0x617c50) at event.c:1511

(gdb) up
#1  0x... in main () at bug.cpp:79
79      event_base_loopbreak(base);

Yes, at bug.cpp:79.

Hypothese to verify: Is event_base_loopbreak action actually done?

int
event_base_loopbreak(struct event_base *event_base)
{
  int r = 0;
  if (event_base == NULL)
    return (-1);

  EVBASE_ACQUIRE_LOCK(event_base, th_base_lock);
  event_base->event_break = 1;

  if (EVBASE_NEED_NOTIFY(event_base)) {
    r = evthread_notify_base(event_base);
  } else {
    r = (0);
  }
  EVBASE_RELEASE_LOCK(event_base, th_base_lock);
  return r;
}

Is base not NULL? (otherwise we'll return too quickly)

(gdb) p base # for print
$1 = (event_base *) 0x617c50

(gdb) p *base
$2 = {
  evsel = 0x7ffff7dd9140 <epollops>, 
  evbase = 0x617ee0, 
  changelist = {
    changes = 0x0, 
    n_changes = 0, 
    changes_size = 0
  }, 
  evsigsel = 0x7ffff7dd9180 <evsigops>, 
  sig = {
    ev_signal = {
...

Yes, and its content seem right (base->evsel is epollops, which makes sense, so it's not a dangling pointer).

So the event_break flag will be set:

(gdb) n # for next
1513        if (event_base == NULL)
(gdb) n
1516        EVBASE_ACQUIRE_LOCK(event_base, th_base_lock);
(gdb) n
1519        if (EVBASE_NEED_NOTIFY(event_base)) {
(gdb) n
1517        event_base->event_break = 1;

Wait, what!? event_break = 1 should be before NEED_NOTIFY! Don't pay attention to that, it's because of compiler optimizations. Recompile libevent with -O0 flag or ignore it!

(gdb) n
1519        if (EVBASE_NEED_NOTIFY(event_base)) {

This one is the real test ;-)

Now let's make sure that it's the right flag we set. With a quick search in event.c, we found:

int
event_base_got_break(struct event_base *event_base)
{
    int res;
    EVBASE_ACQUIRE_LOCK(event_base, th_base_lock);
    res = event_base->event_break;
    EVBASE_RELEASE_LOCK(event_base, th_base_lock);
    return res;
}

That looks good. Is it ever called?

(gdb) start
(gdb) b event_base_got_break
Breakpoint 2 at 0x7ffff7ba05a0: file event.c, line 1530.

(gdb) cont
Running the dispatcher ...
Sleeping for 2 seconds ...
And killing the application.
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
^C
Program received signal SIGINT, Interrupt.
0x00007ffff778465d in nanosleep () from /usr/lib/libpthread.so.0

Not before the infinite loop apparently,

(gdb) cont
Dead
[Thread 0x7ffff6b39700 (LWP 20947) exited]
Bye everybody.
[Inferior 1 (process 20852) exited normally]

... and not after either. (That's the probe effect we see here). So that's certainly a dead-end.

That could also be the reason of the bug, but the fact that the debugger interrupt makes the application finish properly tells me to let that idea aside.

Next candidate?

int
event_base_loop(struct event_base *base, int flags)
{
    ...
    base->event_gotterm = base->event_break = 0;
    ...
    while (!done) {
        if (base->event_break) {
            break;
        }
        ...
        res = evsel->dispatch(base, tv_p);
        ...
    }
}

(gdb) p evsel->dispatch
$1 = (int (*)(struct event_base *, struct timeval *)) 0x7ffff7bb7d80 <epoll_dispatch>

Yes, that seems to be the right flag: it breaks the loop when set, and just a bit after, the epoll_dispatch function is called (that's the one that blocks forever on epoll_wait).

New lead: How is epoll supposed to quit its polling loop?

(gdb) thread 2
Thread ID 2 not known.

Damn, it's gone, that's the probe effect ...

(gdb) run # that restarts the execution from the beginning
...
Sleeping for 2 seconds ...
^C
Program received signal SIGINT, Interrupt.
0x... in nanosleep () from /usr/lib/libpthread.so.0

(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff6b39700 (LWP 3962))]
#0  0x... in epoll_wait () from /usr/lib/libc.so.6

(gdb) where 3
#0  0x... in epoll_wait () at /usr/lib/libc.so.6
#1  0x... in epoll_dispatch (base=0x617c50, tv=<optimized out>) at epoll.c:407
#2  0x... in event_base_loop (base=0x617c50, flags=0) at event.c:1629
#3  0x... in server_thread() () at bug.cpp:21

(gdb) up
#1  0x... in epoll_dispatch (base=0x617c50, tv=<optimized out>) at epoll.c:407
407        res = epoll_wait(epollop->epfd, events, epollop->nevents, timeout);

(gdb) p timeout
$1 = <optimized out>

Nope, we can't get the timeout value ... Let's try to see how it's computed:

(gdb) l 397
391     long timeout = -1;
392
393        if (tv != NULL) {
394            timeout = evutil_tv_to_msec(tv);
395            if (timeout < 0 || timeout > MAX_EPOLL_TIMEOUT_MSEC) {
396                /* Linux kernels can wait forever if the timeout is
397                 * too big; see comment on MAX_EPOLL_TIMEOUT_MSEC. */
398                timeout = MAX_EPOLL_TIMEOUT_MSEC;
399            }
400        }

(gdb) p tv
$2 = <optimized out>

Nope again, but let's try to outsmart it!

(gdb) up
#2  0x00007ffff7ba2a0d in event_base_loop (base=0x617c50, flags=0) at event.c:1629
1629            res = evsel->dispatch(base, tv_p);

(gdb) p tv_p
$3 = (struct timeval *) 0x0

Soon enough! So timeout remains -1. And that's a valid value for epoll_wait:

The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. [...] Specifying a timeout of -1 causes epoll_wait() to block indefinitely [...].

So epoll_wait won't return until it receives something from its event listeners ...

Clue 2: The dispatcher thread waits for a notification.

We saw something like that earlier on, where was that ... Oh, yes, EVBASE_NEED_NOTIFY(event_base) inside event_base_loopbreak. But the name looks like a preprocessor macro ... Indeed:

(gdb) p EVBASE_NEED_NOTIFY
No symbol "EVBASE_NEED_NOTIFY" in current context.

So let's get back to the source:

$ cd $PATH_TO_LIBEVENT_SRC
$ grep "define EVBASE_NEED_NOTIFY" . --after-context=2 --include='*.h' -r
./evthread-internal.h:#define EVBASE_NEED_NOTIFY(base)             \
./evthread-internal.h-    (_evthread_id_fn != NULL &&             \
./evthread-internal.h-        (base)->running_loop &&             \
--
./evthread-internal.h:#define EVBASE_NEED_NOTIFY(base)             \
./evthread-internal.h-    ((base)->running_loop &&             \
./evthread-internal.h-        ((base)->th_owner_id != _evthreadimpl_get_id()))
--
./evthread-internal.h:#define EVBASE_NEED_NOTIFY(base) 0
./evthread-internal.h-#define EVBASE_ACQUIRE_LOCK(base, lock) _EVUTIL_NIL_STMT
./evthread-internal.h-#define EVBASE_RELEASE_LOCK(base, lock) _EVUTIL_NIL_STMT

According to the macro names, it seems to be the first one. If you can recompile the code, you can add a #error to ensure it:

#error "Compilation will crash if we pass here."
#define EVBASE_NEED_NOTIFY(base) ...

If you can recompile the code, while you're at it, you can do a bit of printf debugging (that's what I did :-/) to get it's value:

printf("need notify ? %d\n", EVBASE_NEED_NOTIFY(event_base));

Otherwise, let's check the values with GDB:

(gdb) p _evthread_id_fn
$5 = (unsigned long (*)(void)) 0x7ffff7991d50 <evthread_posix_get_id>

(gdb) p base->running_loop
$6 = 1

Looks good. We can continue that investigation:

(gdb) b evthread_notify_base
(gdb) continue
Breakpoint 2, evthread_notify_base (base=0x617c50) at event.c:2039
2039    {

Where are we? at the begiining of this function:

(gdb) l
2038    evthread_notify_base(struct event_base *base)
2039    {
2040        EVENT_BASE_ASSERT_LOCKED(base);
2041        if (!base->th_notify_fn) {
2042            return -1;
2043        }
2044        if (base->is_notify_pending)
2045            return 0;
2046        
2047        base->is_notify_pending = 1;
2048        return base->th_notify_fn(base);
2049    }

Great, so, do we have a notification function?

(gdb) p base->th_notify_fn
$7 = (int (*)(struct event_base *)) 0x0

No, there is no notification function!

Now it makes sense:

Intermediate conclusion regarding the visible error:

the dispatcher thread is blocked forever:
- and waits for an epoll notification that never comes,
- then it should check a flag for breaking its listening loop
the main thread sets the loop-breaking flag,
- but fails to notify epoll, so the dispatcher thread never checks the flag
- then it waits forever for the dispatcher thread completion.

Problem solved! Or maybe not, maybe we just understood the situation.

Understanding the Reason of the Invalid State: Backwards Investigation

Now that we are at the bottom of the callstacks, we need to go backwards in time: The state of the application is invalid, the notify function pointer has to be set, so why is it empty? which decision(s) set it to NULL instead of the actual notification function?

New lead: Why is this th_notify_fn empty?

$ cd $PATH_TO_LIBEVENT_SRC
grep th_notify_fn * -n
grep: autom4te.cache: Is a directory
grep: compat: Is a directory
event.c:2037:    if (!base->th_notify_fn) {
event.c:2045:    return base->th_notify_fn(base);
event.c:2829:    base->th_notify_fn = notify;
event-internal.h:293:    int (*th_notify_fn)(struct event_base *base);

The instruction at event.c:2829 looks interesting. It is inside function evthread_make_base_notifiable. Is this function ever called?

(gdb) start
(gdb) break evthread_make_base_notifiable
Breakpoint 2 at 0x7ffff7ba6220: file event.c, line 2780.

(gdb) cont
PID is 28680
Setting thread usage a bit late
[New Thread 0x7ffff6b39700 (LWP 28778)]
Running the dispatcher ...
Sleeping for 2 seconds ...
And killing the application.
Waiting for the thread to die ... (try ^z/fg, SIGSTOP/SIGINT)
^C

Apparently, no. So th_notify_fn is empty because it is never set.

New lead: How to 'make the base notifiable'?

$ grep evthread_make_base_notifiable *.c -n
event.c:640:        r = evthread_make_base_notifiable(base);
event.c:884:        res = evthread_make_base_notifiable(base);
event.c:2775:evthread_make_base_notifiable(struct event_base *base)

Line 2775 is the function definition, line 884 is inside a function called event_reinit, so that's certainly not what we can. Line 640 looks more interesting:

struct event_base *
event_base_new_with_config(const struct event_config *cfg)
{
    ...
#ifndef _EVENT_DISABLE_THREAD_SUPPORT
    if (EVTHREAD_LOCKING_ENABLED() &&
        (!cfg || !(cfg->flags & EVENT_BASE_FLAG_NOLOCK))) {
            int r;
            EVTHREAD_ALLOC_LOCK(base->th_base_lock,
            EVTHREAD_LOCKTYPE_RECURSIVE);
            base->defer_queue.lock = base->th_base_lock;
            EVTHREAD_ALLOC_COND(base->current_event_cond);
            r = evthread_make_base_notifiable(base);
            if (r<0) {
                event_warnx("%s: Unable to make base notifiable.", __func__);
                event_base_free(base);
                return NULL;
            }
        }
#endif

Macro _EVENT_DISABLE_THREAD_SUPPORT is not defined (so the block is executed), but what is EVTHREAD_LOCKING_ENABLED()?

grep EVTHREAD_LOCKING_ENABLED * --after-context=1 --include='*.h'
evthread-internal.h:#define EVTHREAD_LOCKING_ENABLED()        \
evthread-internal.h-    (_evthread_lock_fns.lock != NULL)

... and this thread-locking function table is ... empty!

(gdb) p _evthread_lock_fns.lock
$1 = (int (*)(unsigned int, void *)) 0x0

(gdb) p _evthread_lock_fns
$2 = {
    lock_api_version = 0, 
    supported_locktypes = 0, 
    alloc = 0x0, 
    free = 0x0, 
    lock = 0x0, 
    unlock = 0x0
}

So to make the base notifiable ...

New lead: Thread locking should be enabled.

Let's check if it already done ..., and let's do it with GDB, instead of grep, with a memory watchpoint:

(gdb) start
(gdb) p _evthread_lock_fns 
$1 = { lock_api_version = 0,  supported_locktypes = 0,  ... }
# that's the defaut value

(gdb) watch _evthread_lock_fns
Watchpoint 2: _evthread_lock_fns

(gdb) run
... takes forever ...

Wait, there is a problem here, it takes forever. And it doesn't say 'Hardware watchpoint', although I know my CPU is capable of it ...

Side remark: That's because _evthread_lock_fns is a structure, and HW watchpoints can only watch a few addresses. So software watchpoints are used instead, which are very very very slow (each time a machine instruction is executed, GDB analyses the memory address(es) involved, and continues the execution if they are not covered by a watchpoint).

Let's try again:

(gdb) start
(gdb) p &_evthread_lock_fns
$1 = (struct evthread_lock_callbacks *) 0x7ffff7dda320 <_evthread_lock_fns>

(gdb) watch *0x7ffff7dda320
Hardware watchpoint 2: *0x7ffff7dda320
# good, a HW breakpoint
# it only watches the first bits of the structure, but that's enough

# OR that would do the same:
(gdb) watch _evthread_lock_fns->lock_api_version
Hardware watchpoint 3: _evthread_lock_fns->lock_api_version

(gdb) cont
Hardware watchpoint 2: *0x7ffff7dda320
Old value = 0
New value = 1
Hardware watchpoint 3: _evthread_lock_fns->lock_api_version

Old value = 0
New value = 1
0x00007ffff7ba7472 in evthread_set_lock_callbacks (cbs=cbs@entry=0x7fffffffe4b0) at evthread.c:101
101            memcpy(target, cbs, sizeof(_evthread_lock_fns));

(gdb) p *cbs
$3 = {
    lock_api_version = 1, 
    supported_locktypes = 1, 
    alloc = 0x7ffff7991e60 <evthread_posix_lock_alloc>, 
    free = 0x7ffff7991e40 <evthread_posix_lock_free>, 
    lock = 0x7ffff7991e20 <evthread_posix_lock>, 
    unlock = 0x7ffff7991e10 <evthread_posix_unlock>
}

So we do set the thread locking functions ...

New lead: but why were they empty earlier on? when/where do we set them?

(gdb) where
#0  0x00007ffff7ba7390 in evthread_set_lock_callbacks (cbs=cbs@entry=0x7fffffffe4b0) at evthread.c:76
#1  0x00007ffff799202b in evthread_use_pthreads () at evthread_pthread.c:185
#2  0x000000000040162a in main() () at bug.cpp:69

oh, that's from the main, at bug.cpp:69. Where were we when we saw them empty?

(gdb) break event_base_new_with_config
(gdb) run
(gdb) where
#0  0x00007ffff7ba63c0 in event_base_new_with_config (cfg=cfg@entry=0x617c20) at event.c:553
#1  0x00007ffff7ba680b in event_base_new () at event.c:452
#2  0x0000000000401553 in main() () at bug.cpp:50

Oh, I see, we were at bug.cpp:50, so 19 lines before.

Defect explanation and fix: it happens that the application created and configured libevent's base (event_base_new()) before configuring its thread-support. So the notification could not be sent, and hence the dispatcher thread remained blocked in epoll forever.

This is obviously an 'application bug', libevent was used incorrectly. Furthermore, event_base_loopbreak did returned at error code, but that was not checked and hence never noticed. We can verify that very quickly with GDB:

(gdb) start
(gdb) break bug.cpp:79
(gdb) cont
Breakpoint 2, main () at bug.cpp:79
79      event_base_loopbreak(base);

(gdb) p event_base_loopbreak(base)
$1 = -1

This way, we ask GDB to call the function event_base_loopbreak and print its result. Another way is to set a breakpoint inside the function, and ask GDB to finish it:

(gdb) break event_base_loopbreak
(gdb) cont
Breakpoint 2, event_base_loopbreak (event_base=0x617c50) at event.c:1511
1511    {

(gdb) finish
Run till exit from #0  event_base_loopbreak (event_base=0x617c50) at event.c:1511
main () at bug.cpp:82
82      std::this_thread::sleep_for(std::chrono::milliseconds(1000));
Value returned is $1 = -1

Both ways show that the return value is -1...

Of course the application from which I base this example was in development, with no bug-free guarantee, so no one to blame! I actually quite enjoyed this work of investigation :)

You can change #define DO_BUG to false at the beginning the source code, and see that is solves the problem.

Conclusion

In this article I tried to highlight the usage of GDB with a real debugging example, on which I spend my last Friday afternoon. I showed the way I performed this investigation, although I must admit that I used a bit more printf than I show here. That's mainly because of the probe effect that occurs with this bug, that complicates the use of GDB.

If ever you don't like interactive debuggers, or you would have done this debugging with different tools and techniques, let me know, I'm curious! Just keep in mind that I didn't write the original code and I didn't want to spend my afternoon reading libevent documentation! (I didn't even know for sure that the bug was related to this library).

Also, this is not an 'advanced' usage of GDB, I only used the very basic functionalities. Maybe one day I'll encounter a trickier bug that requires a more subtle juggling with GDB commands, and I'll post a new article on the topic!

Probe effect: but it works with GDB!

Saturday, September 05, 2015 - No comments

Probe effect is unintended alteration in system behavior caused by measuring that system. (according to Wikipedia)

I had such a problem yesterday while debugging a tool based on libevent :

in a multi-threaded application, the main thread receives the quit order,
it transmits the information to the helper thread,
- the helper thread quits
the main thread wait for the end of the secondary thread execution
and terminates the process.

But that didn't work, because the helper thread never quits, and the main one waits forever.

MWE: Minimal Working Example

Here is a slightly simplified reconstruction of this bug. Don't pay attention to the epoll configuration, I don't know how to use it and didn't learn!

#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAXEVENTS 64
#define TIMEOUT -1

void main () {
  struct epoll_event event; struct epoll_event *events;
  int fd = 0; int efd = epoll_create1 (0);

  printf("OH, HAI!\n");
  printf("BTW pid = %d\n", getpid());

  /* I never learnt how to use epoll...  */
  event.events = EPOLLIN | EPOLLET; event.data.fd = fd;
  epoll_ctl (efd, EPOLL_CTL_ADD, fd, &event);
  events = calloc (MAXEVENTS, sizeof event);

  /* Wait forever for an event.  */
  epoll_wait (efd, events, MAXEVENTS, TIMEOUT);

  printf("KTHXBYE\n");
}

then run it:

$ ./a.out
OH, HAI!
BTW pid = 4900

yes, it's blocked, that was expected. Ctrl-C to quit.

The probe effect

Now run it again, and attach GDB to it:

$ gdb --pid $THE_PID
GNU gdb (GDB) 7.9.50.20150505-cvs
...
0x00007fdb33ac7703 in __epoll_wait_nocancel () from /usr/lib/libc.so.6
(gdb) continue
KTHXBYE
[Inferior 1 (process 6957) exited with code 010]

What !? the application now prints KTHXBYE and exits normally ... ! That's the probe effect !

The probe effect ... may not be directly related to the probe!

There is another way to get get the application finish properly, it's with a SIGSTOP/SIGCONT combo:

$ ./a.out
OH, HAI!
BTW pid = 7997
^Z
[1]  + 7997 suspended  ./a.out
$ fg
[1]  + 7997 continued  ./a.out
KTHXBYE

So that probe effect is not actually due to the probe, but to ...

Explanation of this probe effect

... to the fact that GDB (and signals) interrupt system calls !

That's (obviously) in epoll_wait man page:

ERRORS

EINTR : The call was interrupted by a signal handler before either (1) any of the requested events occurred or (2) the timeout expired; see signal(7).

In my libevent application, the thread was blocked in this epoll_wait syscall, which was interrupted by the debugger/signals. Then it checks for the exit flag and terminates as expected.

(It was a configuration problem, not a bug in libevent. The function that should notify the epoll listener was not correctly configured.)

TL;DR

Debuggers (and signals) may and will interrupt system calls. Your code should take that in consideration, that's a normal behavior of system calls. But that may also introduce "probe effects" when you try to study your code with a debugger!

Bug with multiple threads running inside GDB

Friday, September 04, 2015 - No comments

When extending through its Python API (or directly in C), the situation where you have to use threads may pop up. For instance with GUIs, or when using another library or module.

Unfortunately, GDB doesn't like that much. First of all, you cannot call GDB Python functions from another thread. GDB itself not multithreaded, and hence not thread safe. Python is though, so you should be able to block the main thread in Python, and call GDB functions in the other thread, but outside from that, GDB will simply crash! And there is nothing to do against that, as far as I know.

But worth (kind of), GDB doesn't support that your code spawns a thread.

TL;DR: solution

In C (GDB bug #17247, patch and discussion):

sigemptyset (&sigchld_mask);
sigaddset (&sigchld_mask, SIGCHLD);
sigprocmask (SIG_BLOCK, &sigchld_mask, &prev_mask);

scm_with_guile (call_initialize_gdb_module, NULL);
sigprocmask (SIG_SETMASK, &prev_mask, NULL);

In Python:

import pysigset, signal

with pysigset.suspended_signals(signal.SIGCHLD):
    # start threads, they will inherit the signal mask
    pass

Description of the bug

When you create (in Python) a thread, and then run the application (in my case it happens mainly when the application itself spawns threads), GDB freezes with the following callstack:

(gdb) where
#0 sigsuspend () from /usr/lib/libc.so.6
#1 wait_lwp (lp=lp@entry=0x21f63b0) at ../../gdb/gdb/linux-na
#2 stop_wait_callback (lp=0x21f63b0, data=<optimized out>) at
#3 iterate_over_lwps (filter=..., callback=callback@entry=0x4
#4 linux_nat_wait_1 (ops=<optimized out>, target_options=1, o
#5 linux_nat_wait (ops=<optimized out>, ptid=..., ourstatus=0
#6 thread_db_wait (ops=<optimized out>, ptid=..., ourstatus=0
...

Explanation of the bug

The function in which GDB is blocked is sigsuspend:

NAME

sigsuspend, rt_sigsuspend - wait for a signal

SYNOPSIS

int sigsuspend(const sigset_t * mask);

DESCRIPTION

sigsuspend() temporarily replaces the signal mask of the calling process with the mask given by mask and then suspends the process until delivery of a signal whose action is to invoke a signal handler or to terminate a process.

GDB uses this function to wait for new events from the application execution: when something occurs in the debuggee (see how debuggers work), the kernel will inform GDB of it by sending a SIGCHLD signal. When it's received, GDB awakes and check what happened.

However, the signal is delivered to GDB process, but not necessarily to its main thread. And it practise, it occurs often that it's delivered to the second thread, who doesn't care about it (that's the default behavior), and continues its life as if nothing occurred.

Solution of the problem

We cannot change the behavior of the thread. However, we have a bit of control over its default signal handling behavior: it is inherited from its parent! So, in Python, we can go this way:

import pysigset, signal

# with SIGCHLD blocked,
with pysigset.suspended_signals(signal.SIGCHLD):
    # start threads,
    # they will inherit the signal mask
    pass
# SIGCHLD is unblocked after the with statement,
# so that GDB can operate properly afterwards

GDB and C Preprocessor Macro

Sunday, December 21, 2014 - 3 comments

C preprocessor macro are quite convenient while programming, however they can quickly become a burden while compiling and debugging. Consider this little C program:

#define str(s) #s
#define xstr(s) str(s)

#define A 10
#define B 15

#ifndef C
#define C 15
#endif

#define MIN(x, y) (x > y ? y : x)
int printf (const char * format, ... );

void main(int argc, char **argv) {
  printf("test1 " xstr(A) " vs " xstr(B) " -> %d\n", MIN(A, B));
#undef A
#undef B
#define A 0
    printf("test2 " xstr(A) " vs " xstr(C) " -> %d\n", MIN(A, C));
}

It outputs, as expected:

$ gcc test.c && ./a.out
test1 10 vs 15 -> 10
test2 0 vs 15 -> 0
$ gcc test.c -DC=-5 && ./a.out
test1 10 vs 15 -> 10
test2 0 vs -5 -> -5

Let's add a -g in the compiler flags and see how we can follow its execution with GDB.

Debugging macros with GDB, the hard way

$ gcc test.c -DC=-5 -g && gdb ./a.out
GNU gdb (GDB) 7.8.1
Reading symbols from ./a.out...done.
(gdb) start
Starting program: /tmp/a.out
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe998) at test.c:15
15    printf("test1 " xstr(A) " vs " xstr(B) " -> %d\n", MIN(A, B));
(gdb) print A
No symbol "A" in current context.
(gdb) p B
No symbol "B" in current context.

Okay, so we can't go anywhere so easily. So let's continue the hard way:

(gdb) x/4i $pc # print 4 i-instructions after the current address of the program counter/instruction pointer $pc
=> 0x400515 <main+15>:  mov    $0xa,%esi
   0x40051a <main+20>:  mov    $0x4005c4,%edi
   0x40051f <main+25>:  mov    $0x0,%eax
   0x400524 <main+30>:  callq  0x4003e0 <printf@plt>
(gdb) p/d 0xa # print hexadecimal number 0xa in d-igits
$2 = 10
(gdb) p (char *) 0x4005c4 # cast this address in char* and print it
$3 = 0x4005c4 "test1 10 vs 15 -> %d\n"

There we are, the processor is going to execute printf("test1 10 vs 15 -> %d\n", 10). Naively I expected to see the comparison between 10 and 15, but the compiler optimizes that automatically, even with -O0 flag. Change one operand to a 'real' C variable to see the comparison in the code.

That works, but that's a bit hardcore, and not easy to apply in all the situation. Nonetheless, it can be useful for instance to see which functions(s) will be called by a preprocessor macro function.

What can we do to better understand such pieces of code ?

First, let's remember how the preprocessor works: it preprocesses the source file, before passing it to the actual C compiler.

So let's intercept this step.

Debugging macros with GCC, an easier way

GCC can dump this intermediate step with the flag -E, or directly apply cpp to your file:

$ cpp test.c -DC=-5 # OR # gcc -E test.c -DC=-5
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "test.c"
# 12 "test.c"
int printf (const char * format, ... );

void main(int argc, char **argv) {
  printf("test1 " "10" " vs " "15" " -> %d\n", (10 > 15 ? 15 : 10));
  printf("test2 " "0" " vs " "-5" " -> %d\n", (0 > -5 ? -5 : 0));
}

And there we can see what is exactly fed to the C compiler. Note that the comparison is still here, so it's (obviously) not the preprocessor which optimized it out.

But this is not very interactive ... and we said that we want to use GDB!

Debugging macros with GDB, the nice way

The lasts versions of GCC+GDB (I can't say since when it's in place, just that it works with my up-to-date Archlinux) can respectively include and interpret preprocessor macro definitions inside the binary's debugging information. It's not included in the standard -g debugging flag, but in -g3, certainly because of the quantity of information that need to be included with dozens of header files are included in a C file.

$ gcc -g3 test.c -DC=-5  && gdb a.out 
GNU gdb (GDB) 7.8.1
Reading symbols from a.out...done.
(gdb) info macro A
The symbol `A' has no definition as a C/C++ preprocessor macro
at <user-defined>:-1

Argl, what's that? all the article was a lie? all of that to come to 'A' has no definition as a C/C++ preprocessor macro? That's what I though until yesterday. I tried to play with macros in GDB a couple of time, but always failed this hard. Until I read this forum post, that says in a more polite way: RTFM!

gdb uses the current listing position to decide which macro definitions are in scope.

Oh, yes, indeed, it makes sense ... !

(gdb) start
Temporary breakpoint 1 at 0x400515: file test.c, line 15.
Starting program: /tmp/a.out 
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe998) at test.c:15
15    printf("test1 " xstr(A) " vs " xstr(B) " -> %d\n", MIN(A, B));
(gdb) info macro A
Defined at /tmp/test.c:4
#define A 10
(gdb) next
test1 10 vs 15 -> 10
19      printf("test2 " xstr(A) " vs " xstr(C) " -> %d\n", MIN(A, C));
(gdb) info macro A
Defined at /tmp/test.c:18
#define A 0
(gdb) info macro C
Defined at /tmp/test.c:0
-DC=-5
(gdb) info macro MIN
Defined at /tmp/test.c:11
#define MIN(x, y) (x > y ? y : x)

GDB can also expand macro functions:

(gdb) macro expand MIN(A, B)
expands to: (10 > 15 ? 15 : 10)
(gdb) macro expand MIN(A,  MIN(A, 15))
expands to: (10 > (10 > 15 ? 15 : 10) ? (10 > 15 ? 15 : 10) : 10)
(gdb) p (10 > (10 > 15 ? 15 : 10) ? (10 > 15 ? 15 : 10) : 10)
$1 = 10

There we are, and that's working well, cool :-) (if you run gcc -E test.c -DC=-5 -g3 you'll see that the preprocessor passes the #defines to the compiler, including the builtin ones.

However, I couldn't get macro define|list|undef to work. According to the help, GDB should be able to change the value of macro-definitions, but that seems very hard to implement, maybe that's why it doesn't work ...

(gdb) help macro define
Define a new C/C++ preprocessor macro.
The GDB command `macro define DEFINITION' is equivalent to placing a
preprocessor directive of the form `#define DEFINITION' such that the
definition is visible in all the inferior's source files.
For example:
  (gdb) macro define PI (3.1415926)
  (gdb) macro define MIN(x,y) ((x) < (y) ? (x) : (y))
(gdb) start
Temporary breakpoint 1 at 0x400515: file test.c, line 15.
Starting program: /tmp/a.out 
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe998) at test.c:15
15    printf("test1 " xstr(A) " vs " xstr(B) " -> %d\n", MIN(A, B));
(gdb) macro define A -5
(gdb) macro define B -10
(gdb) n
test1 10 vs 15 -> 10

[Dev-tools configuration] GDB's .gdbinit

Thursday, December 04, 2014 - 1 comment

In the next posts, I'll share the configuration files of my development and debugging tool. I think having a nice dev environment is key to good quality programming. At least, it makes your experience better. It's also verrrry nice for the people that will have to interact with you and your code!

I'll start with my .gdbinit:

No window size, we can scroll:
```
set height 0
set width 0
```
Allow pending breakpoints, we know what we're doing:

(gdb) break foobar Function "foobar" not defined. Breakpoint 9 (foobar) pending.

means that there is no "foobar" currently loaded, but maybe you know it will appear later.
```
set breakpoint pending on
```
We're doing python development, so print the full stack trace when it crashes:
```
set python print-stack full
```
GDB, please don't complain, I know what I'm doing! ... but beware, you may loose your session if you enter start in the middle of your debugging
```
set confirm off
```
Structure easier to read:
```
set print pretty
```

Save history, up-arrow and ctrl-r can save a lot of time:

set history filename ~/.gdb_history
set history save

We can also add a little bit of Python shortcut commands:

import gdb

quickly print Python objects pp instead of py print:

class pp(gdb.Command):
    """Python print its arg"""
    def __init__(self):
        gdb.Command.__init__ (self, "pp", gdb.COMMAND_DATA,
                              completer_class=gdb.COMPLETE_SYMBOL)
    def invoke (self, arg, from_tty):
        gdb.execute("python print %s" % arg)
pp()

quickly list attributes of a Python object: ppd

class ppd(gdb.Command):
    """Python print dir() of its arg"""
    def __init__(self):
        gdb.Command.__init__ (self, "ppd", gdb.COMMAND_DATA, completer_class=gdb.COMPLETE_SYMBOL)

    def invoke (self, arg, from_tty):
        gdb.execute("python print dir(%s)" % arg)
ppd()

And finally a code I retrieved from ratmice@gitorious, an extended prompt for GDB, which shows the current source file, line and function, all that with colors (my version is here):

(src/ls.c:1242 main gdb)
$

Older entries »