(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

GDB, please set a breakpoint on all my functions

Friday, February 10, 2017 - No comments

To study a piece of code, a colleague of mine wanted to set a breakpoint on all the functions of his application. All but those in shared libraries, so not the libc, libpthread, etc. We agreed that this means all the functions defined in files compiled with -g.

TL;DR;

==> breakpoint_all.py <==

and from GDB:

(gdb) source breakpoint_all.py

I don't claim it's the best way to do it, nor the most efficient way to trace; I just want to highlight what gdb+python makes possible. Feel free to update the last step (or the others) to make it do whatever you need!


In the command line, GDB can list the functions with info functions:

(gdb) info functions
File ../../gdb/gdb/compile/compile.c:
    void _initialize_compile(void);
    int compile_register_name_demangle(struct gdbarch *, const char *);
    char *compile_register_name_mangled(struct gdbarch *, int);
    void eval_compile_command(struct command_line *, char *, enum compile_i_scope_types);
    static int check_raw_argument(char **);

File ../../gdb/gdb/complaints.c:
    void _initialize_complaints(void);
    void clear_complaints(struct complaints **, int, int);
    void complaint(struct complaints **, const char *, ...);
    void internal_complaint(struct complaints **, const char *, int, const char *, ...);

But at the moment (gdb 7.12.1) there is no equivalent in Python. I didn't want to parse the output of this command and create breakpoints out of these function names; I wanted GDB Python to give me this list, so I had to find another way ...

List source files

Again there is a CLI command (info sources), again there is no Python wrapper:

(gdb) info sources
Source files for which symbols have been read in:

gdb/amd64-tdep.c, gdb/features/i386/amd64.c, gdb/features/i386/amd64-avx.c, gdb/gdbarch.h, gdb/target.h, ...

Source files for which symbols will be read in on demand:

gdb/gdb/features/i386/amd64.c, gdb/features/i386/x32.c, ...

I don't know why there are two sets, but these are the files we are interested in ... so let's parse this output:

def get_file_addresses():
    sources = gdb.execute("info sources", to_string=True).split("\n")
    assert "Source files for which symbols have been read in" in sources[0]

    for line in sources:
        if line.startswith("Source files ") or not line.strip():
            continue
        for source in line.split(", "):
            # do something with `source`
            ...
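
To try the parsing without a GDB session at hand, the same loop can be moved into a plain function that takes the text of info sources as a string. This is only a sketch: parse_info_sources and SAMPLE are names I made up, the sample is a shortened copy of the output above, and the exact layout of info sources may differ between GDB versions.

```python
def parse_info_sources(output):
    """Return the source file names listed in the text of `info sources`."""
    sources = []
    for line in output.split("\n"):
        line = line.strip()
        # skip blank lines and the two "Source files for which ..." headers
        if not line or line.startswith("Source files "):
            continue
        for source in line.split(","):
            source = source.strip()
            # the listing in this post is abridged with "..."; ignore it
            if source and source != "...":
                sources.append(source)
    return sources


# shortened, hypothetical sample in the format shown above
SAMPLE = """Source files for which symbols have been read in:

gdb/amd64-tdep.c, gdb/gdbarch.h

Source files for which symbols will be read in on demand:

gdb/features/i386/x32.c
"""

for name in parse_info_sources(SAMPLE):
    print(name)
```

Inside GDB, the string would come from gdb.execute("info sources", to_string=True), as in the function above.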

Find source file memory addresses

Now we need to list the symbols in these files. Sounds easy, but in practice there is no function to look up a symbol file from its name, and even with the symbol files (tables), there is no way to list the symbols they contain. So this is a dead end!

After a careful reading of the documentation, one finds out that only blocks give access to a list of symbols, when you iterate over them. This is not written in bold, so it's easy to miss :-(. Anyway, we need to get the block corresponding to our source file scope. We could look it up from its PC: gdb.block_for_pc(pc), but at the moment we don't know a PC corresponding to this file.

One way to get this PC is by setting a breakpoint in this file: break <source file>:1 should work, and we'll be able to use the address of the breakpoint as a block lookup PC.

(gdb) b gdb/amd64-tdep.c:1
Breakpoint 1 at 0x461300: file ../../gdb/gdb/amd64-tdep.c, line 1.

But again, there is no way to do this in Python :-( So let's parse the CLI output again!

Pay attention to errors though: if GDB cannot convert the location into an address, it will throw a gdb.error exception; or, if set breakpoint pending is on (as on my system, for instance), it will not fail, but indicate that the breakpoint is pending. Setting multiple breakpoints at the same address may also change the output.

def get_file_addresses():
        ....
        # do something with `source`
        try:
            # fails if source is a header file
            bpt_msg = gdb.execute("break {}:1".format(source), to_string=True)
        except gdb.error as e:
            # if show breakpoint pending ==> off
            continue
        bp_id = bpt_msg.partition("Breakpoint ")[-1].partition(" ")[0]
        gdb.execute("delete {}".format(bp_id), to_string=True)

        if "pending" in bpt_msg or "No line" in bpt_msg:
            # if show breakpoint pending ==> on
            """No line 1 in file "/usr/include/bits/pthreadtypes.h".
            Breakpoint 8 (/usr/include/bits/pthreadtypes.h:1) pending."""
            continue

        """Note: breakpoint 3 also set at pc 0x461300.
        Breakpoint 2 at 0x461300: file ../../gdb/gdb/amd64-tdep.c, line 1."""

        bp_line = [a for a in bpt_msg.split("\n") if a.startswith("Breakpoint ")][0]
        file_1st_addr = int(bp_line.split(" ")[3][:-1], 16) # change to long in Py2 IIRC

        yield source, file_1st_addr
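
The string slicing above depends on the exact position of each word in the message. As an alternative sketch (not what the script does), a regular expression can pick out the breakpoint number and address in one go. BP_LINE and parse_breakpoint_message are hypothetical names, and the messages are the ones quoted in this post.

```python
import re

# matches e.g. "Breakpoint 2 at 0x461300: file ..., line 1."
BP_LINE = re.compile(r"^Breakpoint (\d+) at (0x[0-9a-fA-F]+)", re.MULTILINE)


def parse_breakpoint_message(bpt_msg):
    """Return (breakpoint number, address), or None if GDB gave no address."""
    match = BP_LINE.search(bpt_msg)
    if match is None:
        # pending breakpoint, or a message format this sketch does not know
        return None
    return int(match.group(1)), int(match.group(2), 16)


resolved = ("Note: breakpoint 3 also set at pc 0x461300.\n"
            "Breakpoint 2 at 0x461300: file ../../gdb/gdb/amd64-tdep.c, line 1.\n")
pending = ('No line 1 in file "/usr/include/bits/pthreadtypes.h".\n'
           "Breakpoint 8 (/usr/include/bits/pthreadtypes.h:1) pending.\n")

print(parse_breakpoint_message(resolved))
print(parse_breakpoint_message(pending))
```

Note that a pending breakpoint still has a number and still needs to be deleted, so this helper alone does not replace the bp_id extraction above.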

List all the symbols/functions from a file

Now that we've got an address in the file, we can get its global scope (Block.global_block) and list the symbols it contains. From this list, we just filter out what isn't a function:

def get_all_functions_from_pc(pc):
    block = gdb.block_for_pc(pc)
    for symb in block.global_block:
        if not symb.is_function: continue

        yield symb

And we're almost there, just need to assemble the different pieces:

for source, pc in get_file_addresses():
    print("{} ==> {}".format(source, hex(pc)))
    for fct_symb in get_all_functions_from_pc(pc):
        # do something with fct_symb
        ...

Set a (trace) breakpoint on every function

Last step: what do we want to do with this symbol? Here, just some simple tracing each time the function is called:

class TraceBreakpoint(gdb.Breakpoint):
    def __init__(self, symb):
        addr = int(symb.value().address)

        gdb.Breakpoint.__init__(self, "*{}".format(hex(addr)), internal=True)
        self.silent = True

    def stop(self):
        caller = gdb.newest_frame().older()
        caller_name = caller.name() if caller else 'none'

        print('{};{};{}'.format(gdb.selected_thread().num, caller_name, gdb.newest_frame().name()))

        return False # we never want to stop

which gives this last function:

def set_trace_bpt_on_all_symbols():
    for source, pc in get_file_addresses():
        print("{} ==> {}".format(source, hex(pc)))
        for fct_symb in get_all_functions_from_pc(pc):
            bpt = TraceBreakpoint(fct_symb)
            print("\t{} (Bpt #{})".format(fct_symb, bpt.number))
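
Once the program has run, the trace is a list of thread;caller;callee lines. Here is one possible way to exploit it outside GDB, as a sketch: count how often each caller/callee pair shows up. count_calls is a name I made up, and the function names in the sample are invented.

```python
from collections import Counter


def count_calls(trace_lines):
    """Count (caller, callee) pairs in lines formatted as thread;caller;callee."""
    calls = Counter()
    for line in trace_lines:
        parts = line.strip().split(";")
        if len(parts) != 3:
            continue  # not a trace line (program output, GDB messages, ...)
        _thread, caller, callee = parts
        calls[(caller, callee)] += 1
    return calls


# invented sample, in the format printed by TraceBreakpoint.stop
sample_trace = [
    "1;main;parse_args",
    "1;main;do_work",
    "2;start_thread;do_work",
    "1;main;do_work",
]

for (caller, callee), count in count_calls(sample_trace).most_common():
    print("{} -> {}: {}".format(caller, callee, count))
```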

Load it

==> breakpoint_all.py <==

and from GDB:

(gdb) source breakpoint_all.py

Multithreaded warning

We noticed that in multithreaded environments, if one thread calls a breakpointed function very often, the others won't have time to progress. Consider running in non-stop mode to stop only the thread that hit the breakpoint:

(gdb) set non-stop on

GDB scheduler locking, function calls and multi-threading

Tuesday, February 07, 2017 - 2 comments

GDB can call functions in the program's address space. That can be useful in different situations, for debugging or just informal code testing. Let's say I want to allocate some memory:

(gdb) p/x (char *)malloc(1024)
$1 = 0x60c960

I can do what ever I want with this buffer:

(gdb) set $b = $1
(gdb) set $b[0] = 48
(gdb) set $b[1] = 120
(gdb) set $b[2] = 57
(gdb) set $b[3] = 55
(gdb) set $b[4] = 50
(gdb) set $b[5] = 0
(gdb) p $b
$3 = 0x60c960 "0x972"
(gdb) call puts($b)
0x972

... but that's not the point of this post. What I wanted to highlight here is what happens in a multithreaded execution, if I do something similar:

(gdb) break 350
(gdb) run
[... many threads are created here ...]
[Switching to Thread ... (LWP 20645)]
Breakpoint 1, .... at ...:350
(gdb) p malloc(1024)
[Switching to Thread ... (LWP 20592)]
Breakpoint 1, .... at ...:350
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(malloc) will be abandoned.
When the function is done executing, GDB will silently stop.

The problem here is that another thread hits the breakpoint during the execution of malloc. This is the default behavior of GDB, and it may be surprising when you're not used to it ...

But in addition to being surprising, it is problematic because the return value of malloc is lost! And the same can happen with next, step, finish, etc.

So if you want to force a single-thread behavior (that is, every time you're sure that there shouldn't be any deadlock during the function call), set GDB's scheduler-locking.

With scheduler-locking enabled, GDB only lets the current thread run. The execution of the other threads is blocked by the Linux scheduler.

(In practice, this is implemented with the ptrace API, to which GDB passes either the process PID or only the thread TID.)

(gdb) set scheduler-locking on
(gdb) p malloc(1024)
...
(gdb) set scheduler-locking off

Don't forget to turn it back off again afterwards, or your thread will certainly block soon because of someone else's mutex!

In Python, I have a simple wrapper that does that for me:

def set_parameter(name, value):
    class temp:
        def __init__(self):
            self.old = None

        def __enter__(self):
            self.old = gdb.parameter(name)

            if self.old is True: self.old = "on"
            if self.old is False: self.old = "off"
            # otherwise don't change

            gdb.execute("set {} {}".format(name, value), to_string=True)

        def __exit__(self, type, value, traceback):
            gdb.execute("set {} {}".format(name, self.old), to_string=True)
    return temp()

I use it this way:

with set_parameter("scheduler-locking", "on"):
    buffer = gdb.parse_and_eval("malloc")(1024)
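
For what it's worth, the same idea can also be written with contextlib, which saves the helper class. The sketch below is an alternative, not the wrapper I actually use: temporary_parameter, as_gdb_setting and FakeGdb are made-up names, and the gdb module is passed in as an argument only so that the code can be exercised outside GDB with a stand-in object.

```python
from contextlib import contextmanager


def as_gdb_setting(value):
    """Turn a Python value into the text expected by GDB's `set` command."""
    if value is True:
        return "on"
    if value is False:
        return "off"
    return str(value)


@contextmanager
def temporary_parameter(gdb_api, name, value):
    """Set a GDB parameter for the duration of a `with` block.

    `gdb_api` is anything exposing parameter() and execute(); inside GDB,
    that is the `gdb` module itself.
    """
    old = gdb_api.parameter(name)
    gdb_api.execute("set {} {}".format(name, as_gdb_setting(value)), to_string=True)
    try:
        yield
    finally:
        # restore the previous value even if the body raised
        gdb_api.execute("set {} {}".format(name, as_gdb_setting(old)), to_string=True)


class FakeGdb:
    """Stand-in for the gdb module, so the sketch can run outside GDB."""

    def __init__(self):
        self.settings = {"scheduler-locking": "off"}

    def parameter(self, name):
        return self.settings[name]

    def execute(self, command, to_string=False):
        _set, name, value = command.split()
        self.settings[name] = value


fake = FakeGdb()
with temporary_parameter(fake, "scheduler-locking", True):
    print(fake.settings["scheduler-locking"])  # "on" while inside the block
print(fake.settings["scheduler-locking"])      # back to "off"
```

Inside GDB, the call would read temporary_parameter(gdb, "scheduler-locking", "on").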

Break if outside of section

Tuesday, April 26, 2016 - No comments

Just a little example of how to use gdb.py to create a breakpoint that only fires if called outside of a given section:

int main() {
  srand(time(NULL));

  for (int i = 0; i < 100; i++) {
    if (rand() > THRESHOLD) do_in_section(0);

    start();
    do_in_section(1);
    stop();
  }
}

Functions start() and stop() delimit the section: it is okay to call do_in_section() inside the section. However, every once in a while, do_in_section() is called from outside of this section. (Forget about that boolean flag parameter, it's just for logging and consistency checks. Real-life code won't have such an easy way to tell outside calls from inside calls ... otherwise you wouldn't need an advanced breakpoint!)

(gdb) break do_in_section

will lead to many useless stops, as most of the calls to do_in_section() are legal. So what we should do is:

  • set an internal breakpoint on start(), where we set a flag and continue
  • set an internal breakpoint on stop(), where we unset the flag and continue
  • set a breakpoint on do_in_section() that checks that flag, and only stops if outside of the section

which gives in Python:

import gdb

in_section = False

class StartStopBreakpoint(gdb.Breakpoint):
    def __init__(self, loc, is_start):
        gdb.Breakpoint.__init__(self, loc, internal=True)
        self.silent = True

        self.is_start = is_start

    def stop(self):
        global in_section

        if self.is_start: # I know that this 'if' is not necessary ...
            assert not in_section
            in_section = True
        else:
            assert in_section
            in_section = False

        return False # never stop here

class SectionBreakpoint(gdb.Breakpoint):
    def __init__(self, location):
        gdb.Breakpoint.__init__(self, location, internal=True)
        self.silent = True

    def stop(self):
        if in_section:
            print("ignore hit in section")
            assert gdb.parse_and_eval("inside") == 1
            return False

        else:
            print("")
            print("Section breakpoint hit outside of section")
            print("")
            assert gdb.parse_and_eval("inside") == 0
            return True

StartStopBreakpoint("start", is_start=True)
StartStopBreakpoint("stop", is_start=False)
SectionBreakpoint("do_in_section")

Just source that file, or type it into GDB's Python command line. The last three lines instantiate the breakpoints:

$ gdb -ex "source test.py" a.out
GNU gdb (GDB) 7.10.50.20160122-git
...
(gdb) run
Starting program: /home/kevin/a.out 
ignore
ignore ....

Section breakpoint hit outside of section

(gdb) print inside
$1 = 0

Et voila!

Riddle with Python 2/3 and GDB.py checkpoint-restart

Wednesday, March 09, 2016 - No comments

At the moment, I'm playing with thread checkpoint-restart, implemented somehow like setjmp/longjmp :

REGS = "rax","rbx","rcx","rdx","rsi","rdi","rbp","rsp","r8","r9", ...

def checkpoint(): 
  reg_values = map(gdb.newest_frame().read_register, REGS)
  return reg_values

def restart(reg_values):
  for reg, val in zip(REGS, reg_values):
    gdb.execute("set ${} = {}".format(reg, val), to_string=True)

This is a naive implementation, but with a few more tricks, it does work (what I have to do is save/restore rip and rbp for the frame above, I'm not sure why).

However this code works for Python 2, but not for Python 3. Can you find why?

I'll let you guess; the answer is written below (spoiler!):

The map function in Python 3 returns an iterator, which is lazily evaluated. Thus in Python 3, the registers are actually read ... in the restart function! So, of course, they don't hold the values of the checkpoint, and hence the restart function does nothing!
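
The effect is easy to reproduce without GDB. In the sketch below, a plain dictionary stands in for the CPU registers (registers, read_register and the two checkpoint functions are made up for the demonstration); wrapping map in list() forces the reads to happen at checkpoint time.

```python
registers = {"rax": 1, "rbx": 2}  # stand-in for the live CPU registers
REG_NAMES = ("rax", "rbx")


def read_register(name):
    return registers[name]


def checkpoint_lazy():
    # Python 3: nothing is read yet, map() only builds an iterator
    return map(read_register, REG_NAMES)


def checkpoint_eager():
    # list() forces the reads to happen now, in Python 2 and 3 alike
    return list(map(read_register, REG_NAMES))


lazy = checkpoint_lazy()
eager = checkpoint_eager()

registers["rax"] = 99  # the program runs on and modifies rax

print(list(lazy))  # Python 3: [99, 2], the values at restart time
print(eager)       # [1, 2], the values at checkpoint time
```

So the fix for the checkpoint function is simply to return a list instead of the bare map object.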

Tricks for Python debugger pdb

Tuesday, February 02, 2016 - No comments

Using pdb in PyQt applications

If you try to run pdb.set_trace() in a PyQt application, the command-line prompt will be unusable, certainly because of (native) multithreading. The solution is simple once you've got it, but not necessarily easy to find on the Internet:

def import_pdb_set_trace():
  '''Set a tracepoint in PDB that works with Qt'''
  from PyQt5.QtCore import pyqtRemoveInputHook
  pyqtRemoveInputHook()

  import pdb; pdb.set_trace() 

PDB aliases [for gdb.py scripting]

# gdb.py
alias ge import gdb;gdb.execute("%*") #  gdb execute
alias gq import gdb;gdb.execute("quit") # gdb quit

# force quit
alias fq import os;os._exit(0) 

# quick print
alias pr print(%1)

# dir()
alias dir for k in dir(%1): print("{}".format(k))
alias dirv for k in dir(%1): print("{} --> {}".format(k, getattr(%1, k)))

alias ds dir self

Simple GDB Extensions with Python

Friday, January 08, 2016 - No comments

On Stack Overflow, I noticed that some people find features missing in GDB. These features are nothing complicated, just combinations of existing commands that would be useful for them, but that are not part of the native set of commands.

GDB/Python interface is perfect to solve that kind of problem :-) It just requires simple Python skills and a little bit of documentation reading:

Does GDB have a step-to-next-call instruction?

SO question

Simple answer: no, step-to-next-call is not part of GDB commands.

GDB/Python-aware answer: no, it's not part of GDB commands, but it's easy to implement!

  • To stop before, you need to stepi/nexti (next assembly instruction) until you see call in the current instruction:


import gdb

class StepBeforeNextCall (gdb.Command):
    def __init__ (self):
        super (StepBeforeNextCall, self).__init__ ("step-before-next-call",
                                                   gdb.COMMAND_OBSCURE)

    def invoke (self, arg, from_tty):
        arch = gdb.selected_frame().architecture()

        while True:
            current_pc = addr2num(gdb.selected_frame().read_register("pc"))
            disa = arch.disassemble(current_pc)[0]
            if "call" in disa["asm"]: # or startswith ?
                break

            SILENT=True
            gdb.execute("stepi", to_string=SILENT)

        print("step-before-next-call: next instruction is a call.")
        print("{}: {}".format(hex(int(disa["addr"])), disa["asm"]))

def addr2num(addr):
    try:
        return int(addr)  # Python 3
    except:
        return long(addr) # Python 2

StepBeforeNextCall()

  • To stop after the call, you compute the current stack depth, then step until it's deeper:


import gdb

def callstack_depth():
    depth = 1
    frame = gdb.newest_frame()
    while frame is not None:
        frame = frame.older()
        depth += 1
    return depth

class StepToNextCall (gdb.Command):
    def __init__ (self):
        super (StepToNextCall, self).__init__ ("step-to-next-call", 
                                               gdb.COMMAND_OBSCURE)

    def invoke (self, arg, from_tty):
        start_depth = current_depth =callstack_depth()

        # step until we're one step deeper
        while current_depth == start_depth:
            SILENT=True
            gdb.execute("step", to_string=SILENT)
            current_depth = callstack_depth()

        # display information about the new frame
        gdb.execute("frame 0")

StepToNextCall() 
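
About the "# or startswith ?" comment in step-before-next-call: testing "call" anywhere in the disassembled text can also match an operand, for instance a jump to a symbol whose name contains "call". A stricter sketch, assuming x86 mnemonics (call, callq), only looks at the first word. is_call_instruction is a made-up name, and recall_me is an invented symbol.

```python
def is_call_instruction(asm):
    """True if the disassembled instruction text starts with a call mnemonic."""
    fields = asm.split()
    return bool(fields) and fields[0].startswith("call")


print(is_call_instruction("callq  0x4009f0 <foo>"))
print(is_call_instruction("mov    %rsp,%rbp"))
# a plain substring test would wrongly report this jump as a call
print(is_call_instruction("jmpq   0x401464 <recall_me+12>"))
```

In the command above, the test would become is_call_instruction(disa["asm"]). Other architectures use other mnemonics, so the check has to be adapted there.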


Debug a source file using GDB without stepping into library calls?

SO question

You can see my answer to Does GDB have a “step-to-next-call” instruction? : there is no native GDB command for that (as far as I know, they may have worked on that), but it's easy to do in Python:

import gdb

class StepNoLibrary (gdb.Command):
    def __init__ (self):
        super (StepNoLibrary, self).__init__ ("step-no-library",
                                              gdb.COMMAND_OBSCURE)

    def invoke (self, arg, from_tty):
        step_msg = gdb.execute("step", to_string=True)

        fname = gdb.newest_frame().function().symtab.objfile.filename

        if fname.startswith("/usr"):
            # inside a library
            SILENT=False
            gdb.execute("finish", to_string=SILENT)
        else:
            # inside the application
            print(step_msg[:-1])

StepNoLibrary()

It's easy to read what it does: it goes one step forward, and if the step ends up in a file stored in /usr/*, it finishes the function to come back to the application.

How to set skipping of uninteresting functions from gdbinit script?

SO question

Problem: if in .gdbinit you write skip uninteresting_function, gdb complains No function found named ... because the symbol files are not loaded yet.

Python solution: new command pending_skip

import gdb

to_skip = []

def try_pending_skips(evt=None):
    for skip in list(to_skip): # make a copy for safe remove
        try:
            # test if the function (aka symbol is defined)
            symb, _ = gdb.lookup_symbol(skip)
            if not symb:
                continue
        except gdb.error:
            # no frame ?
            continue
        # yes, we can skip it
        gdb.execute("skip {}".format(skip))
        to_skip.remove(skip)

    if not to_skip:
        # no more functions to skip
        try:
            gdb.events.new_objfile.disconnect(try_pending_skips) # event fired when the binary is loaded
        except ValueError:
            pass # was not connected

class cmd_pending_skip(gdb.Command):
    self = None

    def __init__ (self):
        gdb.Command.__init__(self, "pending_skip", gdb.COMMAND_OBSCURE)

    def invoke (self, args, from_tty):
        global to_skip

        if not args:
            if not to_skip:
                print("No pending skip.")
            else:
                print("Pending skips:")
                for skip in to_skip:
                    print("\t{}".format(skip))
            return

        new_skips = args.split()
        to_skip += new_skips

        for skip in new_skips:
            print("Pending skip for function '{}' registered.".format(skip))

        try:
            gdb.events.new_objfile.disconnect(try_pending_skips) 
        except ValueError: pass # was not connected

        # new_objfile event fired when the binary and libraries are loaded in memory
        gdb.events.new_objfile.connect(try_pending_skips)

        # try right away, just in case
        try_pending_skips()

cmd_pending_skip()

Save this code into a Python file pending_skip.py (or surrounded with python ... end in your .gdbinit), then:

source pending_skip.py
pending_skip fct1
pending_skip fct2 fct3
pending_skip # to list pending skips

The Python code will automatically check if the function can be skipped (i.e., if it is defined) whenever a symbol file is loaded. Running the command with no argument lists the remaining pending skips.
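
Stripped of the GDB calls, the retry logic boils down to the sketch below: each time new symbols arrive, apply whatever can now be resolved and keep the rest for later. apply_pending is a made-up name, and a set of strings stands in for the symbols known to GDB.

```python
def apply_pending(pending, is_defined, apply):
    """Apply every pending name that is now defined; return those still pending."""
    still_pending = []
    for name in pending:
        if is_defined(name):
            apply(name)
        else:
            still_pending.append(name)
    return still_pending


loaded_symbols = {"fct1"}  # stand-in for what gdb.lookup_symbol can resolve
skipped = []

pending = apply_pending(["fct1", "fct2"], loaded_symbols.__contains__, skipped.append)
print(pending, skipped)

loaded_symbols.add("fct2")  # a new objfile was loaded
pending = apply_pending(pending, loaded_symbols.__contains__, skipped.append)
print(pending, skipped)
```

In the real command, is_defined corresponds to the gdb.lookup_symbol test and apply to gdb.execute("skip ...").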


How can I use gdb to catch the moment when a function returns false?

SO question

import gdb
class FunctionFinishBreakpoint (gdb.FinishBreakpoint):
    def __init__ (self):
        gdb.FinishBreakpoint.__init__(self, gdb.newest_frame(), 
                                      internal=True)
        self.silent = True 

    def stop(self):
        #print("after: {}".format(int(self.return_value)))
        return not int(self.return_value)

class FunctionBreakpoint(gdb.Breakpoint):
    def __init__ (self, spec):
        gdb.Breakpoint.__init__(self, spec)
        self.silent = True

    def stop (self):
        #print("before")
        FunctionFinishBreakpoint() # set breakpoint on function return

        return False # do not stop at function entry

FunctionBreakpoint("test")

Save that in a finish.py file, edit it to your needs and source it from GDB, or run it between python ... end or in python-interactive (pi).

This code creates a FunctionBreakpoint, which triggers FunctionBreakpoint.stop each time function test is hit. The callback is silent, and only creates a FunctionFinishBreakpoint, which stops at the end of the current frame (i.e., at the end of your function). That second stop calls FunctionFinishBreakpoint.stop, which tests whether the return value evaluates to true or false. If it is "not true", it tells GDB to stop the execution.
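
One thing to watch in FunctionFinishBreakpoint.stop: as far as I understand the GDB documentation, return_value is None when the function returns void or when the value could not be computed, and int(None) raises a TypeError. A small sketch of a safer test, which can be tried outside GDB with plain integers (returned_false is a made-up name):

```python
def returned_false(return_value):
    """True when a finished function returned a value that evaluates to 0."""
    if return_value is None:
        # void function, or GDB could not compute the value: do not stop
        return False
    return int(return_value) == 0


print(returned_false(0))
print(returned_false(1))
print(returned_false(None))
```

The stop method would then read: return returned_false(self.return_value).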


(gdb.FinishBreakpoint was added to GDB Python interface for that very purpose, by myself :-)

(last time I checked, there was an efficiency problem with these FinishBreakpoint, you may notice it if your function is called very often)

GDB and Frame-Filters: a bug and a quick fix

Tuesday, November 10, 2015 - No comments

With frame filters and decorators, GDB lets you rewrite the output of the where command. That's quite convenient, except that it doesn't work well in one situation (that I could not clearly understand ...):

(gdb) where no-filters # <--- notice the option here
#0  do_spin () at .../gcc-5.2.0/libgomp/config/linux/wait.h:55
#1  do_wait () at ...gcc-5.2.0//libgomp/config/linux/wait.h:64
#2  gomp_team_barrier_wait_end (...) at .../libgomp/config/linux/bar.c:112
#3  0x00007ffff7bd8966 in GOMP_barrier () at gomp_preload.c:49
#4  0x0000000000400a19 in main._omp_fn.0 () at parallel-demo.c:10
#5  0x00007ffff7bd89e4 in GOMP_parallel_trampoline (...) at gomp_preload.c:62
#6  0x00007ffff79c442e in gomp_thread_start () at .../libgomp/team.c:118
#7  0x00007ffff7bd8ce8 in pthread_create_trampoline () at pthread_preload.c:33
#8  0x00007ffff779f4a4 in start_thread () from /usr/lib/libpthread.so.0
#9  0x00007ffff74dd13d in clone () from /usr/lib/libc.so.6

becomes:

(gdb) where
#0  gomp_team_barrier_wait_end () at .../libgomp/config/linux/wait.h:55
#1  gomp_team_barrier_wait_end () at .../libgomp/config/linux/wait.h:64
#2  #pragma omp barrier () at parallel-demo.c:10
#4  #parallel zone #1 of main () at parallel-demo.c:10

Many frames are gone, that's my cleanup; some function names have been changed, that's my OpenMP work ... but the function names of frames #0 and #1 are inconsistent. They should not read gomp_team_barrier_wait_end but rather do_spin and do_wait, respectively.

I don't know what's special about these functions; they're inlined, but that's not enough to explain and recreate the problem ...

Anyway, I found that the inconsistency boils down to two lines:

(gdb) frame 0
(gdb) pi print(gdb.selected_frame().function())
gomp_team_barrier_wait_end # wrong
(gdb) pi print(gdb.selected_frame().name())    
do_spin # right

So to solve my problem, I add a frame decorator that picks up the frame name instead of its function symbol:

class BugFixFrame(gdb.frames.FrameDecorator):
    def function(self): 
        return self.inferior_frame().name() 

class BugFixFrameFilter:
    def __init__(self):
        self.enabled = True
        self.priority = 99999

    def filter(self, frames):
        for frame in frames:
            yield BugFixFrame(frame)

gdb.frame_filters["Bug fix frame filter"] = BugFixFrameFilter()

and I now have my clean and correct callstack:

(gdb) where                                                          
#0  do_spin (val=0, addr=0x602104) at .../libgomp/config/linux/wait.h:55
#1  do_wait (val=0, addr=0x602104) at .../libgomp/config/linux/wait.h:64
#2  #pragma omp barrier () at parallel-demo.c:10
#4  #parallel zone #1 of main () at parallel-demo.c:10

I've submitted a bug report as PR/19225.

Looking up Source-Code Lines from GDB/Python (and OpenMP complications)

Monday, November 02, 2015 - No comments

In GDB Python bindings, there is currently no direct way to translate a function symbol into its source file and corresponding lines. But that's possible with the GDB command line, and some more juggling:

(gdb) disassemble [function_name|*address]
Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>: push   %rbp
   0x00000000004024a1 <+1>: mov    %rsp,%rbp
   0x00000000004024a4 <+4>: sub    $0x20,%rsp
   0x00000000004024a8 <+8>: mov    %edi,-0x8(%rbp)
   0x00000000004024ab <+11>:    mov    %rsi,-0x10(%rbp)
   0x00000000004024af <+15>:    mov    -0x8(%rbp),%edi
   0x00000000004024b2 <+18>:    mov    %edi,-0x14(%rbp)
   0x00000000004024b5 <+21>:    mov    -0x10(%rbp),%rsi
   0x00000000004024b9 <+25>:    mov    (%rsi),%rsi
   0x00000000004024bc <+28>:    mov    (%rsi),%rdi
   0x00000000004024bf <+31>:    callq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:    mov    -0x4(%rbp),%eax
   0x00000000004024c7 <+39>:    add    $0x20,%rsp
   0x00000000004024cb <+43>:    pop    %rbp
   0x00000000004024cc <+44>:    retq   
End of assembler dump.

With disassemble, we know (gdb tells us) where a function begins and ends ... in memory. In theory, we just have to parse the second and penultimate lines of gdb.execute("disassemble {addr}"). But in practice, compilers may reorganize (for optimization) the binary instructions, so it's safer to iterate through all of them. Then, gdb.find_pc_line(pc) tells us the source-code line matching that PC. There we are:

import logging

import gdb

log = logging.getLogger(__name__)

def get_function_fname_and_lines(fct_symb):
    fct_addr = int(fct_symb.value().address)  # use long() in Python 2
    disa = gdb.execute("disassemble {}".format(fct_addr), to_string=True)

    filename = fct_symb.symtab.filename
    from_line = fct_symb.line
    to_line = 0
    for disa_line in disa.split("\n"):
        if "Dump of assembler code" in disa_line:
            continue # skip first line
        if "End of assembler dump." in disa_line:
            break # we're at the end
        try:
            # parse the PC value
            # => 0x004009c1 <+32>: jmpq   0x401464 <main._omp_fn.0+2755>
            pc = int(disa_line.replace("=>", "").split()[0], 16)
        except (IndexError, ValueError):
            log.warning("Could not parse disassembly line ...")
            log.warning(disa_line)
            continue

        sal = gdb.find_pc_line(pc)
        if sal.symtab is None:
            continue # hum, nothing known about that PC

        # check for consistency that PC is in the right file
        if sal.symtab.filename != filename:
            log.info("not the right file, inlined ?")
            continue

        # if function symbol doesn't specify its line
        if fct_symb.line == 0:
            if from_line == 0 or sal.line < from_line:
                from_line = sal.line

        # PCs may not be in order
        if sal.line > to_line:
            to_line = sal.line

    return filename, from_line, to_line

which gives:

(gdb) pi get_function_fname_and_lines(gdb.lookup_symbol("main")[0])
('minimal_omp_threads.c', 26, 76)
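
The address-parsing step is the fragile part, and it can be checked without GDB by feeding it a captured dump. A sketch under that assumption: parse_disassembly_pcs is a made-up helper and SAMPLE is a shortened copy of the disassembly shown earlier.

```python
def parse_disassembly_pcs(disa):
    """Return the instruction addresses found in the text of `disassemble`."""
    pcs = []
    for line in disa.split("\n"):
        # drop the current-instruction marker, then take the first field
        fields = line.replace("=>", "").split()
        if not fields or not fields[0].startswith("0x"):
            continue  # header, footer or blank line
        try:
            pcs.append(int(fields[0], 16))
        except ValueError:
            continue  # not an address after all
    return pcs


SAMPLE = """Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>: push   %rbp
   0x00000000004024bf <+31>:    callq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:    mov    -0x4(%rbp),%eax
   0x00000000004024cc <+44>:    retq
End of assembler dump."""

print([hex(pc) for pc in parse_disassembly_pcs(SAMPLE)])
```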

OpenMP complications

I wrote that function as part of my work on OpenMP (OMP) debugging. In OMP, compilers do "outlining", that is, the reverse of inlining:

#pragma omp task
    foo1(&i, &j, &k);

becomes with GCC/GOMP:

main._omp_fn.3 (...) {
    foo1 (...);
}

GOMP_task (main._omp_fn.3, ...);

Everything is okay here, my code works well. But with Intel OpenMP and LLVM/Clang, they didn't implement outlining the same way: instead of naming the outlined functions something like <parent>._omp_fn.<id>, they name them ... .omp_microtask.! Thanks guys, now gdb.lookup_symbol(".omp_microtask.") always returns the same symbol (certainly the first one), and so does my source-code lookup function.

We do have the address of the function

(Pdb) print fn
0x402340 <.omp_ptask.>

but gdb.lookup_symbol cannot do the lookup by address ...

So let's get back to GDB/Python documentation and see how we can fix that:

Function: gdb.lookup_symbol (name [, block [, domain]])

This function searches for a symbol by name. The search scope can be restricted to the parameters defined in the optional domain and block arguments.

That block argument looks good (that's more or less the equivalent of a C scope). But where should I get it from?

I remember that [gdb.Frame](https://sourceware.org/gdb/current/onlinedocs/gdb/Frames-In-Python.html#Frames-In-Python) has a block() method:

(gdb) pi gdb.lookup_symbol(".omp_microtask.", gdb.selected_frame().block())[0]
<gdb.Symbol object at 0x7fc96e0883c8>
(gdb) pi get_function_fname_and_lines(...)
('minimal_omp_threads.c', 38, 39)

but that doesn't work as I wanted (that is, from the task allocator function), because we are in the scope of the task allocator function, which is here equivalent to the global one. The lookup always resolves to the first task ...

So, how to get the right block? Let's get back to the documentation, maybe the block page ...

Function: gdb.block_for_pc (pc)

Return the innermost gdb.Block containing the given pc value. If the block cannot be found for the pc value specified, the function will return None.

Interesting ! Furthermore:

Variable: Block.function

The name of the block represented as a gdb.Symbol. If the block is not named, then this attribute holds None. This attribute is not writable.

For ordinary function blocks, the superblock is the static block. However, you should note that it is possible for a function block to have a superblock that is not the static block – for instance this happens for an inlined function.

Indeed:

(Pdb) gdb.block_for_pc (0x402340).function
<gdb.Symbol object at 0x7f824e346300> (.omp_ptask.)

so the final code for Intel OpenMP looks like that:

fct_addr = ... # "0x402340"
fct_symb = gdb.block_for_pc(int(fct_addr, 16)).function
my_gdb.get_function_fname_and_lines(fct_symb)

and that works well :-)

Bug with multiple threads running *inside* GDB

Friday, September 04, 2015 - No comments

When extending GDB through its Python API (or directly in C), situations where you have to use threads may pop up. For instance with GUIs, or when using another library or module.

Unfortunately, GDB doesn't like that much. First of all, you cannot call GDB Python functions from another thread. GDB itself is not multithreaded, and hence not thread-safe. Python is, though, so you should be able to block the main thread in Python and call GDB functions from the other thread; but apart from that, GDB will simply crash! And there is nothing to do about it, as far as I know.

But worse (kind of): GDB doesn't cope well when your code spawns a thread.

TL;DR: solution

In C (GDB bug #17247, patch and discussion):

sigemptyset (&sigchld_mask);
sigaddset (&sigchld_mask, SIGCHLD);
sigprocmask (SIG_BLOCK, &sigchld_mask, &prev_mask);

scm_with_guile (call_initialize_gdb_module, NULL);
sigprocmask (SIG_SETMASK, &prev_mask, NULL);

In Python:

import pysigset, signal

with pysigset.suspended_signals(signal.SIGCHLD):
    # start threads, they will inherit the signal mask
    pass

Description of the bug

When you create (in Python) a thread, and then run the application (in my case it happens mainly when the application itself spawns threads), GDB freezes with the following callstack:

(gdb) where
#0 sigsuspend () from /usr/lib/libc.so.6
#1 wait_lwp (lp=lp@entry=0x21f63b0) at ../../gdb/gdb/linux-na
#2 stop_wait_callback (lp=0x21f63b0, data=<optimized out>) at
#3 iterate_over_lwps (filter=..., callback=callback@entry=0x4
#4 linux_nat_wait_1 (ops=<optimized out>, target_options=1, o
#5 linux_nat_wait (ops=<optimized out>, ptid=..., ourstatus=0
#6 thread_db_wait (ops=<optimized out>, ptid=..., ourstatus=0
...

Explanation of the bug

The function in which GDB is blocked is sigsuspend:

NAME

sigsuspend, rt_sigsuspend - wait for a signal

SYNOPSIS

int sigsuspend(const sigset_t * mask);

DESCRIPTION

sigsuspend() temporarily replaces the signal mask of the calling process with the mask given by mask and then suspends the process until delivery of a signal whose action is to invoke a signal handler or to terminate a process.

GDB uses this function to wait for new events from the application execution: when something occurs in the debuggee (see how debuggers work), the kernel informs GDB of it by sending a SIGCHLD signal. When it's received, GDB wakes up and checks what happened.

However, the signal is delivered to the GDB process, but not necessarily to its main thread. And in practice, it often happens that it's delivered to the second thread, which doesn't care about it (that's the default behavior) and continues its life as if nothing occurred, leaving the main thread asleep in sigsuspend.

Solution of the problem

We cannot change the behavior of the thread. However, we have a bit of control over which signals it receives: a new thread inherits the signal mask of the thread that creates it! So, in Python, we can block SIGCHLD while starting the thread:

import pysigset, signal

# with SIGCHLD blocked,
with pysigset.suspended_signals(signal.SIGCHLD):
    # start threads,
    # they will inherit the signal mask
    pass
# SIGCHLD is unblocked after the with statement,
# so that GDB can operate properly afterwards
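If pysigset is not available, the same idea can probably be expressed with the standard library alone. The sketch below assumes a GDB built against Python 3 (signal.pthread_sigmask does not exist in Python 2) and a POSIX system; the helper name start_thread_with_sigchld_blocked is hypothetical.

```python
import signal
import threading

def start_thread_with_sigchld_blocked(target, *args):
    """Start a thread whose inherited signal mask has SIGCHLD blocked."""
    # Block SIGCHLD in the calling thread and remember the previous mask.
    previous = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        thread = threading.Thread(target=target, args=args)
        thread.start()  # the new thread inherits the current mask
    finally:
        # Restore the caller's mask, so that the main thread
        # receives SIGCHLD again once the thread is started.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)
    return thread
```

The try/finally plays the role of the with statement: whatever happens while the thread is being started, the mask of the calling thread is put back as it was.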

GDB/Python: Executing Code Upon Events

Tuesday, May 12, 2015 - No comments

When I script GDB to develop model-centric debugging support, I often need to execute code upon specific events, namely after breakpoint hits. It looks like this:

import gdb

class MyBP(gdb.Breakpoint):
    def __init__(self):
        gdb.Breakpoint.__init__(self, "a_function")
        self.silent = True

    def stop(self):
        print("##### breakpoint")
        return True # stop the execution at this point

MyBP()

However, in this Breakpoint::stop callback, you're not free to do whatever you want:

Thou shalt not:

  • alter the execution state of the inferior (i.e., step, next, etc.),
  • alter the current frame context (i.e., change the current active frame), or
  • alter, add or delete any breakpoint.

As a general rule, you should not alter any data within gdb or the inferior at this time.

That's a "general rule" in the documentation; however, it's not enforced at runtime, so in practice there are many things you can do, but the result is not guaranteed!

What can go wrong?

I sometimes have to delete a breakpoint, and I thought that gdb.post_event(this_bp.delete) would be safe. And it is, if your GDB is configured with set height 0. You don't see the link? Fair enough, it's very tricky! set height 0 tells GDB not to pause when the screen is full, a paging behavior that is an artifact of the past.

When GDB does pause at the end of the window, it processes the "posted events", and actually deletes the breakpoint structure. However, the breakpoint is not removed from all of GDB's low-level lists, and that leads to a segmentation fault when GDB tries to access it. Here are the stack traces (deletion and invalid access) from Valgrind that helped me to figure out what was happening:

 Address 0x18c0bd40 is 0 bytes inside a block of size 200 free'd  # stack of the free(breakpoint)
   at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x763558: xfree (common-utils.c:98) # here the breakpoint structure is freed
   by 0x58D49D: delete_breakpoint (breakpoint.c:14074)
   by 0x525F92: bppy_delete_breakpoint (py-breakpoint.c:287) # that's the breakpoint delete function
   by 0x3BB384A0D2: PyObject_Call (in /usr/lib64/libpython2.7.so.1.0)
   by 0x3BB38DC026: PyEval_CallObjectWithKeywords (in /usr/lib64/libpython2.7.so.1.0)
   by 0x522C9C: gdbpy_run_events (python.c:934) # process Python events
   by 0x4AC258: run_async_handler_and_reschedule (ser-base.c:137)
   by 0x4AC328: fd_event (ser-base.c:182)
   by 0x61C281: handle_file_event (event-loop.c:762)
   by 0x61B768: process_event (event-loop.c:339)
   by 0x61B80A: gdb_do_one_event (event-loop.c:391)

Invalid read of size 8 # stack of the illegal memory access
   at 0x57C2B9: bpstat_explains_signal (breakpoint.c:4430) # here the breakpoint structure is accessed
   by 0x5FCFE3: handle_signal_stop (infrun.c:4474)
   by 0x5FC484: handle_inferior_event (infrun.c:4110)
   by 0x5FA831: fetch_inferior_event (infrun.c:3261)
   by 0x61E506: inferior_event_handler (inf-loop.c:57)
   by 0x4BC10E: handle_target_event (linux-nat.c:4448)
   by 0x61C281: handle_file_event (event-loop.c:762)
   by 0x61B768: process_event (event-loop.c:339)
   by 0x61B80A: gdb_do_one_event (event-loop.c:391)
   by 0x727F0D: maybe_wait_sync_command_done (top.c:386)
   by 0x728199: execute_command (top.c:478)
   by 0x6148E7: catch_command_errors (main.c:373)

Among the other things that cannot be done (and that I wanted to do), we find thread switching:

def stop(self):
    print("##### breakpoint")
    gdb.selected_inferior().threads()[1].switch()
    gdb.selected_inferior().threads()[0].switch()
    return False # don't stop

which leads to another segfault:

[New Thread 0x7ffff7fca700 (LWP 22661)]
##### breakpoint
[Thread 0x7ffff7fca700 (LWP 22661) exited]
[Inferior 1 (process 22657) exited normally]
[2]    22651 segmentation fault (core dumped)  gdb-fedora -ex "source test.py" ./thread -ex run

or altering the execution state of the inferior (here with gdb.execute("next")), which is rejected with an error instead of crashing:

[New Thread 0x7ffff7fca700 (LWP 23100)]
##### breakpoint
Traceback (most recent call last):
  File "test.py", line 11, in stop
    gdb.execute("next")
gdb.error: Cannot execute this command while the selected thread is running.

Alternatives

But what if you really want to do it?! Then you have to find alternatives! And there are a few (but they assume you want to stop the execution at this breakpoint and give the prompt back to the user).

class MyBP(gdb.Breakpoint):
    def stop(self):
        print("##### breakpoint")
        gdb.events.stop.connect(stop_event)
        gdb.prompt_hook = prompt
        gdb.post_event(posted_event)
        return True

Subscribing to the stop events

def stop_event(evt):
    print("#### stop event")
    gdb.events.stop.disconnect(stop_event)

Hooking the prompt:

def prompt(current):
    # 'current' is the current prompt string, not the previous hook
    print("#### prompt")
    gdb.prompt_hook = None  # one-shot: uninstall the hook

Posting events (which seem to work well when height=0)

def posted_event():
    print("#### posted event")

These callbacks are executed in this order:

##### breakpoint
#### stop event
#### prompt
(gdb) #### posted event

Note that the prompt hook is executed right before giving the control to the user, so gdb.execute("<command>") works very close to what you can expect from the command line. That saved the work of my afternoon yesterday!
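To make that last point concrete, here is a minimal sketch of a one-shot prompt hook that runs a CLI command right before the next prompt. The helper name run_at_next_prompt is hypothetical, and the gdb module is passed as a parameter (an assumption made so the snippet can be tried outside of GDB with a stand-in object); only gdb.prompt_hook and gdb.execute are used.

```python
def run_at_next_prompt(gdb_module, command):
    """Run `command` once, right before GDB displays its next prompt."""
    previous_hook = gdb_module.prompt_hook

    def one_shot(current_prompt):
        # Uninstall first, so that a failing command cannot
        # leave the hook in place and run again at every prompt.
        gdb_module.prompt_hook = previous_hook
        gdb_module.execute(command)
        return None  # None tells GDB to keep the current prompt

    gdb_module.prompt_hook = one_shot
```

In Breakpoint.stop, calling run_at_next_prompt(gdb, "next") and returning True would then defer the command until GDB is about to hand the prompt back, instead of running it inside the callback where it is refused.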