(gdb) break *0x972

GDB, please set a breakpoint on all my functions

Friday, February 10, 2017 - No comments

For studying a piece of code, a colleague of mine wanted to set a breakpoint on all the functions of its applications. All but those in shared libraries, so not the libc, libpthread, etc. We agreed that this means all the functions defined in files compiled with -g.

TL;DR;

==> breakpoint_all.py <==

and from GDB:

(gdb) source breakpoint_all.py

I don't claim it's the best way to do, nor the most efficient tracing, I just highlight a possibility of gdb+python. Feel free to update the last step (or the others) to let it do what ever you need!

In the command line, GDB can list the functions with info functions:

(gdb) info functions
File ../../gdb/gdb/compile/compile.c:
    void _initialize_compile(void);
    int compile_register_name_demangle(struct gdbarch *, const char *);
    char *compile_register_name_mangled(struct gdbarch *, int);
    void eval_compile_command(struct command_line *, char *, enum compile_i_scope_types);
    static int check_raw_argument(char **);

File ../../gdb/gdb/complaints.c:
    void _initialize_complaints(void);
    void clear_complaints(struct complaints **, int, int);
    void complaint(struct complaints **, const char *, ...);
    void internal_complaint(struct complaints **, const char *, int, const char *, ...);

But at the moment (gdb 7.12.1) there is no equivalent in Python, so I had to find another way. I didn't want to parse this command, and create a breakpoint out of these function names, I wanted GDB Python to give me this list, so I had to find another way ...

List source files

Again there is a CLI command (info sources), again there is no Python wrapper:

(gdb) info sources
Source files for which symbols have been read in:

gdb/amd64-tdep.c, gdb/features/i386/amd64.c, gdb/features/i386/amd64-avx.c, gdb/gdbarch.h, gdb/target.h, ...

Source files for which symbols will be read in on demand:

gdb/gdb/features/i386/amd64.c, gdb/features/i386/x32.c, ...

I don't why there are two sets, but it's these files we are interested in ... so let's parse this output:

def get_file_addresses():
    sources = gdb.execute("info sources", to_string=True).split("\n")
    assert "Source files for which symbols have been read in" in sources[0]

    for line in sources:
        if line.startswith("Source files ") or not line.strip():
            continue
        for source in line.split(", "):
            # do something with `source`
            ...

Find source file memory addresses

Now we need to list the symbols in this files. Sounds easy, but in practice there is no function to lookup a symbol file from its name, and even with the symbol files (tables), there is no way to list the symbol it contains. So this is a dead end!

After a careful reading of the documentation, one finds out that only blocks contain a list of symbols, when you iterate over it. This is not written in bold, so easy to miss :-(. Anyway, we need to get the block corresponding to our source file scope. We could lookup it up from its PC: gdb.block_for_pc(pc), but at the moment we don't know a PC corresponding to this file.

One way to get this PC is by setting a breakpoint in this file: break <source file>:1 should work, and we'll be able to use the address of the breakpoint as a block lookup PC.

(gdb) b gdb/amd64-tdep.c:1
Breakpoint 1 at 0x461300: file ../../gdb/gdb/amd64-tdep.c, line 1.

But again, there is no way to do this in Python :-( So let's parse again the CLI output!

Pay attention to error though, if GDB could not convert the location into an address it will through a gdb.error exception, or if set breakpoint pending is on (in my system for instance), it will not fail, but indicate that the breakpoint is pending. Setting multiple breakpoint at the same address may also change the output.

def get_file_addresses():
        ....
        # do something with `source`
        try:
            # fails if source is a header file
            bpt_msg = gdb.execute("break {}:1".format(source), to_string=True)
        except gdb.error as e:
            # if show breakpoint pending ==> off
            continue
        bp_id = bpt_msg.partition("Breakpoint ")[-1].partition(" ")[0]
        gdb.execute("delete {}".format(bp_id), to_string=True)

        if "pending" in bpt_msg or "No line" in bpt_msg:
            # if show breakpoint pending ==> on
            """No line 1 in file "/usr/include/bits/pthreadtypes.h".
            Breakpoint 8 (/usr/include/bits/pthreadtypes.h:1) pending."""
            continue

        """Note: breakpoint 3 also set at pc 0x461300.
        Breakpoint 2 at 0x461300: file ../../gdb/gdb/amd64-tdep.c, line 1."""

        bp_line = [a for a in bpt_msg.split("\n") if a.startswith("Breakpoint ")][0]
        file_1st_addr = int(bp_line.split(" ")[3][:-1], 16) # change to long in Py2 IIRC

        yield source, file_1st_addr

List all the symbols/functions from a file

Now we've got the address on a file, we can get its global scope (Block.global_block) and list the symbols it contains. In this list, we just trim out what isn't a function:

def get_all_functions_from_pc(pc):
    block = gdb.block_for_pc(pc)
    for symb in block.global_block:
        if not symb.is_function: continue

        yield symb

And we're almost there, just need to assemble the different pieces:

for source, pc in get_file_addresses():
        print("{} ==> {}".format(source, hex(pc)))
        for fct_symb in get_all_functions_from_pc(pc):
             # do something with fct_symb
            ...

Set a (trace) breakpoint on every functions

Last step, what do we want to do with this symbol: here, just some simple tracing each time the function is called:

class TraceBreakpoint(gdb.Breakpoint):
    def __init__(self, symb):
        addr = int(symb.value().address)

        gdb.Breakpoint.__init__(self, "*{}".format(hex(addr)), internal=True)
        self.silent = True

    def stop(self):
        caller = gdb.newest_frame().older()
        caller_name = caller.name() if caller else 'none'

        print('{};{};{}'.format(gdb.selected_thread().num, caller_name, gdb.newest_frame().name()))

        return False # we never want to stop

which gives this last function:

def set_trace_bpt_on_all_symbols():
    for source, pc in get_file_addresses():
        print("{} ==> {}".format(source, hex(pc)))
        for fct_symb in get_all_functions_from_pc(pc):
            bpt = TraceBreakpoint(fct_symb)
            print("\t{} (Bpt #{})".format(fct_symb, bpt.number))

Load it

==> breakpoint_all.py <==

and from GDB:

(gdb) source breakpoint_all.py

Multithreaded warning

We noticed that in multithreaded environments, if one thread calls a breakpointed function very often, the other won't have time to progress. Consider running in async mode to stop only the thread that hit the breakpoint:

(gdb) set non-stop on

GDB scheduler locking, function calls and multi-threading

Tuesday, February 07, 2017 - 2 comments

GDB can call functions in the program's address-space. That can be useful in different sitations, for debugging or just informal code testing. Let's say I want to allocate some memory:

(gdb) p/x (char *)malloc(1024)
$3 = 0x60c960

I can do what ever I want with this buffer:

(gdb) set $b = $1
(gdb) set $b[0] = 48
(gdb) set $b[1] = 120
(gdb) set $b[2] = 57
(gdb) set $b[3] = 55
(gdb) set $b[4] = 50
(gdb) set $b[5] = 0
(gdb) p $b
$3 = 0x60c960 "0x972"
(gdb) call puts($b)
0x972

... but that's not the point of this post. What I wanted to highlight here is want happens in a multithreaded execution, if I do something similar:

(gdb) break 350
(gdb) run
[... many threads are created here ...]
[Switching to Thread ... (LWP 20645)]
Breakpoint 1, .... at ...:350
(gdb) p malloc(1024)
[Switching to Thread ... (LWP 20592)]
Breakpoint 1, .... at ...:350
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(malloc) will be abandoned.
When the function is done executing, GDB will silently stop.

The problem here is that another thread hits the breakpoint during the execution of malloc. This is the default behavior of GDB, and it may be surprising when you're not used to it ...

But in addition to being suprising, it is problematic because the return value of malloc is lost! And the same can happend with next, step, finish, etc.

So if you want to force a single-thread behavior (that is, everytime you're sure that there should't be any deadlock during the function call), set GDB's scheduler-locking.

With the scheduler-locking enabled, GDB only let the current thread run. The execution of the other one is blocked in Linux scheduler.

(In practise, this is implemented in ptrace API, to which GDB either passes the process PID or only the thread TID.)

(gdb) set scheduler-locking on
(gdb) p malloc(1024)
...
(gdb) set scheduler-locking off

Don't forget to turn it back off again afterwards, or your thread will certainly block soon because of someone else's mutex!

In Python, I have a simple wrapper that does that for me:

def set_parameter(name, value):
    class temp:
        def __init__(self):
            self.old = None

        def __enter__(self):
            self.old = gdb.parameter(name)

            if self.old is True: self.old = "on"
            if self.old is False: self.old = "off"
            # otherwise don't change

            gdb.execute("set {} {}".format(name, value), to_string=True)

        def __exit__(self, type, value, traceback):
            gdb.execute("set {} {}".format(name, self.old), to_string=True)
    return temp()

I use it this way:

with set_parameter("scheduler-locking", True):
  buffer = gdb.parse_and_eval("malloc")(1024)

i3 named workspaces

Monday, September 19, 2016 - No comments

I recently improved my i3-wm configuration regarding the handling of workspaces. The idea is to have a quicker access to Firefox and Emacs, the main GUI tools I use for working. I initially added bindings to Urxvt terminal, but I happen to use it too much, so I cannot bind it to a particular workspace, they have to appear where ever I am :-)

Workspace names

set $wp_web 1: web
set $wp_edit 2: emacs
set $wp_work 3: work

set $wp_1 1: web # cannot use $variable here ...
set $wp_2 2: emacs
set $wp_3 3: work

assign [class="Firefox"] ? $wp_web
assign [class="^Emacs$"] ? $wp_edit

Switching workspace

# Win+1, Win+2 ...
bindcode Mod4+10 workspace $wp_1
bindcode Mod4+11 workspace $wp_2
bindcode Mod4+12 workspace $wp_3
bindcode Mod4+13 workspace 4
...

# quick Fn accesses
# F1 is Firefox, always focus to it during quick switches. 
# Same for F2 and emacs
bindsym F1 workspace $wp_1 ; [class="Firefox"] focus 
bindsym F2 workspace $wp_2 ; [class="Emacs"] focus
bindsym F3 workspace $wp_3
bindsym F4 workspace 4
# skip F5 (refresh in Firefox)
...

# quick back-and-forth with special key
bindsym Menu workspace back_and_forth # I would prefer a back-and-forth *on output*

# win-left/right
bindcode Mod4+86 workspace next
bindcode Mod4+82 workspace prev


# Mouse switch 1: left  2: middle  3: right
bindsym --whole-window Mod4+button1 workspace prev_on_output
bindsym --whole-window Mod4+button2 workspace back_and_forth
bindsym --whole-window Mod4+button3 workspace next_on_output

Moving workspaces and windows

# quick move
bindsym Shift+F1 move workspace $wp_1
bindsym Shift+F2 move workspace $wp_2
bindsym Shift+F3 move workspace $wp_3
bindsym Shift+F4 move workspace 4
...

# MOVE+Next/Prior keys to move the workspace
bindsym $Mod_move+Next move workspace next
bindsym $Mod_move+Prior move workspace prev

# Special key (UP/X) to move only window
bindcode $Mod_move+66 move window to output right;focus output right

# F12/F11 to move to hidden scratchpad / show it 
bindsym F12 move scratchpad
bindsym F11 scratchpad show

Simple strace debugging

Tuesday, August 30, 2016 - No comments

Today my URxvt terminals are stuck. All of them, at the same time.

I know they're just clients (urxvtc) for the main daemon (urxvtd) so I'm not surprised that all of them are stuck, but I don't know why.

URxvt standalone version (urxvt) works well, so I can start inverstigating (by the way, in my i3 environment I mainly use URxvt client (bound to Mod+Shift+Enter), because it's faster to start, but I have a backup binding Mod+Ctrl+Enter for the standalone version, just in case things like that happen :).

Then, with the right tool, it's easy to find the problem:

 $ strace -p $(pidof urxvtd)
 strace: Process 1518 attached
 wait4(20063, 
 ^C
strace: Process 1518 detached
 <detached ...>

So PID 20063 is the problem, urxvtd is waiting for it forever. But who is that?

$ ps aux | grep 20063 
kevin    20063  0.0  0.0  13628  2920 ?        S    10:45   0:00 sh -c echo -en toto\.ods | xsel -i -b -p

Alright, it's my clipboard management script that doesn't return ...

$ kill -9 20063

Bye bye clipboard, and hello again my terminals :-)

Looking for a Developer/Researcher Position in April 2017

Monday, May 02, 2016 - No comments

My contract with the University of Grenoble ends at the end of March 2017, so I'm looking for a developer/researcher position in Grenoble area, or remotely, in the field of debugging tools, but also runtime environments, compilers, verification & validation for HPC and embedded systems.

A printer-friendly version of my CV is also available. Non (or more) anonymized versions can also be generated on demand.

Break if outside of section

Tuesday, April 26, 2016 - No comments

Just a little example of how to use gdb.py to create a breakpoint that only fires if called outside of a give section :

int main() {
  srand(time(NULL));

  for (int i = 0; i < 100; i++) {
    if (rand() > THRESHOLD) do_in_section(0);

    start();
    do_in_section(1);
    stop();
  }
}

Functions start() and stop() delimit the section: it is okay to call do_in_section() inside the section. However, every once in a while, do_in_section() is called from outside of this section. (Forget about that boolean flag parameter, it's just for logging and consistency checks. Real-life codes won't have such an easy way to discriminate outside from inside call ... otherwise you don't need an advanced breakpoint!)

(gdb) break do_in_section

will lead to many useless stops, as most of the call to do_in_section() are legal, so what should do is:

set an internal breakpoint on start(), where we set a flag and continue
set an internal breakpoint on stop()` where we unset the flag and continue
set a breakpoint on do_in_section() that checks that flag, and only stop if outside of the section

which gives in Python:

import gdb

in_section = False

class StartStopBreakpoint(gdb.Breakpoint):
    def __init__(self, loc, is_start):
        gdb.Breakpoint.__init__(self, loc, internal=True)
        self.silent = True

        self.is_start = is_start

    def stop(self):
        global in_section

        if self.is_start: # I know that this 'if' is not necessary ...
            assert not in_section
            in_section = True
        else:
            assert in_section
            in_section = False

        return False # never stop here

class SectionBreakpoint(gdb.Breakpoint):
    def __init__(self, location):
        gdb.Breakpoint.__init__(self, location, internal=True)
        self.silent = True

    def stop(self):
        if in_section:
            print("ignore hit in section")
            assert gdb.parse_and_eval("inside") == 1
            return False

        else:
            print("")
            print("Section breakpoint hit outside of section")
            print("")
            assert gdb.parse_and_eval("inside") == 0
            return True

StartStopBreakpoint("start", is_start=True)
StartStopBreakpoint("stop", is_start=False)
SectionBreakpoint("do_in_section")

Just source that file, or input it in GDB python commandline. The three last line instantiate the breakpoints:

$ gdb -ex "source test.py" a.out
GNU gdb (GDB) 7.10.50.20160122-git
...
(gdb) run
Starting program: /home/kevin/a.out 
ignore
ignore ....

Section breakpoint hit outside of section

(gdb) print inside
$1 = 0

Et voila!

Extending Linux Perf Stat with LD_PRELOAD

Friday, March 25, 2016 - No comments

For my current work on monitoring, I need to use Linux perf stat. Perf tools read and dislay the harware counters, either for the wall execution of a process, or while attaced to it:

perf stat --pid $(pidof firefox)
^C
 Performance counter stats for process id '4257':

     13.860180      task-clock (msec)
            79      context-switches           
            16      cpu-migrations
            11      page-faults             
    18,397,934      cycles
    13,964,242      stalled-cycles-frontend 
     9,787,703      stalled-cycles-backend
     8,320,570      instructions
     1,743,632      branches
        93,080      branch-misses

   1.942768382 seconds time elapsed

That's great, I can attach perf to my process, run it for a while and stop it. But if I want to start, and stop, and start again, I can't. And that's what I want to do, from inside gdb.py.

Signals would be great for that, when ever I send a signal, perf dumps the counter values to stderr, and continues. But that's not implemented ...

Studying `perf-stat` source code

So let's see in perf-stat source code what we can find for that.

static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void print_aggr(char *prefix)

these look like good candidates, but their symbols are not exported ...

(gdb) print abs_printout
No symbol "abs_printout" in current context.
(gdb) print print_aggr
No symbol "print_aggr" in current context.

Next candidate?

static void process_interval(void)
(gdb) p process_interval
$3 = {void (void)} 0x441660 <process_interval>

Oh, great, and with no arguments, that's even easier ! Let's try it:

(gdb) call process_interval()
  3858.     571793.500217      task-clock
  3858          1,880,243      context-switches
  3858            115,610      cpu-migrations           
  3858          8,639,477      page-faults              
  3858  1,430,093,310,944      cycles
  3858  1,037,644,029,921      stalled-cycles-frontend
  3858    756,612,594,751      stalled-cycles-backend 
  3858    820,483,606,606      instructions
  3858    176,772,490,245      branches
  3858      5,454,270,151      branch-misses

Exactly what we were looking for !

Triggering `process_interval()` with a signal

Next, we need to be able to trigger this function remotely, and without modifying perf-stat source code. (The easy option would be to patch perf-stat, but then our tool would be harder to distribute).

That can be done with the help of Linux LD_PRELOAD trick: we preload a bit of code inside perf-stat address space, and during the application initialization, we register the signal handler:

void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

void my_handler(int signum) {
    if (signum != SIGUSR2) {
       return;
    }

    printf("Received SIGUSR2!\n");
    process_internal();
}

void init(void) __attribute__((constructor));

void init(void){
  printf("Received init!\n");
  signal(SIGUSR2, my_handler);
}

Here the (GCC specific) function attribute constructor tells the linker to execute the function when it's loaded. In this constructor, we just register the signal handler, and in the signal handler we call process_internal().

But how do we get the address of process_internal? that's where it is not really really clean: it get it from a lookup of the binary's symbol addresses:

nm -a /usr/bin/perf | grep process_interval
0000000000441660 t process_interval

and we inject it manually in the code:

#define PROCESS_INTERNAL 0x0000000000441660
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

A better way would be to pass it as an environment variable:

#define PROCESS_INTERNAL ascii_to_long($PROCESS_INTERNAL_ADDR)

and in the shell:

PROCESS_INTERNAL_ADDR=0x$(nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1)

and that works pretty well !

LD_PRELOAD=/path/to/preload.so perf stat -x, -p $PID_TO_PROFILE

kill -USR2 $(pidof perf)

Note that SIGUSR1 does't work, certainly because it's used by perf-stat. And don't forget the -x, argument if you want to parse perf-stat output.

Also, keep in mind that this is not a robust implementation (of course :), as there might be concurrency problems, for instance if the signal is received while perf is updating the counters.

If it ain't broken don't fix it; I'll wait for problems before going any further!

Riddle with Python 2/3 and GDB.py checkpoint-restart

Wednesday, March 09, 2016 - No comments

At the moment, I'm playing with thread checkpoint-restart, implemented somehow like setjmp/longjmp :

REGS = "rax","rbx","rcx","rdx","rsi","rdi","rbp","rsp","r8","r9", ...

def checkpoint(): 
  reg_values = map(gdb.newest_frame().read_register, REGS))
  return reg_values

def restart(reg_values):
  for reg, val in zip(REGS, reg_values):
    gdb.execute("set ${} = {}".format(reg, val), to_string=True)

This a naive implementation, but with a few more tricks, it does work (what I have to do is save/restore rip and rbp for the frame above, I'm not sure why).

However this code works for Python 2, but not for Python 3. Can you find why?

I'll let you guess, the answer is written below in white:

The map function in Python 3 returns a generator, which is lazily evaluated. Thus in Python 3, the registers are actually read ... in the restart function! So, of course, they don't hold the values of the check-point, and hence the restart function does nothing!

Photowall generator

Thursday, February 04, 2016 - No comments

This is a GPL 3.0 Python photowall generator, relying on ImageMagick. It is (and will remain) in "beta testing" version : I use it and it fits my requirements, but there is no guarantee that it won't crash in any different circunstances.

Python developers should be able to fix it and update it fairly easy, although the code is a bit old and could benefit from a face lifting (I wrote it in 2012).

An example of customization for Python developers: Polaroid captions are computed by function photowall.get_file_details. Just rewrite this function to decide what you want to use for caption. Current code (in the exception handler :$) takes the filename without extension and replaces underscores by new lines.

Usage:

 photowall.py <path> <target> [options]

 Arguments:
   <path>        The path where photos are picked up from.
   <target>      The path where the target photo is written. 
                 Except in POLAROID+RANDOM mode, the image will 
                 be blanked out first.

 Options:
   --polaroid      Use polaroid-like images for the wall
   --pick-random   Pick images randomly in the <path> folder.
   --help          Display this message

 Size options:
   --nb-lines <nb>         Number on lines of the target image. 
   --line-height <height>  Set the height of a single image.
   --width <width>         Set final image width. 
   --no-resize    Resize images before putting in the wall.

 Polaroid mode options:
   --crop-size <crop>      Minimum size to allow cropping 
                           an image, if it doesn't fit
   --no-caption            Disable caption.
   --put-random            Put images randomly
                           instead of linearily.
   --sleep <time>          If --put-random, time (in seconds)
                           to go asleep 
                           before adding a new image.

 Collage mode options:
   --do-wrap               Finish images on the next line?

Examples

Polaroid mode with caption

--polaroid  --nb-lines 1 --pick-random

Polaroid mode with caption

Polaroid mode with no caption

--polaroid --no-caption --nb-lines 1 --pick-random

Polaroid mode with no caption

Collage mode

--nb-lines 1 --pick-random

Collage mode

Polaroid random mode

--put-random --nb-lines 1 --pick-random

Polaroid random mode

Graphical interface

I built a graphical interface that should support all the command-line arguments. Use it with caution, I don't like writting GUIs !

GUI

Tricks for Python debugger pdb

Tuesday, February 02, 2016 - No comments

Using pdb in PyGTK applications

If you try to run pdb.set_trace() on a PyGTK application, the command-line prompt will be unusable, certainly because of (native) multithreading. Solution is simple once you got it, but not necessarily easy to find on the Internet:

def import_pdb_set_trace():
  '''Set a tracepoint in PDB that works with Qt'''
  from PyQt5.QtCore import pyqtRemoveInputHook
  pyqtRemoveInputHook()

  import pdb; pdb.set_trace()

PDB aliases [for `gdb.py` scripting]

# gdb.py
alias ge import gdb;gdb.execute("%*") #  gdb execute
alias gq import gdb;gdb.execute("quit") # gdb quit

# force quit
alias fq import os;os._exit(0) 

# quick print
alias pr print(%1)

# dir()
alias dir for k in dir(%1): print("{}".format(k))
alias dirv for k in dir(%1): print("{} --> {}".format(k, getattr(%1, k)))

alias ds dir self

Older entries »