(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

Looking up Source-Code Lines from GDB/Python (and OpenMP complications)

In GDB Python bindings, there is currently no direct way to translate a function symbol into its source file and corresponding lines. But that's possible with gdb command-line, and some more jungling:

(gdb) disassemble [function_name|*address]
Dump of assembler code for function .omp_ptask.:
   0x00000000004024a0 <+0>: push   %rbp
   0x00000000004024a1 <+1>: mov    %rsp,%rbp
   0x00000000004024a4 <+4>: sub    $0x20,%rsp
   0x00000000004024a8 <+8>: mov    %edi,-0x8(%rbp)
   0x00000000004024ab <+11>:    mov    %rsi,-0x10(%rbp)
   0x00000000004024af <+15>:    mov    -0x8(%rbp),%edi
   0x00000000004024b2 <+18>:    mov    %edi,-0x14(%rbp)
   0x00000000004024b5 <+21>:    mov    -0x10(%rbp),%rsi
   0x00000000004024b9 <+25>:    mov    (%rsi),%rsi
   0x00000000004024bc <+28>:    mov    (%rsi),%rdi
   0x00000000004024bf <+31>:    callq  0x4009f0 <foo>
=> 0x00000000004024c4 <+36>:    mov    -0x4(%rbp),%eax
   0x00000000004024c7 <+39>:    add    $0x20,%rsp
   0x00000000004024cb <+43>:    pop    %rbp
   0x00000000004024cc <+44>:    retq   
End of assembler dump.

With disassemble, we know (gdb tells us) where a function begins and ends ... in memory. In theory, we just have to parse the second and penultimate lines of gdb.execute("disassemble {addr}"). But in practise, compilers may reorganize (for optimization) the binary instructions, so it's safer to iterate through all of them. Then, gdb.find_pc_line(pc) tells use the source-code line matching that PC. There we are:

 def get_function_fname_and_lines(fct_symb):
     fct_addr = long(fct_symb.value().address)
     disa = gdb.execute("disassemble {}".format(fct_addr), to_string=True)

     filename = fct_symb.symtab.filename
     from_line = fct_symb.line
     to_line = 0
     for disa_line in disa.split("\n"):
         if "Dump of assembler code" in disa_line:
             continue # skip first line
         if "End of assembler dump." in disa_line:
             break # we're at the end
         try:
             # parse the PC value
             # => 0x004009c1 <+32>: jmpq   0x401464 <main._omp_fn.0+2755>
             pc = int(disa_line.replace("=>", "").split()[0], 16)
         except:
             log.warning("Could not parse disassembly line ...")
             log.warning(disa_line)
             continue

         sal = gdb.find_pc_line(pc)
         if not sal:
             continue # hum, nothing known that that PC


         # check for consistency that PC is in the right file
         if not sal.symtab.filename == fct_symb.symtab.filename:
             log.info("not the right file, inlined ?")
             continue

         # if function symbol doesn't specify its line
         if fct_symb.line == 0:
             if from_line == 0 or sal.line < from_line:
                 from_line = sal.line

         # PCs may not be in order
         if sal.line > to_line:
             to_line = sal.line

     return filename, from_line, to_line

which gives:

(gdb) print(get_function_fname_and_lines(gdb.lookup_symbol("main")[0]))
('minimal_omp_threads.c', 26, 76)

OpenMP complications

I wrote that function as part of my work on OpenMP (OMP) debugging. In OMP, compilers do "outlining", that is, the reverse of inlining:

#pragma omp task
    foo1(&i, &j, &k);

becomes with GCC/GOMP:

main._omp_fn.3 (...) {
    foo1 (...);
}

GOMP_task (main._omp_fn.3, ...);

Everything is okay here, my code works well. But with Intel OpenMP and LLVM/Clang, they didn't implement outlining the same way: instead of naming the outlined functions something like <parent>._omp_fn.<id>, they name them ... .omp_microtask.! Thanks guys, now gdb.lookup_symbol(".omp_microtask.") always returns the same symbol (certainly the first one), and so does my source-code lookup function.

We do have the address of the function

(Pdb) print fn
0x402340 <.omp_ptask.>

but gdb.lookup_symbol cannot do the lookup by address ...

So let's get back to GDB/Python documentation and see how we can fix that:

Function: gdb.lookup_symbol (name [, block [, domain]])

This function searches for a symbol by name. The search scope can be restricted to the parameters defined in the optional domain and block arguments.

That block argument looks good (that's more or less the equivalent of a C scope). But where show I get it from?

I remember that [gdb.Frame](https://sourceware.org/gdb/current/onlinedocs/gdb/Frames-In-Python.html#Frames-In-Python) has a block attribute:

(gdb) pi gdb.lookup_symbol(".omp_microtask.", gdb.selected_frame().block())[0]
<gdb.Symbol object at 0x7fc96e0883c8>
(gdb) pi get_function_fname_and_lines(...)
('minimal_omp_threads.c', 38, 39)

but that doesn't work as I wanted (that is, from the task allocator function), because we are in the scope of the task allocator function, which is here equivalent to the global one. The lookup always resolves to the first task ...

So, how to get the right block? Let's get back to the documentation, maybe the block page ...

Function: gdb.block_for_pc (pc)

Return the innermost gdb.Block containing the given pc value. If the block cannot be found for the pc value specified, the function will return None.

Interesting ! Furthermore:

Variable: Block.function

The name of the block represented as a gdb.Symbol. If the block is not named, then this attribute holds None. This attribute is not writable.

For ordinary function blocks, the superblock is the static block. However, you should note that it is possible for a function block to have a superblock that is not the static block – for instance this happens for an inlined function.

Indeed:

(Pdb) gdb.block_for_pc (0x402340).function
<gdb.Symbol object at 0x7f824e346300> (.omp_ptask.)

so the final code for Intel OpenMP looks like that:

fct_addr = ... # "0x402340"
fct_symb = gdb.block_for_pc(int(fct_addr, 16)).function
my_gdb.get_function_fname_and_lines(fct_symb)

and that works well :-)