In the GDB Python bindings, there is currently no direct way to translate
a function symbol into its source file and corresponding lines. But
it is possible with the GDB command line and a bit more juggling:
(gdb) disassemble [function_name|*address]
Dump of assembler code for function .omp_ptask.:
0x00000000004024a0 <+0>: push %rbp
0x00000000004024a1 <+1>: mov %rsp,%rbp
0x00000000004024a4 <+4>: sub $0x20,%rsp
0x00000000004024a8 <+8>: mov %edi,-0x8(%rbp)
0x00000000004024ab <+11>: mov %rsi,-0x10(%rbp)
0x00000000004024af <+15>: mov -0x8(%rbp),%edi
0x00000000004024b2 <+18>: mov %edi,-0x14(%rbp)
0x00000000004024b5 <+21>: mov -0x10(%rbp),%rsi
0x00000000004024b9 <+25>: mov (%rsi),%rsi
0x00000000004024bc <+28>: mov (%rsi),%rdi
0x00000000004024bf <+31>: callq 0x4009f0 <foo>
=> 0x00000000004024c4 <+36>: mov -0x4(%rbp),%eax
0x00000000004024c7 <+39>: add $0x20,%rsp
0x00000000004024cb <+43>: pop %rbp
0x00000000004024cc <+44>: retq
End of assembler dump.
With disassemble, GDB tells us where a function begins and
ends ... in memory. In theory, we just have to parse the second and
penultimate lines of gdb.execute("disassemble {addr}"). But in
practice, compilers may reorder the binary instructions (for
optimization), so it's safer to iterate through all of them. Then,
gdb.find_pc_line(pc) tells us the source-code line matching that
PC. There we are:
import logging

import gdb  # only available inside a GDB session

log = logging.getLogger(__name__)

def get_function_fname_and_lines(fct_symb):
    # int() rather than long(): long() only exists in Python 2
    fct_addr = int(fct_symb.value().address)
    disa = gdb.execute("disassemble {}".format(fct_addr), to_string=True)
    filename = fct_symb.symtab.filename
    from_line = fct_symb.line
    to_line = 0
    for disa_line in disa.split("\n"):
        if not disa_line.strip():
            continue  # skip blank lines (e.g. after the last newline)
        if "Dump of assembler code" in disa_line:
            continue  # skip first line
        if "End of assembler dump." in disa_line:
            break  # we're at the end
        try:
            # parse the PC value
            # => 0x004009c1 <+32>: jmpq 0x401464 <main._omp_fn.0+2755>
            pc = int(disa_line.replace("=>", "").split()[0], 16)
        except (IndexError, ValueError):
            log.warning("Could not parse disassembly line ...")
            log.warning(disa_line)
            continue
        sal = gdb.find_pc_line(pc)
        # find_pc_line always returns an object; symtab is None for unknown PCs
        if sal.symtab is None:
            continue  # hum, nothing known about that PC
        # check for consistency that PC is in the right file
        if sal.symtab.filename != filename:
            log.info("not the right file, inlined ?")
            continue
        # if function symbol doesn't specify its line
        if fct_symb.line == 0:
            if from_line == 0 or sal.line < from_line:
                from_line = sal.line
        # PCs may not be in order
        if sal.line > to_line:
            to_line = sal.line
    return filename, from_line, to_line
which gives:
(gdb) pi get_function_fname_and_lines(gdb.lookup_symbol("main")[0])
('minimal_omp_threads.c', 26, 76)
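The PC-parsing step is the most fragile part of that function, and it can be tried outside GDB. Here is a minimal sketch, assuming the disassemble output has the same shape as the dump shown earlier; the helper name iter_disassembly_pcs is hypothetical and not part of the GDB API:

```python
import re

# Hypothetical helper, not part of the GDB API.
# Matches instruction lines such as:
#   "=> 0x00000000004024c4 <+36>: mov -0x4(%rbp),%eax"
# The optional "=>" marks the current instruction.
_PC_RE = re.compile(r"^\s*(?:=>)?\s*(0x[0-9a-fA-F]+)\b")


def iter_disassembly_pcs(disa_text):
    """Yield the PC of each instruction line found in `disassemble` output."""
    for line in disa_text.splitlines():
        match = _PC_RE.match(line)
        if match:  # header, footer and blank lines simply do not match
            yield int(match.group(1), 16)
```

Inside GDB, each value it yields could be passed to gdb.find_pc_line(pc), in place of the try/except parsing above.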
OpenMP complications
I wrote that function as part of my work on OpenMP (OMP) debugging. In
OMP, compilers do "outlining", that is, the reverse of inlining:
#pragma omp task
foo1(&i, &j, &k);
becomes with GCC/GOMP:
main._omp_fn.3 (...) {
foo1 (...);
}
GOMP_task (main._omp_fn.3, ...);
Everything is okay here, and my code works well. But Intel OpenMP and
LLVM/Clang didn't implement outlining the same way: instead of
naming the outlined functions something like <parent>._omp_fn.<id>,
they name them all ... .omp_microtask.! Thanks guys, now
gdb.lookup_symbol(".omp_microtask.") always returns the same symbol
(presumably the first one), and so does my source-code lookup function.
We do have the address of the function:
(Pdb) print fn
0x402340 <.omp_ptask.>
but gdb.lookup_symbol cannot do the lookup by address ...
So let's get back to the GDB/Python documentation and see how we can fix that:
Function: gdb.lookup_symbol (name [, block [, domain]])
This function searches for a symbol by name. The search scope can
be restricted to the parameters defined in the optional domain
and block arguments.
That block argument looks good (it's more or less the equivalent of
a C scope). But where should I get it from?
I remember that [gdb.Frame](https://sourceware.org/gdb/current/onlinedocs/gdb/Frames-In-Python.html#Frames-In-Python) has a block() method:
(gdb) pi gdb.lookup_symbol(".omp_microtask.", gdb.selected_frame().block())[0]
<gdb.Symbol object at 0x7fc96e0883c8>
(gdb) pi get_function_fname_and_lines(...)
('minimal_omp_threads.c', 38, 39)
but that doesn't work the way I wanted (that is, when called from the
task allocator function), because there we are in the scope of the task
allocator function, which is here equivalent to the global one. The
lookup always resolves to the first task ...
So, how to get the right block? Let's get back to the documentation,
maybe the block page ...
Function: gdb.block_for_pc (pc)
Return the innermost gdb.Block containing the given pc value. If
the block cannot be found for the pc value specified, the
function will return None.
Interesting! Furthermore:
Variable: Block.function
The name of the block represented as a gdb.Symbol. If the block
is not named, then this attribute holds None. This attribute is
not writable.
For ordinary function blocks, the superblock is the static
block. However, you should note that it is possible for a
function block to have a superblock that is not the static block
– for instance this happens for an inlined function.
Indeed:
(Pdb) gdb.block_for_pc (0x402340).function
<gdb.Symbol object at 0x7f824e346300> (.omp_ptask.)
so the final code for Intel OpenMP looks like this:
fct_addr = ... # "0x402340"
fct_symb = gdb.block_for_pc(int(fct_addr, 16)).function
my_gdb.get_function_fname_and_lines(fct_symb)
and that works well :-)
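One caveat: as the documentation quoted above says, gdb.block_for_pc can return None, and a block's function attribute is None when the block is not named. Here is a slightly more defensive sketch of the lookup. The helper name function_symbol_at is hypothetical, and the lookup function is passed in as a parameter only so that the logic can be tried without a GDB session:

```python
def function_symbol_at(addr_text, block_for_pc):
    """Return the function symbol enclosing a hexadecimal address, or None.

    Hypothetical helper: `block_for_pc` is assumed to behave like
    gdb.block_for_pc, returning an object with `function` and
    `superblock` attributes, or None when no block is found.
    """
    block = block_for_pc(int(addr_text, 16))
    # The innermost block may be an unnamed lexical scope: climb the
    # superblock chain until a block attached to a function is found.
    while block is not None and block.function is None:
        block = block.superblock
    return block.function if block is not None else None
```

Inside GDB, the call would be function_symbol_at("0x402340", gdb.block_for_pc), and a None result means that no function symbol is known for that address.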