(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

Zetetics and debugging: Occam's razor

Monday, December 01, 2014 - No comments

Today I'm connecting the zetetics course I took 3 years ago (a general-culture course for the PhD) with my research work: Occam's razor:

In substance it says: Pluralitas non est ponenda sine necessitate. In plain terms: why make things complicated when they can be simple? Roughly, what this razor says is that when several hypotheses compete, it is better to pick the ones that are cognitively least "costly".

Or with an example:

This cut-throat razor can also prove useful for analysing so-called conspiracy theories. It is not impossible that September 11 was an orchestration planned by the secret services, relying on great discretion from the accomplices, a whole lot of precautions and the erasure of all evidence, in order to declare the fight against the Axis of Evil and trigger the second Gulf War. It is an appealing scenario, especially if you are anti-Bush. But a bit of historical knowledge makes this hypothesis rather costly.

Applied to debugging, there are also "costly" hypotheses to set aside, such as bugs in the compiler, the OS or the processor (or the debugger :-).

And here is what happens when you don't use the razor: :-)

Kaamelott (Saison 4 Episode 6 – Les pisteurs) © CALT / DIES IRAE / SHORTCOM – 2006

Solving administration problems with debugging tools (strace)

Sunday, November 16, 2014 - No comments

This week, we wanted to set up a printer on a colleague's computer. It worked through the CUPS web interface (http://localhost:631), but Gnome control center froze whenever we tried to access the printer configuration.

gnome freeze

How can you get a clue about what's going on?

GDB might be a bit of an overkill, even if your distribution provides you with Gnome's source code and debug information.

But strace can be helpful!

 $ strace gnome-control-center
 execve("/usr/bin/gnome-control-center", ["gnome-control-center"], [/* 37 vars */]) = 0
 brk(0)                                  = 0x1ee9000
 access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=264676, ...}) = 0
 mmap(NULL, 264676, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7486506000
 <click on 'Printers'>

 connect(15, {sa_family=AF_INET, sin_port=htons(631), sin_addr=inet_addr("")}, 16) = -1 EINPROGRESS (Operation now in progress)
 fcntl(15, F_SETFL, O_RDWR)              = 0
 poll([{fd=15, events=POLLIN|POLLOUT}], 1, 250) = 0 (Timeout)
 poll([{fd=15, events=POLLIN|POLLOUT}], 1, 250) = 0 (Timeout)

Here it is: Gnome tries to connect to a network address, and the polls on that socket time out. In fact, the colleague had configured their system to connect to the company CUPS server, which was not reachable from our lab, so Gnome tries again and again to connect to this address, unsuccessfully.

To go one step further, and find where Gnome picks this address, you can check what files the program opens:

$ strace -e open,connect gnome-control-center
open("/home/kevin/.cups/client.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/cups/client.conf", O_RDONLY) = 15
open("/home/kevin/.cups/client.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
connect(15, {sa_family=AF_INET, sin_port=htons(631), sin_addr=inet_addr("")}, 16) = -1 EINPROGRESS (Operation now in progress)

Good catch, /etc/cups/client.conf is opened right before the connect call, easy peasy! (but it's not always that simple ;-)

$ cat /etc/cups/client.conf 
# see 'man client.conf'
#ServerName /run/cups/cups.sock #  alternative: ServerName hostname-or-ip-address[:port] of a remote server

(I knew it, I just changed it 5 mins ago to recreate the problem!)

Different problem, same solution: I use open2300 to access the data of my weather station. I usually access it from the Raspberry Pi that I set up last year, but today I need to connect it to my desktop computer ... and it doesn't work:

$ ./interval2300 0 0
Unable to open serial device /dev/ttyUSB1

Indeed, the weather station is on ttyUSB0, not ttyUSB1. The quick and dirty solution is cd /dev; sudo ln -s ttyUSB0 ttyUSB1, but that disappears on reboot (and I told myself not to create a udev rule for that!). So I had to understand where open2300 takes that file name from: strace, there you go!

$ strace -e open ./interval2300 0 0
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("open2300.conf", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("/usr/local/etc/open2300.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/open2300.conf", O_RDONLY)    = 3
open("/dev/ttyUSB1", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)

Unable to open serial device /dev/ttyUSB1
+++ exited with 1 +++

Three calls for dynamic libraries, and three for the configuration file (the current directory and /usr/local/etc, both missing, and finally /etc/open2300.conf, which is found). Thanks again, strace!
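Reading such a trace by eye works for a few lines, but the same filtering can be scripted. The sketch below is only an illustration: the helper name opened_files and the regular expression are my own assumptions, not something strace provides. It scans open() lines like the ones above and reports which paths were tried and whether each call succeeded.

```python
import re

# Matches lines such as:  open("/etc/open2300.conf", O_RDONLY)    = 3
# or:  open("x.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
OPEN_RE = re.compile(r'^open\("([^"]*)",[^)]*\)\s*=\s*(-?\d+)')

def opened_files(trace_text):
    """Return a list of (path, succeeded) pairs, in trace order."""
    results = []
    for line in trace_text.splitlines():
        match = OPEN_RE.match(line.strip())
        if match:
            path, ret = match.group(1), int(match.group(2))
            results.append((path, ret >= 0))  # a negative return value means failure
    return results

SAMPLE = '''\
open("open2300.conf", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("/usr/local/etc/open2300.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/open2300.conf", O_RDONLY)    = 3
open("/dev/ttyUSB1", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)
'''

if __name__ == "__main__":
    for path, ok in opened_files(SAMPLE):
        print("{:<4} {}".format("ok" if ok else "FAIL", path))
```

The last successful open of a configuration file before the failing device open is usually the place to look. Note that this pattern only handles the open() form shown here; other syscall variants would need their own pattern.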

Abort early or crash late

Sunday, November 16, 2014 - No comments

Last week I was discussing with a colleague a problem I had with Python's assert statement, which can't easily be disabled. Let's consider this code snippet that sums up the situation*:

# expects digit as an integer, 
#         unfrequentFlag as a boolean
def fct(digit, unfrequentFlag=False):
  assert type(digit) is int
  print("you gave digit #{}".format(digit))

  if unfrequentFlag:
    digit += 1

  return digit

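# test cases
fct(3)                         # case #1: valid call
fct("3")                       # case #2: wrong type, flag not set
fct("3", unfrequentFlag=True)  # case #3: wrong type, flag set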

From the bug taxonomy presented earlier, we can say that:

  • cases #2 and #3 are code defects that infect the program state,
  • the instruction digit += 1 causes the failure if the state is infected and unfrequentFlag is set,
  • the assertion type(digit) is int ensures that the state is correct when entering the function body.

In Python, asserts stay enabled unless the whole program is run with the interpreter's -O option, and that's a problem for me, because I wanted to have the ability:

  • during development, to abort early, that is, as soon as I know the program state is infected
  • in production, to avoid crashes as much as possible.

In case #2, the execution won't crash (without the assert I mean). The state is invalid, but the instruction causing the failure is not executed, so it goes unnoticed, everybody's happy!

Is this reasoning flawed? Did I miss something about duck typing? I guess so, otherwise assertions could be disabled more easily in Python ...
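One way to get both behaviours is to route the checks through a small helper. This is a minimal sketch, assuming a hypothetical STRICT switch and a hypothetical check() helper (neither comes from the standard library): during development the check aborts early, in production it only logs the infected state and lets the execution go on.

```python
import logging

STRICT = True  # hypothetical switch: True during development, False in production

def check(condition, message):
    """Abort early when STRICT is set, otherwise only log the problem."""
    if condition:
        return
    if STRICT:
        raise AssertionError(message)
    logging.warning("check failed: %s", message)

def fct(digit, unfrequentFlag=False):
    check(type(digit) is int, "digit should be an int")
    if unfrequentFlag:
        digit += 1  # still fails on a str, whatever the value of STRICT
    return digit
```

With STRICT set to False, a call with a string and no flag only logs a warning and returns its argument, while the call with the flag set still fails on digit += 1: the check no longer aborts, but it cannot repair the state either.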

In statically typed languages like Java, such problems are detected even earlier, by the type system of the compiler, but that's not the discussion here! Java also faces the problem I described here, for instance with a null object: an assert(obj != null), and later the dereference of the object in question.

I also know that unit testing is the solution, but who writes unit tests for code that is in no way critical? (I had to write non-regression tests for the patches I contributed to GDB, and along with the documentation they can take longer to write than the patch itself, so you must have good motivation to slow down development by a factor of two!) Automatic tools like the compiler's parser give you the first level of guarantee for free!

* I know this is not pythonic code, that's just an example ;-)

How Does a C Debugger Work? (GDB Ptrace/x86 example)

Thursday, November 13, 2014 - 12 comments

When you use GDB, you can see that it has complete control over your application process. Hit Ctrl-C while the application is running and the process stops, and GDB shows its current location, stack trace, etc.

But how can it do it?

How do they not work?

Let's start with how it doesn't work. It doesn't simulate the execution by reading and interpreting the binary instructions. It could, and that would work (that's the way the Valgrind memory debugger works), but it would be too slow: Valgrind typically slows the application down by a factor of 10 to 50, GDB doesn't. That's also the way virtual machines like Qemu work.

So, what's the trick? Black magic! ... no, that would be too easy.

Another guess? ... ? Hacking! yes, there's a good deal of that, plus help from the OS kernel.

First of all, there's one thing to know about Linux processes: parent processes can get additional information about their children, in particular the ability to ptrace them. And, as you can guess, the debugger is the parent of the debuggee process (or it becomes its parent: processes can adopt a child in Linux :-).

Linux Ptrace API

The Linux ptrace API allows a (debugger) process to access low-level information about another process (the debuggee). In particular, the debugger can:

  • read and write the debuggee's memory: PTRACE_PEEKTEXT, PTRACE_PEEKUSER, PTRACE_POKE...
  • read and write the debuggee's CPU registers: PTRACE_GETREGSET, PTRACE_SETREGS,
  • be notified of system events: PTRACE_O_TRACEEXEC, PTRACE_O_TRACECLONE, PTRACE_O_TRACEEXIT, PTRACE_SYSCALL (you can recognize the exec syscall, clone, exit, and all the other syscalls)
  • control its execution: PTRACE_SINGLESTEP, PTRACE_KILL, PTRACE_INTERRUPT, PTRACE_CONT (notice the CPU single-stepping here)
  • alter its signal handling: PTRACE_GETSIGINFO, PTRACE_SETSIGINFO

How is Ptrace implemented?

Ptrace implementation is outside of the scope of this post, but I don't want to move the black-box one step above, so let me explain quickly how it works (I'm no kernel expert, please correct me if I'm wrong and excuse me if I simplify too much :-).

Ptrace is part of the Linux kernel, so it has access to all the kernel-level information about the process.

What about systems without Ptrace?

The explanation above targeted Linux native debugging, but it's valid for most other environments. To get a clue about what GDB asks of its different targets, you can take a look at the operations of its target stack.

In this target interface, you can see all of the high-level operations required for C debugging:

struct target_ops
{
  struct target_ops *beneath;   /* To the target under this one.  */
  const char *to_shortname;     /* Name this target type */
  const char *to_longname;      /* Name for printing */
  const char *to_doc;           /* Documentation.  Does not include trailing
                                   newline, and starts with a one-line
                                   description (probably similar to
                                   to_longname).  */

  void (*to_attach) (struct target_ops *ops, const char *, int);
  void (*to_fetch_registers) (struct target_ops *, struct regcache *, int);
  void (*to_store_registers) (struct target_ops *, struct regcache *, int);
  int (*to_insert_breakpoint) (struct target_ops *, struct gdbarch *,
                               struct bp_target_info *);
  int (*to_insert_watchpoint) (struct target_ops *,
                               CORE_ADDR, int, int, struct expression *);
  /* ... many other operations ... */
};

The generic part of GDB calls these functions, and the target-specific parts implement them. It is (conceptually) shaped as a stack, or a pyramid: the top of the stack is quite generic, for instance:

The remote target is interesting, as it splits the execution stack between two "computers", through a communication protocol (TCP/IP, serial port).

The remote part can be gdbserver, running on another Linux box. But it can also be an interface to a hardware-debugging port (JTAG) or a virtual machine hypervisor (e.g. Qemu), which will play the role of the kernel+ptrace. Instead of querying the OS kernel structures, the remote debugger stub will query the hypervisor structures, or directly the hardware registers of the processor.

For further reading about this remote protocol, Embecosm wrote a detailed guide about the different messages. The gdbserver event-processing loop and the Qemu gdb-server stub are also available online.

To sum up

We can see here that all the low-level mechanisms required to implement a debugger are there, provided by this ptrace API:

  • Catch the exec syscall and block the start of the execution,
  • Query the CPU registers to get the process's current instruction and stack location,
  • Catch clone/fork events to detect new threads,
  • Peek and poke data addresses to read and alter memory variables.

But is that all a debugger does? No, those are just the very low-level parts ... It also deals with symbol handling, that is, the link between the binary code and the program sources. And one thing is still missing, maybe the most important one: breakpoints! I'll first explain how breakpoints work, as it's quite interesting and tricky, then I'll come back to symbol management.

Breakpoints are not part of Ptrace API

As we've seen above, breakpoints are not part of the ptrace API services. But we can alter the memory, and receive the debuggee's signals. You can't see the link? That's because breakpoint implementation is quite tricky and hacky! Let's examine how to set a breakpoint at a given address:

  1. The debugger reads (ptrace peek) the binary instruction stored at this address, and saves it in its data structures.
  2. It writes a trapping instruction at this location. This instruction can be a dedicated debugging instruction (INT3/0xCC on x86, ebreak on RISC-V), or any invalid instruction for the given CPU.
  3. When the debuggee (or, put more correctly, the CPU configured with the debuggee's memory context) reaches this instruction, it either cannot execute it (an invalid instruction) or traps on purpose (a debugging instruction). In both cases the control goes to the kernel fault handler.
  4. In modern multitask OSes, an invalid instruction doesn't crash the whole system, but it gives the control back to the OS kernel, by raising an interrupt (or a fault).
  5. This interrupt is translated by Linux into a SIGTRAP signal, and transmitted to the process ... or to its parent, as the debugger asked for.
  6. The debugger gets the information about the signal, and checks the value of the debuggee's instruction pointer (i.e., where the trap occurred). If the IP address is in its breakpoint list, that means it's a debugger breakpoint (otherwise, it's a fault in the process, just pass the signal and let it crash).
  7. Now that the debuggee is stopped at the breakpoint, the debugger can let its user do whatever they want, until it's time to continue the execution.
  8. To continue, the debugger needs to 1/ write the correct instruction back in the debuggee's memory, 2/ single-step it (continue the execution for one CPU instruction, with ptrace single-step) and 3/ write the invalid instruction back (so that the execution can stop again next time). And 4/, let the execution flow normally.
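The steps above can be replayed on a toy model. The sketch below does not use ptrace at all: ToyDebuggee and ToyDebugger are made-up classes, "memory" is a bytearray of one-byte instructions, and the program counter stays on the trapping byte (on real x86 it points just past the INT3, so a real debugger has to rewind it). It only shows the save / patch / restore / single-step / re-patch dance.

```python
TRAP = 0xCC  # x86 INT3 opcode, used here as the trapping byte

class ToyDebuggee:
    """A toy 'process': one-byte instructions in memory and a program counter."""
    def __init__(self, code):
        self.memory = bytearray(code)
        self.pc = 0
        self.executed = []  # log of the instruction bytes actually executed

    def step(self):
        """Execute one instruction; return False if it trapped."""
        opcode = self.memory[self.pc]
        if opcode == TRAP:
            return False  # the kernel would deliver SIGTRAP here
        self.executed.append(opcode)
        self.pc += 1
        return True

    def cont(self):
        """Run until a trap or the end of the code."""
        while self.pc < len(self.memory):
            if not self.step():
                return "SIGTRAP"
        return "exited"

class ToyDebugger:
    def __init__(self, debuggee):
        self.debuggee = debuggee
        self.breakpoints = {}  # address -> saved original byte

    def insert_breakpoint(self, addr):
        self.breakpoints[addr] = self.debuggee.memory[addr]  # step 1: save
        self.debuggee.memory[addr] = TRAP                    # step 2: patch

    def resume(self):
        """Step 8: restore, single-step, re-insert the trap, then continue."""
        addr = self.debuggee.pc
        if addr in self.breakpoints:
            self.debuggee.memory[addr] = self.breakpoints[addr]
            self.debuggee.step()
            self.debuggee.memory[addr] = TRAP
        return self.debuggee.cont()

if __name__ == "__main__":
    proc = ToyDebuggee([0x01, 0x02, 0x03, 0x04])
    dbg = ToyDebugger(proc)
    dbg.insert_breakpoint(2)
    print(proc.cont(), "at", proc.pc)    # stops on the patched byte
    print(dbg.resume(), proc.executed)   # the original byte was still executed
```

The breakpoint is invisible to the toy program: all four original instructions end up in the execution log, and the trap byte is back in place afterwards.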

Neat, isn't it? As a side remark, you can notice that this algorithm will not work unless all the threads are stopped at the same time (because running threads may pass the breakpoint while the valid instruction is in place). I won't detail the way the GDB developers solved it, but it's discussed in detail in this paper: Non-stop Multi-threaded Debugging in GDB. Put briefly, they write the instruction somewhere else in memory, set the instruction pointer to that location and single-step the processor. But the problem is that some instructions are address-relative, for example the jumps and conditional jumps ...

Symbol and debug information handling

Now, let's come back to the symbol and debug information handling aspect. I didn't study that part in detail, so I'll only present an overview.

First of all, can we debug without debug information and symbol addresses? The answer is yes: as we've seen above, all the low-level commands deal with CPU registers and memory addresses, not with source-level information. Hence, the link with the sources is only for the user's convenience. Without debug information, you'll see your application the way the processor (and the kernel) see it: as binary (assembly) instructions and memory bits. GDB doesn't need any further information to translate binary data into CPU instructions:

(gdb) x/10x $pc # heXadecimal representation
0x402c60:   0x56415741  0x54415541  0x55f48949  0x4853fd89
0x402c70:   0x03a8ec81  0x8b480000  0x8b48643e  0x00282504
0x402c80:   0x89480000  0x03982484
(gdb) x/10i $pc # Instruction representation
=> 0x402c60:    push   %r15
0x402c62:   push   %r14
0x402c64:   push   %r13
0x402c66:   push   %r12
0x402c68:   mov    %rsi,%r12
0x402c6b:   push   %rbp
0x402c6c:   mov    %edi,%ebp
0x402c6e:   push   %rbx
0x402c6f:   sub    $0x3a8,%rsp
0x402c76:   mov    (%rsi),%rdi

Now if we add symbol handling information, GDB can match addresses with symbol names:

(gdb) print $pc
$1 = (void (*)()) 0x402c60 <main>

You can list the symbols of an ELF binary with nm -a $file:

$ nm -a /usr/lib/debug/usr/bin/ls.debug | grep " main"
0000000000402c60 T main

GDB will also be able to display the stack trace (more on that later), but it is of limited interest:

(gdb) where
#0  write ()
#1  0x0000003d492769e3 in _IO_new_file_write ()
#2  0x0000003d49277e4c in new_do_write ()
#3  _IO_new_do_write ()
#4  0x0000003d49278223 in _IO_new_file_overflow ()
#5  0x00000000004085bb in print_current_files ()
#6  0x000000000040431b in main ()

We've got the PC addresses, the corresponding function, but that's it. Inside a function, you'll need to debug in assembly!

Now let's add debug information: that's the DWARF standard, gcc -g option. I'm not very familiar with this standard, but I know it provides:

  • address to line and line to address mapping
  • data type definitions, including typedefs and structures
  • local variables and function parameters, with their type

Try dwarfdump to see the information embedded in your binaries. addr2line also uses this information:

$ dwarfdump /usr/lib/debug/usr/bin/ls.debug | grep 402ce4
0x00402ce4  [1289, 0] NS
$ addr2line -e /usr/lib/debug/usr/bin/ls.debug  0x00402ce4

Many source-level debugging commands rely on this information, like the next command, which sets a breakpoint at the address of the next line, or the print command, which relies on the types to display the variables in the right format (char, int, float, instead of binary/hexadecimal!).
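To make the address/line mapping concrete, here is a minimal sketch of the lookup. The table is invented for the example (the addresses echo the disassembly above, the line numbers are made up), and the real next command has much more to handle (calls, jumps, inlined code); this only shows the table search.

```python
import bisect

# Hypothetical line table: (start address, source line), sorted by address.
LINE_TABLE = [
    (0x402c60, 10),
    (0x402c6f, 11),
    (0x402c76, 12),
    (0x402c90, 14),
]
ADDRESSES = [addr for addr, _ in LINE_TABLE]

def addr_to_line(pc):
    """Return the source line whose address range contains pc."""
    index = bisect.bisect_right(ADDRESSES, pc) - 1
    if index < 0:
        raise ValueError("address below the first table entry")
    return LINE_TABLE[index][1]

def next_line_address(pc):
    """Return the address where the following source line starts, or None."""
    index = bisect.bisect_right(ADDRESSES, pc)
    return ADDRESSES[index] if index < len(ADDRESSES) else None

if __name__ == "__main__":
    pc = 0x402c62
    print("line", addr_to_line(pc))
    print("temporary breakpoint at", hex(next_line_address(pc)))
```

A debugger stepping over a line would place a temporary breakpoint at the address returned by next_line_address and let the program run until it is hit.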

Last words

We've seen many aspects of a debugger's internals, so I'll just say a few words about the remaining points:

  • the stack trace is "unwound" from the current frame ($sp and $bp/$fp) upwards, one frame at a time. Functions' names, parameters and local variables are found in the debug information.
  • watchpoints are implemented (if available) with the help of the processor: write in its registers which addresses should be monitored, and it will raise an exception when the memory is read or written. If this support is not available, or if you request more watchpoints than the processor supports ... then the debugger falls back to "hand-made" watchpoints: execute the application instruction by instruction, and check if the current operation touches a watchpointed address. Yes, that's very slow!
  • Reverse debugging can be done this way too, record the effect of each instruction, and apply it backward for reverse execution.
  • Conditional breakpoints are normal breakpoints, except that, internally, the debugger checks the conditions before giving the control to the user. If the condition is not matched, the execution is silently continued.
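The unwinding mentioned in the first point can be sketched on a toy stack. Everything here is an assumption made for the example: the symbol table, the addresses, and the frame layout (memory[fp] holds the caller's frame pointer, memory[fp + 1] the return address, and a saved frame pointer of 0 ends the chain). Real debuggers often rely on the call frame information from the debug data instead, since compilers may omit frame pointers.

```python
import bisect

# Hypothetical symbol table: (function start address, name), sorted by address.
SYMBOLS = [(0x100, "_start"), (0x200, "main"), (0x300, "sleep"), (0x400, "nanosleep")]
STARTS = [start for start, _ in SYMBOLS]

def function_at(pc):
    """Name of the function whose code contains pc."""
    return SYMBOLS[bisect.bisect_right(STARTS, pc) - 1][1]

def unwind(memory, pc, fp):
    """Follow the chain of saved frame pointers, innermost frame first."""
    frames = []
    while True:
        frames.append(function_at(pc))
        if fp == 0:
            break
        pc = memory[fp + 1]  # return address: a location inside the caller
        fp = memory[fp]      # the caller's own frame pointer
    return frames

if __name__ == "__main__":
    # Toy stack for _start -> main -> sleep -> nanosleep.
    stack = {
        0x7000: 0x7010, 0x7001: 0x310,  # nanosleep's frame, returns into sleep
        0x7010: 0x7020, 0x7011: 0x220,  # sleep's frame, returns into main
        0x7020: 0x0,    0x7021: 0x110,  # main's frame, returns into _start
    }
    for depth, name in enumerate(unwind(stack, pc=0x410, fp=0x7000)):
        print("#{}  {} ()".format(depth, name))
```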

And play with gdb gdb, or better (way better actually), gdb --pid $(pidof gdb), because two debuggers in the same terminal is insane :-). Another great thing for learning is system debugging:

qemu-system-i386 -gdb tcp::1234
gdb --pid $(pidof qemu-system-i386)
gdb /boot/vmlinuz -ex "target remote localhost:1234"

but I'll keep that for another article!

Bug(ging) and debugging

Monday, November 10, 2014 - No comments

At the beginning of my PhD, I read two interesting books about debuggers. One by J. Rosenberg, How Debuggers Work: Algorithms, Data Structures, and Architecture, which describes the internal algorithms of interactive debuggers; and another by A. Zeller, Why Programs Fail: A Guide to Systematic Debugging, which discusses how programmers introduce bugs in their applications. In the latter book in particular, Zeller explains what a bug is, through four different steps:

1. The programmer creates a defect. A defect is a piece of the code that can cause an infection. Because the defect is part of the code, and because every code is initially written by a programmer, the defect is technically created by the programmer.

2. The defect causes an infection. The program is executed, and with it the defect. The defect now creates an infection—that is, after execution of the defect, the program state differs from what the programmer intended. A defect in the code does not necessarily cause an infection. The defective code must be executed, and it must be executed under such conditions that the infection actually occurs.

3. The infection propagates. Most functions result in errors when fed with erroneous input. As the remaining program execution accesses the state, it generates further infections that can spread into later program states. An infection need not, however, propagate continuously. It may be overwritten, masked, or corrected by some later program action.

4. The infection causes a failure. A failure is an externally observable error in the program behavior. It is caused by an infection in the program state.

It's important to have these four steps in mind when you develop and debug: although you may have a problem (a defect) in your code, if step 4 (or 2) never happens it won't be visible in your application ... until the execution takes another code path.

Likewise, it may be easy to 'fix' a failure, but that doesn't mean that your problem is actually resolved.
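A tiny made-up example shows the four steps, and how a defect can stay hidden. The functions average and mean_is_positive are invented for the illustration, they are not taken from Zeller's book.

```python
def average(values):
    total = 0
    for v in values:
        total += v
    # Defect: the divisor should be len(values).
    return total / max(len(values) - 1, 1)

def mean_is_positive(values):
    # An infected mean is masked here: its sign is still correct.
    return average(values) > 0

if __name__ == "__main__":
    print(average([5]))              # 5.0: defect executed, but no infection
    print(mean_is_positive([2, 4]))  # True: infection (6.0 instead of 3.0), masked
    print(average([2, 4]))           # 6.0: the infection becomes a visible failure
```

Patching the output of the last call would 'fix' that failure, but the defect in average would still be there, waiting for another code path.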

I just read another thought on debugging that I quite like, which compares it with Sherlock Holmes' investigation technique:

How do you debug?

> Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were which led up to that result. This power is what I mean when I talk of reasoning backwards, or analytically.
> Sherlock Holmes, A Study in Scarlet, by Sir Arthur Conan Doyle

Debugging is indeed reasoning backwards: you see the consequences, a failure (or a murder), and you investigate what the causes may be.

I don't agree with his conclusion though,

The Holmes method of debugging is superior, I think, to the scientific method of debugging because debugging isn’t just a science. There’s an art to knowing where to look and what data is needed. This comes from experience and is as much intuition as it is logic. Practice debugging and you will be a better debugger.

as I think that both methods are simply complementary. You investigate what the causes may be, then you form hypotheses and try to validate or disprove them. The better investigator you are, the easier it will be to formulate hypotheses and prove them right and useful!

Finding a Bug with GDB (and mcGDB)

Saturday, October 25, 2014 - No comments

Yesterday, I had to come back to an OpenCL code I wrote 6 months ago, for a trivial update. After I made my few modifications, I ran the code to test it, and it failed.

$ bin/xspecfem3D
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7FFD9B489BC5
#1  0x4CA9DD in prepare_cleanup_device_ at prepare_mesh_constants_gpu.c:2472
#2  0x40473A in xspecfem3d at specfem3D.F90:473
#3  0x7FFD9930BB44
Segmentation fault

First reflex: run it with GDB:

$ gdb bin/xspecfem3D
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd6bc5 in clReleaseMemObject () from /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
(gdb) where
#0  0x00007ffff7bd6bc5 in clReleaseMemObject () from /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
#1  0x00000000004ca9de in prepare_cleanup_device_ () at src/gpu/prepare_mesh_constants_gpu.c:2472
#2  0x000000000040473b in xspecfem3d () at src/specfem3D/specfem3D.F90:473
#3  main () at src/specfem3D/specfem3D.F90:32
#4  0x00007ffff5a58b45 in __libc_start_main () at libc-start.c:287
#5  0x0000000000404780 in _start ()

We can see that the crash is in function clReleaseMemObject, around these lines in the source file:

    if (run_opencl) RELEASE_PINNED_BUFFER_OCL (station_seismo_field); // <---- segfault here
#ifdef USE_CUDA
    if (run_cuda) cudaFreeHost(mp->h_station_seismo_field);

RELEASE_PINNED_BUFFER_OCL is a preprocessor function-like macro defined like this:

#define RELEASE_PINNED_BUFFER_OCL(_buffer_) \
     clCheck(clEnqueueUnmapMemObject(mocl.command_queue, mp->h_pinned_##_buffer_, \
                                                           mp->h_##_buffer_, 0, NULL, NULL)); \
     clCheck(clReleaseMemObject (mp->h_pinned_##_buffer_))

Okay, so we crash in that second line. What is the value of cl_mem mp->h_pinned_station_seismo_field?

(gdb) print mp->h_pinned_station_seismo_field
$1 = (cl_mem) 0x3fd71113a0000000
(gdb) print *mp->h_pinned_station_seismo_field
$2 = <incomplete type>

The pointer appears to be valid, but we don't know much about it. Let's try to set a breakpoint on the function call before the segfault:

(gdb) break clEnqueueUnmapMemObject
(gdb) run
Program received signal SIGSEGV, Segmentation fault.

Hmm? We wanted to stop before the segfault, the application was not supposed to crash. Let's examine a bit further the state of the application when it crashed:

   if (run_opencl) RELEASE_PINNED_BUFFER_OCL (station_seismo_field); // <---- segfault here
#ifdef USE_CUDA
   if (run_cuda) cudaFreeHost(mp->h_station_seismo_field);

Both USE_OPENCL and USE_CUDA are defined, I know it. run_opencl should be true as well, let's check that:

(gdb) print run_opencl
$1 = 0

Uh-oh, what? So I'm not in an OpenCL run? (OCL and Cuda are mutually exclusive.) Let's make sure that we're in Cuda:

(gdb) print run_cuda
$2 = 1

Alright, that's clear now!

  • OpenCL was not supposed to run,
  • it crashes in clReleaseMemObject nevertheless,
  • the breakpoint in the first function of the macro didn't work, ...

The if test is not doing what was expected from it! Here is what it really does:

if (run_opencl) clCheck(clEnqueueUnmapMemObject(mocl.command_queue, mp->h_pinned_station_seismo_field, 
                                                           mp->h_station_seismo_field, 0, NULL, NULL)); 
     clCheck(clReleaseMemObject (mp->h_pinned_station_seismo_field));

The second function call is not part of the conditional execution ...

The problem is easy to solve, either by protecting the if:

if (run_opencl) {
    RELEASE_PINNED_BUFFER_OCL (station_seismo_field);
}

or by protecting the macro:

#define RELEASE_PINNED_BUFFER_OCL(_buffer_) do { .... } while (0)

I originally didn't protect the macro, because all my ifs are protected (and I didn't pay enough attention to being future-proof). But the code isn't mine, and someone changed the coding convention (and didn't test the OpenCL branch of the code).

In mcGDB, understanding the state of the application would have been easier. Instead of:

(gdb) print mp->h_pinned_station_seismo_field
$1 = (cl_mem) 0x3fd71113a0000000
(gdb) print *mp->h_pinned_station_seismo_field
$2 = <incomplete type>
The pointer appears to be valid, but we don't know much about it.

we could have had:

(mcGDB) opencl info buffer mp->h_pinned_station_seismo_field
No OpenCL buffer at @0x....

and instead of: run_opencl should be true as well, let's check that:

(gdb) print run_opencl
$1 = 0

we would have noticed that no OpenCL event was displayed in the debugger before the crash (like "New kernel created", "New buffer created"), or more explicitly:
(mcgdb) opencl show activity
No OpenCL activity recorded

Callstack from Userland to Kernel-space

Monday, October 20, 2014 - No comments

When I started my PhD 4 years ago, I had the chance to play with a nice tool (but ST internal ...) that does advanced Linux kernel debugging. Among its capabilities, one thing that astonished me was the ability to show a full call stack, from userland down to kernel-space:

                      #0  context_switch ()       at kernel/sched.c:2894
                      #1  schedule ()             at kernel/sched.c:5500
                      #2  do_nanosleep ()         at kernel/hrtimer.c:1494
                      #3  hrtimer_nanosleep ()    at kernel/hrtimer.c:1563
                      #4  sys_nanosleep ()        at kernel/hrtimer.c:1601
                      #5  ret_fast_syscall ()     at arch/arm/kernel/entry-armv.S:744

                      #6  nanosleep ()            at lib/nanosleep.c:51
                      #7  sleep ()                at unix/sysv/linux/sleep.c:138
                      #8  main ()                 at sleep.c:4

That may look trivial at first sight, but the top part of the stack (#0 - #5) belongs to the kernel (Linux), whereas the bottom part belongs to user-space. Frame #8 is the application itself, and #7/#6 the libc.
Generating such a trace is impossible in a standard environment, as the kernel runs in a protected memory area that is not addressed the same way as userland.

This particular example comes from a work I did, where I had to port the low end of the kernel debugger to support ARM processors running on a Qemu virtual machine. I studied the __copy_to/from_user() kernel functions to understand how the kernel looks up userland addresses from kernel space, and reimplemented the same logic inside the debugger.

As far as I remember, that involved modifying the registers of the machine's MMU, which is the unit in charge of mapping virtual addresses to physical ones. So the debugger had to reprogram it with the memory context of the current process, ask Qemu to convert the address, and reinstall the original process (otherwise ... system crash and kernel panic!).

* Attentive readers may have noticed that, unfortunately, this is not the complete stack, ~half of it is still missing! The callstack doesn't start magically from the main function ... With gdb
set backtrace past-main on
set backtrace past-entry on
we can see what happened before main (that is, in the program startup code), but we won't be able to get past that today ...

#3  0x00000000004018ed in main () at src/sleep.c:145
#4  0x0000003d49221d65 in __libc_start_main () at libc-start.c:285
#5  0x00000000004019f9 in _start ()