(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

"Unrelated" problems: it only works with my unused variable!

A situation that occurred recently to a colleague of mine:

I don't understand what's the problem, I never use that variable, but if I comment it out, my program crashes! If I let it, it runs fine!

Of course, when you're used C programming and know a bit of memory layout, you already know that "it runs fine" is subjective and that there is buffer overflow somewhere in the code.

Nonetheless, I think the situation is interesting to study, just to remember that this can lead to tricky incomprehensible behaviors.

The problem

Consider this small C code:

 #include "stdio.h"

 #define SIZE 4

 //#define DO_NOT_CRASH

 int *i_ptr;

 #ifdef DO_NOT_CRASH
 int *not_used;
 #endif
 int array[SIZE];

 int main() {
   int i;
   i_ptr = &i;

   for (i = 0; i <= SIZE; i++) {
     array[i] = -1;
   }

   printf("*i_ptr   is ...");
   fflush(stdout);
   printf(" %d\n", *i_ptr);
 }

Run it:

gcc test.c -g -O0 && ./a.out \
echo ==============================; \
gcc test.c -g -O0 -DDO_NOT_CRASH && ./a.out

&i_ptr   is     0x7ffeac4708d0
*i_ptr   is ...[1]    8127 segmentation fault (core dumped)  ./a.out
==============================
&i_ptr   is     0x7ffd5056f508
*i_ptr   is ... 5

(I must admit that it was harder than expected to reproduce the bug. I first put all the variable on the stack (contrary to my colleague), but did not manage to have a clean and buggy behavior! Certainly memory alignment constraints that I don't understand well.)

Surprise! (or not)

Surprise, an unused variable can trigger a segmentation fault!

Last year I presented the definitions of a bug, according to Andreas Zeller. In this definition, he makes the distinction between

  • a defect (an invalid piece of code),
  • an infection (the execution of this defect, leading to an invalid memory area)
  • the propagation of that infection (the augmentation of the invalid memory are size)
  • and the failure, the externally observable error.

Here we've got an illustration of the purpose of these definition: with -DDO_NOT_CRASH, but program doesn't bug, but we now it's bugged. Yep, totally clear :-)

So what we really have is a program with a defect, whose memory space gets infected, but the infection does not propagate enough to lead to a failure.

How to detect it: with Valgrind

 valgrind ./a.out
 ==17545== Memcheck, a memory error detector
 ==17545== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
 ==17545== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
 ==17545== Command: ./a.out
 ==17545== 
 *i_ptr   is ... 5
 ==17545== 
 ==17545== HEAP SUMMARY:
 ==17545==     in use at exit: 0 bytes in 0 blocks
 ==17545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
 ==17545== 
 ==17545== All heap blocks were freed -- no leaks are possible
 ==17545== 
 ==17545== For counts of detected and suppressed errors, rerun with: -v
 ==17545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nop! I expected to see something with valgrind, but apparently it's not illegal enough! (or I missed an option ...?)

How to detect it: with GDB

That that we understand the situation, we know that there is a buffer overflow, but we need to find the infection point!

(gdb) start
(gdb) watch not_used 
Hardware watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware watchpoint 2: not_used

Old value = (int *) 0x0
New value = (int *) 0xffffffff
main () at overflow-long.c:18
18    for (i = 0; i <= SIZE; i++) {
19      array[i] = -1;
(gdb) print i
$1 = 4

There we are, for i=4, array[i] overflows into not_used. Our defect contaminates a memory area that is never read, so it never propagated to a failure.

Unexpected behavior of GDB watchpoint

At the beginning of the execution, the value of not_used is 0. In the overflow of the for loop, I set it to -1, so the watchpoint is triggered.

But in the first code I wrote, I set it to 0, and the watchpoint was ... not triggered. That's a bit unexpected to me, a write is a write, so I wanted the watchpoint to trigger!

So, just to confirm, I tried with rwatch, to set a read watchpoint ... and it worked!

 (gdb) rwatch not_used 
 Hardware read watchpoint 2: not_used
 (gdb) cont
 Continuing.
 Hardware read watchpoint 2: not_used

 Value = (int *) 0x0
 main () at overflow-long.c:18
 18   for (i = 0; i <= SIZE; i++) {
 (gdb)

That's also surprising to me, as my code is not supposed to read anything at this address!

Just to make it stranger, using rwatch with 0 <- -1 and watch with 0 <- 0 (the reverse of what works) doesn't work! (For the record, it's always the mov instruction that triggers my watchpoints).