"Unrelated" problems: it only works with my unused variable!

A situation that occurred recently to a colleague of mine:

I don't understand what's the problem, I never use that variable, but if I comment it out, my program crashes! If I let it, it runs fine!

Of course, when you're used C programming and know a bit of memory layout, you already know that "it runs fine" is subjective and that there is buffer overflow somewhere in the code.

Nonetheless, I think the situation is interesting to study, just to remember that this can lead to tricky incomprehensible behaviors.

The problem

Consider this small C code:

 #include "stdio.h"

 #define SIZE 4

 //#define DO_NOT_CRASH

 int *i_ptr;

 #ifdef DO_NOT_CRASH
 int *not_used;
 #endif
 int array[SIZE];

 int main() {
   int i;
   i_ptr = &i;

   for (i = 0; i <= SIZE; i++) {
     array[i] = -1;
   }

   printf("*i_ptr   is ...");
   fflush(stdout);
   printf(" %d\n", *i_ptr);
 }

Run it:

gcc test.c -g -O0 && ./a.out \
echo ==============================; \
gcc test.c -g -O0 -DDO_NOT_CRASH && ./a.out

&i_ptr   is     0x7ffeac4708d0
*i_ptr   is ...[1]    8127 segmentation fault (core dumped)  ./a.out
==============================
&i_ptr   is     0x7ffd5056f508
*i_ptr   is ... 5

(I must admit that it was harder than expected to reproduce the bug. I first put all the variable on the stack (contrary to my colleague), but did not manage to have a clean and buggy behavior! Certainly memory alignment constraints that I don't understand well.)

Surprise! (or not)

Surprise, an unused variable can trigger a segmentation fault!

Last year I presented the definitions of a bug, according to Andreas Zeller. In this definition, he makes the distinction between

a defect (an invalid piece of code),
an infection (the execution of this defect, leading to an invalid memory area)
the propagation of that infection (the augmentation of the invalid memory are size)
and the failure, the externally observable error.

Here we've got an illustration of the purpose of these definition: with -DDO_NOT_CRASH, but program doesn't bug, but we now it's bugged. Yep, totally clear :-)

So what we really have is a program with a defect, whose memory space gets infected, but the infection does not propagate enough to lead to a failure.

How to detect it: with Valgrind

 valgrind ./a.out
 ==17545== Memcheck, a memory error detector
 ==17545== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
 ==17545== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
 ==17545== Command: ./a.out
 ==17545== 
 *i_ptr   is ... 5
 ==17545== 
 ==17545== HEAP SUMMARY:
 ==17545==     in use at exit: 0 bytes in 0 blocks
 ==17545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
 ==17545== 
 ==17545== All heap blocks were freed -- no leaks are possible
 ==17545== 
 ==17545== For counts of detected and suppressed errors, rerun with: -v
 ==17545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nop! I expected to see something with valgrind, but apparently it's not illegal enough! (or I missed an option ...?)

How to detect it: with GDB

That that we understand the situation, we know that there is a buffer overflow, but we need to find the infection point!

(gdb) start
(gdb) watch not_used 
Hardware watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware watchpoint 2: not_used

Old value = (int *) 0x0
New value = (int *) 0xffffffff
main () at overflow-long.c:18
18    for (i = 0; i <= SIZE; i++) {
19      array[i] = -1;
(gdb) print i
$1 = 4

There we are, for i=4, array[i] overflows into not_used. Our defect contaminates a memory area that is never read, so it never propagated to a failure.

Unexpected behavior of GDB watchpoint

At the beginning of the execution, the value of not_used is 0. In the overflow of the for loop, I set it to -1, so the watchpoint is triggered.

But in the first code I wrote, I set it to 0, and the watchpoint was ... not triggered. That's a bit unexpected to me, a write is a write, so I wanted the watchpoint to trigger!

So, just to confirm, I tried with rwatch, to set a read watchpoint ... and it worked!

 (gdb) rwatch not_used 
 Hardware read watchpoint 2: not_used
 (gdb) cont
 Continuing.
 Hardware read watchpoint 2: not_used

 Value = (int *) 0x0
 main () at overflow-long.c:18
 18   for (i = 0; i <= SIZE; i++) {
 (gdb)

That's also surprising to me, as my code is not supposed to read anything at this address!

Just to make it stranger, using rwatch with 0 <- -1 and watch with 0 <- 0 (the reverse of what works) doesn't work! (For the record, it's always the mov instruction that triggers my watchpoints).

(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...