"Unrelated" problems: it only works with my unused variable!
A situation that recently happened to a colleague of mine:
I don't understand what the problem is: I never use that variable, but if I comment it out, my program crashes! If I leave it in, it runs fine!
Of course, when you're used to C programming and know a bit about memory layout, you already know that "it runs fine" is subjective and that there is a buffer overflow somewhere in the code.
Nonetheless, I think the situation is worth studying, if only as a reminder that this kind of bug can lead to tricky, incomprehensible behavior.
The problem
Consider this small C program:
```c
#include <stdio.h>

#define SIZE 4
//#define DO_NOT_CRASH

int *i_ptr;

#ifdef DO_NOT_CRASH
int *not_used;              /* never read or written by the code */
#endif

int array[SIZE];

int main()
{
    int i;
    i_ptr = &i;

    for (i = 0; i <= SIZE; i++) {
        array[i] = -1;
    }

    printf("*i_ptr is ...");
    fflush(stdout);         /* show the text even if the next line crashes */
    printf(" %d\n", *i_ptr);
}
```
Run it:
```
gcc test.c -g -O0 && ./a.out; \
echo ==============================; \
gcc test.c -g -O0 -DDO_NOT_CRASH && ./a.out
&i_ptr is 0x7ffeac4708d0
*i_ptr is ...[1]    8127 segmentation fault (core dumped)  ./a.out
==============================
&i_ptr is 0x7ffd5056f508
*i_ptr is ... 5
```
(I must admit that it was harder than expected to reproduce the bug. Unlike my colleague, I first put all the variables on the stack, but I did not manage to get a clean, reliably buggy behavior! Most likely this was due to memory alignment constraints that I don't understand well.)
Surprise! (or not)
Surprise, an unused variable can trigger a segmentation fault!
Last year I presented the definitions of a bug according to Andreas Zeller. In these definitions, he distinguishes between:
- a defect (an invalid piece of code),
- an infection (the execution of this defect, leading to an invalid memory area),
- the propagation of that infection (the growth of the invalid memory area),
- and the failure, the externally observable error.
Here we have an illustration of why these definitions matter: with -DDO_NOT_CRASH, the program doesn't fail, but we know it's buggy. Yep, totally clear :-)
So what we really have is a program with a defect whose memory space gets infected, but the infection does not propagate far enough to lead to a failure.
How to detect it: with Valgrind
```
valgrind ./a.out
==17545== Memcheck, a memory error detector
==17545== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17545== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17545== Command: ./a.out
==17545==
*i_ptr is ... 5
==17545==
==17545== HEAP SUMMARY:
==17545==     in use at exit: 0 bytes in 0 blocks
==17545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17545==
==17545== All heap blocks were freed -- no leaks are possible
==17545==
==17545== For counts of detected and suppressed errors, rerun with: -v
==17545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```
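Nope! I expected Valgrind to report something, but apparently it's not illegal enough! (Or did I miss an option...?) As far as I know, Memcheck only checks bounds on heap blocks. It does not detect overruns of global or stack arrays, so an overflow from one global into its neighbour is invisible to it.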
How to detect it: with GDB
Now that we understand the situation, we know there is a buffer overflow, but we still need to find the infection point!
```
(gdb) start
(gdb) watch not_used
Hardware watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware watchpoint 2: not_used
Old value = (int *) 0x0
New value = (int *) 0xffffffff
main () at overflow-long.c:18
18	    for (i = 0; i <= SIZE; i++) {
19	        array[i] = -1;
(gdb) print i
$1 = 4
```
There we are: for i=4, array[i] overflows into not_used. Our defect contaminates a memory area that is never read, so the infection never propagates to a failure.
Unexpected behavior of GDB watchpoint
At the beginning of the execution, the value of not_used is 0. The overflow in the for loop sets it to -1, so the watchpoint is triggered.
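But in the first version of the code, I wrote 0 instead, and the watchpoint was ... not triggered. That was a bit unexpected to me: a write is a write, so I wanted the watchpoint to trigger! (In hindsight, the GDB manual does say that watch breaks when the expression is written into by the program and its value changes, and writing 0 over 0 changes nothing.)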
So, just to check, I tried rwatch, which sets a read watchpoint ... and it triggered!
```
(gdb) rwatch not_used
Hardware read watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware read watchpoint 2: not_used
Value = (int *) 0x0
main () at overflow-long.c:18
18	    for (i = 0; i <= SIZE; i++) {
(gdb)
```
That's also surprising to me, as my code is not supposed to read anything at this address!
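To make it even stranger, the reverse combinations (rwatch with a 0 <- -1 write, and watch with a 0 <- 0 write) don't trigger at all! (For the record, it's always the mov instruction that triggers my watchpoints.) My best guess for the rwatch part: x86 debug registers can trap on writes or on reads-and-writes, but not on reads alone, so GDB has to emulate rwatch with an access watchpoint. When such a hit changes the value, GDB apparently assumes it was a write and stays silent; when the value is unchanged, it reports the hit as a read.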