"Unrelated" problems: it only works with my unused variable!
A situation that occurred recently to a colleague of mine:
I don't understand what the problem is: I never use that variable, but if I comment it out, my program crashes! If I leave it in, it runs fine!
Of course, when you're used to C programming and know a bit about memory layout, you already know that "it runs fine" is subjective and that there is a buffer overflow somewhere in the code.
Nonetheless, I think the situation is interesting to study, just to remember that this can lead to tricky incomprehensible behaviors.
The problem
Consider this small C code:
#include <stdio.h>
#define SIZE 4
//#define DO_NOT_CRASH
int *i_ptr;
#ifdef DO_NOT_CRASH
int *not_used;          /* never read or written by the program itself */
#endif
int array[SIZE];
int main(void) {
    int i;
    i_ptr = &i;
    for (i = 0; i <= SIZE; i++) {   /* defect: <= writes one element too many */
        array[i] = -1;
    }
    printf("*i_ptr is ...");
    fflush(stdout);
    printf(" %d\n", *i_ptr);
}
Run it:
gcc test.c -g -O0 && ./a.out ; \
echo ==============================; \
gcc test.c -g -O0 -DDO_NOT_CRASH && ./a.out
&i_ptr is 0x7ffeac4708d0
*i_ptr is ...[1] 8127 segmentation fault (core dumped) ./a.out
==============================
&i_ptr is 0x7ffd5056f508
*i_ptr is ... 5
(I must admit that it was harder than expected to reproduce the bug. I first put all the variables on the stack (unlike my colleague), but did not manage to get a clean, reproducible buggy behavior! Probably because of memory alignment constraints that I don't understand well.)
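To see why the extra variable matters, it helps to look at where the globals actually end up. The sketch below is a hypothetical helper, not part of the original program: the function names `overflow_offset` and `print_layout` are illustrative, and `not_used` is declared unconditionally for simplicity. The placement it prints depends on the compiler, the linker and the flags, so it is an observation about one build, not a guarantee.

```c
#include <stdio.h>

#define SIZE 4

int *i_ptr;
int *not_used;
int array[SIZE];

/* Distance in bytes from the start of array to the slot that
 * array[SIZE] would occupy. Computing the one-past-the-end address
 * is allowed in C; writing through it is the defect. */
size_t overflow_offset(void) {
    return (size_t)((const char *)(array + SIZE) - (const char *)array);
}

/* Print where each global lives, so that the object sitting right
 * after array (if any) can be spotted by comparing addresses. */
void print_layout(void) {
    printf("&i_ptr       = %p\n", (void *)&i_ptr);
    printf("&not_used    = %p\n", (void *)&not_used);
    printf("array        = %p\n", (void *)array);
    printf("&array[SIZE] = %p (first byte past the array)\n",
           (void *)(array + SIZE));
}
```

Calling `print_layout()` from `main` and comparing `&array[SIZE]` with the other addresses shows which variable, if any, absorbs the extra write in that particular build.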
Surprise! (or not)
Surprise, removing an unused variable can trigger a segmentation fault!
Last year I presented the definitions of a bug, according to Andreas Zeller. In this definition, he makes the distinction between
- a defect (an invalid piece of code),
- an infection (the execution of this defect, leading to an invalid memory area),
- the propagation of that infection (the growth of the invalid memory area),
- and the failure, the externally observable error.
Here we've got an illustration of the purpose of these definitions: with -DDO_NOT_CRASH, the program doesn't fail, but we know it's buggy. Yep, totally clear :-)
So what we really have is a program with a defect, whose memory space gets infected, but the infection does not propagate enough to lead to a failure.
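The defect / infection / failure chain can be modelled without relying on undefined behavior. The sketch below is a toy model under stated assumptions: a plain byte buffer stands in for the memory holding `array` plus one neighbouring int, and the names `memory`, `store_slot`, `load_slot` and `run_defective_loop` are hypothetical. Every write stays inside the buffer, so the model is well defined even though it replays the off-by-one loop.

```c
#include <string.h>

#define SIZE 4

/* Toy memory: SIZE int slots for the array, then one extra int slot
 * standing in for the neighbouring (unused) variable. */
unsigned char memory[(SIZE + 1) * sizeof(int)];

/* Store value in slot idx of the modelled array, with no bounds check. */
void store_slot(size_t idx, int value) {
    memcpy(memory + idx * sizeof(int), &value, sizeof value);
}

/* Read back slot idx. */
int load_slot(size_t idx) {
    int value;
    memcpy(&value, memory + idx * sizeof(int), sizeof value);
    return value;
}

/* Replays the defective loop and returns the neighbour's value. */
int run_defective_loop(void) {
    memset(memory, 0, sizeof memory);
    for (size_t i = 0; i <= SIZE; i++) {  /* defect: should be i < SIZE */
        store_slot(i, -1);                /* infection when i == SIZE */
    }
    return load_slot(SIZE);               /* the neighbour is now -1 */
}
```

In this model the neighbour slot ends up infected, but a failure only shows up if some later code reads that slot and acts on it.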
How to detect it: with Valgrind
valgrind ./a.out
==17545== Memcheck, a memory error detector
==17545== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17545== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17545== Command: ./a.out
==17545==
*i_ptr is ... 5
==17545==
==17545== HEAP SUMMARY:
==17545==     in use at exit: 0 bytes in 0 blocks
==17545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==17545==
==17545== All heap blocks were freed -- no leaks are possible
==17545==
==17545== For counts of detected and suppressed errors, rerun with: -v
==17545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Nope! I expected Valgrind to report something, but apparently it's not illegal enough! (or I missed an option ...?)
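A likely reason for the silence: Memcheck tracks heap blocks, but it does not do bounds checking on global or stack arrays, and `array` is a global here. The sketch below is a hypothetical heap variant (the function name `fill_heap_array` and its parameters are assumptions, not taken from the original program) that gives Memcheck something it can check. The overflow only happens if the caller asks for it by passing a `count` larger than `n`.

```c
#include <stdlib.h>

/* Allocate n ints on the heap and write the first `count` of them.
 * Passing count > n reproduces the overflow on purpose, this time in
 * a heap block that Memcheck keeps track of.
 * Returns 0 on success, -1 if the allocation failed. */
int fill_heap_array(size_t n, size_t count) {
    int *heap_array = malloc(n * sizeof *heap_array);
    if (heap_array == NULL) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        heap_array[i] = -1;
    }
    free(heap_array);
    return 0;
}
```

Calling `fill_heap_array(4, 4)` is the correct usage. Calling `fill_heap_array(4, 5)` under Valgrind should make Memcheck report an invalid write just past the allocated block.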
How to detect it: with GDB
Now that we understand the situation, we know that there is a buffer overflow, but we need to find the infection point!
(gdb) start
(gdb) watch not_used
Hardware watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware watchpoint 2: not_used
Old value = (int *) 0x0
New value = (int *) 0xffffffff
main () at overflow-long.c:18
18 for (i = 0; i <= SIZE; i++) {
19 array[i] = -1;
(gdb) print i
$1 = 4
There we are: for i=4, array[i] overflows into not_used. Our defect contaminates a memory area that is never read, so it never propagates to a failure.
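Once the infection point is known, the fix is the loop bound. Here is a minimal sketch of a corrected loop, assuming the same global `array`; the `ARRAY_LEN` macro and the `fill_array` function are illustrative names, not part of the original code.

```c
#include <stddef.h>

#define SIZE 4

int array[SIZE];

/* Element count derived from the array itself, so the bound follows
 * the declaration if SIZE changes. Only valid on real arrays, not on
 * pointers. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Fill every element of array with value and return the index the
 * loop stopped at, which is the element count. */
size_t fill_array(int value) {
    size_t i;
    for (i = 0; i < ARRAY_LEN(array); i++) {  /* strict <, not <= */
        array[i] = value;
    }
    return i;
}
```

With the strict bound, the index stops at SIZE (4 here) without ever being used to write, so nothing outside `array` is touched.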
Unexpected behavior of GDB watchpoint
At the beginning of the execution, the value of not_used is 0. During the overflow in the for loop, it gets set to -1, so the watchpoint is triggered.
But in the first code I wrote, I set it to 0, and the watchpoint was ... not triggered. That's a bit unexpected to me, a write is a write, so I wanted the watchpoint to trigger!
So, just to confirm, I tried with rwatch, to set a read watchpoint ... and it worked!
(gdb) rwatch not_used
Hardware read watchpoint 2: not_used
(gdb) cont
Continuing.
Hardware read watchpoint 2: not_used
Value = (int *) 0x0
main () at overflow-long.c:18
18 for (i = 0; i <= SIZE; i++) {
(gdb)
That's also surprising to me, as my code is not supposed to read anything at this address!
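GDB's manual describes `watch` as stopping when the expression is written into and its value changes, so a store of 0 over 0 is not expected to trigger it. For the `rwatch` hit, one plausible explanation (an assumption, not something verified here) is that x86 debug registers can trap on writes or on reads-or-writes, but not on reads alone, which would leave GDB to tell the two apart by checking whether the value changed. The sketch below is a hypothetical test program for trying both commands; the variable and function names are illustrative.

```c
/* volatile, so the compiler keeps both stores even when optimizing */
volatile long watched = 0;

/* Store the value that is already there: a write that changes nothing. */
void write_same_value(void) {
    watched = 0;
}

/* Store a different value: a write that changes the watched value. */
void write_new_value(void) {
    watched = -1;
}
```

Calling `write_same_value()` and then `write_new_value()` from `main`, once under `watch watched` and once under `rwatch watched`, shows which of the two stores each kind of watchpoint reports.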
Just to make it stranger, using rwatch with 0 <- -1 and watch with 0 <- 0 (the reverse of what works) doesn't work! (For the record, it's always the mov instruction that triggers my watchpoints).
(gdb) break *0x972