Arrays that are local variables of a function are allocated on the stack. When a programming error causes the program to write past the end of such an array, a stack buffer overflow occurs.
Using a file editor, save the program shown below into a file named stack-buffer-overflow.c:
#include <string.h>

__attribute__((noinline))
char f(char *src) {
  char buffer[8];
  strcpy(buffer, src);
  return buffer[2];
}

// Note that a 0 byte is automatically appended.
//                     00000000011111111112222
//                     12345678901234567890123
char many_chars[24] = "These are 24 chars. yes";
int main() {
  f(many_chars);
  return 0;
}
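To see how far the copy in f overruns buffer, the sketch below counts the bytes that strcpy writes past the end of a destination array. strcpy copies strlen(src) characters plus the terminating 0 byte. The helper strcpy_overflow_bytes is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes that strcpy(dst, src) would write
   beyond the end of a destination array of dst_size bytes.
   strcpy writes strlen(src) characters plus the terminating 0 byte. */
size_t strcpy_overflow_bytes(size_t dst_size, const char *src)
{
    assert(src != NULL);
    size_t written = strlen(src) + 1;
    return written > dst_size ? written - dst_size : 0;
}
```

For buffer (8 bytes) and many_chars (23 characters plus the 0 byte), this gives 24 - 8 = 16 bytes written past the end of buffer.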
Compile it with the following command at the prompt in your Docker container:
clang -g -O1 stack-buffer-overflow.c -o stack-buffer-overflow
Now, run this program:
./stack-buffer-overflow
Bus error
The program crashed with a bus error. What happened exactly?
Let’s use the gdb debugger to figure out why the program crashed with a bus error. Start the gdb debugger:
gdb -q ./stack-buffer-overflow
Reading symbols from ./stack-buffer-overflow...
(gdb)
Put a breakpoint on the first instruction of function f. If you simply execute break f at the gdb prompt, gdb sets the breakpoint after the function prologue. You don’t want that, because you need to investigate what happens during the prologue.
The program is built as a position-independent executable, so the final address of f is only known once the program has been loaded. Therefore, first stop the program in main by running the following commands at the gdb prompt:
(gdb) break main
__output__Breakpoint 1 at 0x784: file stack-buffer-overflow.c, line 15.
(gdb) run
__output__Starting program: /armlearningpaths/stack-buffer-overflow
__output__[Thread debugging using libthread_db enabled]
__output__Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
__output__
__output__Breakpoint 1, main () at stack-buffer-overflow.c:15
__output__15 f(many_chars);
Look at the disassembly of function f:
(gdb) disass f
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
This shows that the first instruction of function f is located at address 0x0000aaaaaaaa0754. Now you can explicitly put a breakpoint there:
(gdb) break *0x0000aaaaaaaa0754
__output__Breakpoint 2 at 0xaaaaaaaa0754: file stack-buffer-overflow.c, line 4.
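Before the program ran, gdb placed breakpoint 1 at 0x784; now the first instruction of f is at 0xaaaaaaaa0754. The difference is the address at which the position-independent executable was loaded. A minimal sketch of that arithmetic, assuming a load base of 0xaaaaaaaa0000 (inferred from the addresses in this session; it can differ on other systems). runtime_address is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime address of code in a position-independent executable:
   the load base chosen when the program starts plus the link-time
   offset stored in the binary (what gdb shows before the program runs). */
uint64_t runtime_address(uint64_t load_base, uint64_t link_offset)
{
    assert(load_base % 4096 == 0);  /* load bases are page aligned */
    return load_base + link_offset;
}
```

With the assumed base, the link-time offset 0x754 of f maps to 0xaaaaaaaa0754, the address used for breakpoint 2.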
Continue running the program until the start of function f, where it should hit the breakpoint on the first instruction:
(gdb) cont
__output__Continuing.
__output__
__output__Breakpoint 2, f (src=0xaaaaaaab1038 <many_chars> "These are 24 chars. yes") at stack-buffer-overflow.c:4
__output__4 char f(char *src) {
(gdb) disass
__output__Dump of assembler code for function f:
__output__=> 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
Check the values of the key registers sp, x29 and x30 before any instruction in function f executes:
(gdb) info registers sp x29 x30
__output__sp 0xfffffffff5a0 0xfffffffff5a0
__output__x29 0xfffffffff5a0 281474976708000
__output__x30 0xaaaaaaaa0790 187649984432016
Now, step through the 3 instructions in the function prologue:
(gdb) nexti 3
__output__0x0000aaaaaaaa0760 in f (src=0xaaaaaaab1038 <many_chars> "These are 24 chars. yes") at stack-buffer-overflow.c:4
__output__4 char f(char *src) {
(gdb) disass
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__=> 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
What have the 3 instructions in the prologue done?
sub sp, sp, #0x20 moves the stack pointer down by 0x20 = 32 bytes. Looking at the source code and the disassembly of function f, you would expect that 2 times 8 bytes = 16 bytes are needed to store x29 and x30. Furthermore, array buffer is 8 bytes long. So, in total, 24 bytes should be needed on the frame of function f.
Why does the code reserve 32 bytes then? The Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64) requires the stack pointer to be aligned to a 16-byte boundary at all times, so the compiler has to round the minimum of 24 bytes up to the next multiple of 16, which is 32.
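The rounding is the usual align-up computation: add the alignment minus one, then clear the low bits. In the sketch below, align_up is a name chosen for this example; it shows that 16 bytes for x29 and x30 plus 8 bytes for buffer, 24 in total, round up to a 32-byte frame.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align, which must be a power
   of two. This mirrors how a 24-byte frame becomes 32 bytes when the
   stack pointer has to stay 16-byte aligned. */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, align_up(2 * 8 + 8, 16) evaluates to 32, while a size that is already a multiple of 16 is left unchanged.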
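stp x29, x30, [sp, #16] stores the values of x29 and x30 on the stack, so that they can be restored before f returns: x30 (the return address into main) is overwritten by the bl instruction that calls strcpy, and x29 is overwritten by the next instruction, which sets up the frame pointer of f.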
add x29, sp, #0x10 sets the frame pointer (always register x29) for the frame of function f to 16 bytes (0x10) above the stack pointer, which is the address where the old values of x29 and x30 have just been stored.
Draw the frame layout of function f at this point, including where the stack pointer and the frame pointer point.
The answer to this exercise can be found in the Answers section.
Print the contents of the frame of f at this point. The frame is 32 bytes long, so print it as four 64-bit values:
(gdb) x/4gx $sp
__output__0xfffffffff580: 0x0000aaaaaaab0dd0 0x0000fffff7ffe040
__output__0xfffffffff590: 0x0000fffffffff5a0 0x0000aaaaaaaa0790
That looks as expected: the unused part of the frame and the variable buffer contain arbitrary bits at this point, because f has not written to them yet. The saved value of x29, 0x0000fffffffff5a0, matches the value gdb printed for x29 at the start of the function. Similarly, the saved value of x30, 0x0000aaaaaaaa0790, matches the value printed at the start of the function.
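These addresses follow directly from the prologue. The sketch below (frame_addresses and struct frame_addrs are names made up for this example) starts from the value of sp that gdb printed on entry to f, 0xfffffffff5a0, and applies the immediates from the disassembly. The saved x29 and x30 end up at 0xfffffffff590 and 0xfffffffff598, the second line of the x/4gx output, and buffer starts at 0xfffffffff588, the address that add x0, sp, #0x8 passes to strcpy.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses in the frame of f, derived from the prologue
   (sub sp, sp, #0x20; stp x29, x30, [sp, #16]; add x29, sp, #0x10)
   and from add x0, sp, #0x8, which passes buffer to strcpy. */
struct frame_addrs {
    uint64_t sp;         /* stack pointer after the prologue */
    uint64_t buffer;     /* start of the local array buffer */
    uint64_t saved_x29;  /* slot holding the caller's frame pointer */
    uint64_t saved_x30;  /* slot holding the return address */
    uint64_t fp;         /* new value of x29 */
};

struct frame_addrs frame_addresses(uint64_t sp_on_entry)
{
    struct frame_addrs a;
    a.sp = sp_on_entry - 0x20;
    a.buffer = a.sp + 0x8;
    a.saved_x29 = a.sp + 16;
    a.saved_x30 = a.sp + 24;
    a.fp = a.sp + 0x10;
    assert(a.sp % 16 == 0);  /* AAPCS64 keeps sp 16-byte aligned */
    return a;
}
```

Note that fp and saved_x29 are the same address: the frame pointer points at the saved pair.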
In the stack layout you drew in exercise 3, you see that 8 bytes have been allocated on the stack for buffer. Immediately above them are the saved values of x29 (the frame pointer) and x30 (the return address), which need to be restored at the end of the function.
If you look again at the source code at the start of this section, you can see that strcpy is going to write a string of 24 bytes into buffer. Now that you know the stack layout, it is clear that the saved values of x29 and x30 on the stack will be overwritten with bytes from the copied string. Let’s confirm that with the debugger.
(gdb) nexti
__output__6 strcpy(buffer, src);
(gdb) nexti
__output__0x0000aaaaaaaa0768 6 strcpy(buffer, src);
(gdb) disass
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__=> 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
You are now right before the call to the strcpy function. Let it execute. After that, you can check whether the saved values of the x29 and x30 registers on the stack have been overwritten.
(gdb) nexti
__output__7 return buffer[2];
(gdb) disass
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__=> 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
(gdb) x/4gx $sp
__output__0xfffffffff580: 0x0000aaaaaaab0dd0 0x7261206573656854
__output__0xfffffffff590: 0x6168632034322065 0x00736579202e7372
Indeed, the values in the slots where x29 and x30 are stored have changed! Print the same memory interpreted as ASCII characters to see whether those hexadecimal numbers correspond to the expected string in the source program.
(gdb) x/32c $sp
__output__0xfffffffff580: -48 '\320' 13 '\r' -85 '\253' -86 '\252' -86 '\252' -86 '\252' 0 '\000' 0 '\000'
__output__0xfffffffff588: 84 'T' 104 'h' 101 'e' 115 's' 101 'e' 32 ' ' 97 'a' 114 'r'
__output__0xfffffffff590: 101 'e' 32 ' ' 50 '2' 52 '4' 32 ' ' 99 'c' 104 'h' 97 'a'
__output__0xfffffffff598: 114 'r' 115 's' 46 '.' 32 ' ' 121 'y' 101 'e' 115 's' 0 '\000'
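Indeed! The string starts at buffer (address 0xfffffffff588) and runs on into the slots of the saved x29 (0xfffffffff590) and the saved x30 (0xfffffffff598).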
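AArch64 Linux runs in little-endian mode, so each 64-bit value printed by x/4gx has its least significant byte at the lowest address. The sketch below (u64_to_le_bytes is a name chosen for this example) splits a 64-bit value into the bytes it occupies in memory. Applied to 0x7261206573656854, the value x/4gx showed at 0xfffffffff588, it gives T, h, e, s, e, space, a, r: the same characters that x/32c prints at that address.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into the 8 bytes it occupies in memory on a
   little-endian machine: the least significant byte comes first. */
void u64_to_le_bytes(uint64_t value, char out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (char)((value >> (8 * i)) & 0xff);
    }
}
```

Doing the same for 0x00736579202e7372 yields r, s, ., space, y, e, s and a final 0 byte, the end of the string including its terminator.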
The next instruction to execute, ldp x29, x30, [sp, #16], is supposed to restore the old values of x29 and x30 from the stack.
(gdb) nexti
__output__0x0000aaaaaaaa0770 in f (src=<optimized out>) at stack-buffer-overflow.c:7
__output__7 return buffer[2];
(gdb) disass
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__=> 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__ 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
(gdb) info registers sp x29 x30
__output__sp 0xfffffffff580 0xfffffffff580
__output__x29 0x6168632034322065 7018969009222721637
__output__x30 0x736579202e7372 32481193227088754
Indeed, registers x29 and x30 no longer contain the values they had at the start of the function, even though the ldp instruction was supposed to restore exactly those values. Instead, they contain bytes from the copied string.
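Why exactly these values? The sketch below models the 32-byte frame of f as a C struct, with offsets taken from the disassembly. frame_of_f and simulate_strcpy are hypothetical names; the compiler does not declare any such struct. Copying the 24 bytes of many_chars, including the terminating 0 byte, to the start of buffer places bytes 8 to 15 of the string (counting from 0) in the x29 slot and bytes 16 to 23 in the x30 slot. Read back as little-endian 64-bit values, these are the numbers gdb now shows in x29 and x30.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the frame of f after the prologue,
   lowest address (sp) first. */
struct frame_of_f {
    unsigned char unused[8];  /* [sp, #0]: padding for 16-byte alignment */
    char buffer[8];           /* [sp, #8]: the local array */
    uint64_t saved_x29;       /* [sp, #16]: caller's frame pointer */
    uint64_t saved_x30;       /* [sp, #24]: return address */
};

static_assert(offsetof(struct frame_of_f, buffer) == 8, "buffer at sp+8");
static_assert(offsetof(struct frame_of_f, saved_x29) == 16, "x29 at sp+16");
static_assert(offsetof(struct frame_of_f, saved_x30) == 24, "x30 at sp+24");
static_assert(sizeof(struct frame_of_f) == 32, "frame is 32 bytes");

/* Copy src, including its 0 byte, starting at buffer as strcpy does,
   but never past the end of the model struct. */
void simulate_strcpy(struct frame_of_f *frame, const char *src)
{
    unsigned char *dst =
        (unsigned char *)frame + offsetof(struct frame_of_f, buffer);
    size_t room = sizeof *frame - offsetof(struct frame_of_f, buffer);
    size_t n = strlen(src) + 1;
    assert(n <= room);
    memcpy(dst, src, n);
}
```

Unlike the real strcpy call in f, the simulated copy is bounded by the size of the struct, so the model itself never writes outside an object. The expected values only hold on a little-endian host.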
The ret (return) instruction at the end of the function uses the value in register x30 as the address to jump to. Let’s see what happens when you execute it.
(gdb) nexti
__output__0x0000aaaaaaaa0774 7 return buffer[2];
(gdb) nexti
__output__0x0000aaaaaaaa0778 7 return buffer[2];
(gdb) disass
__output__Dump of assembler code for function f:
__output__ 0x0000aaaaaaaa0754 <+0>: sub sp, sp, #0x20
__output__ 0x0000aaaaaaaa0758 <+4>: stp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa075c <+8>: add x29, sp, #0x10
__output__ 0x0000aaaaaaaa0760 <+12>: mov x1, x0
__output__ 0x0000aaaaaaaa0764 <+16>: add x0, sp, #0x8
__output__ 0x0000aaaaaaaa0768 <+20>: bl 0xaaaaaaaa0630 <strcpy@plt>
__output__ 0x0000aaaaaaaa076c <+24>: ldp x29, x30, [sp, #16]
__output__ 0x0000aaaaaaaa0770 <+28>: ldrb w0, [sp, #10]
__output__ 0x0000aaaaaaaa0774 <+32>: add sp, sp, #0x20
__output__=> 0x0000aaaaaaaa0778 <+36>: ret
__output__End of assembler dump.
(gdb) nexti
__output__0x00736579202e7372 in ?? ()
(gdb) disass
__output__No function contains program counter for selected frame.
(gdb) nexti
__output__Cannot access memory at address 0x736579202e7372
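The program counter now points to 0x736579202e7372. It is not a valid address. It is not even 4-byte aligned: A64 instructions are 4 bytes long, so the program counter should always be a multiple of 4. Fetching an instruction from a misaligned program counter causes a PC alignment fault, which Linux reports as a “Bus error” signal (SIGBUS) when you continue to run the program: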
(gdb) cont
__output__Continuing.
__output__
__output__Program received signal SIGBUS, Bus error.
__output__0x00736579202e7372 in ?? ()
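The alignment rule amounts to checking the two low bits of the address. In the sketch below, pc_is_aligned is a hypothetical helper: the corrupted value 0x736579202e7372 ends in 0x72, whose two low bits are 0b10, so it fails the check, whereas the genuine return address 0xaaaaaaaa0790 passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A64 instructions are 4 bytes long and 4-byte aligned, so a valid
   program counter value always has its two low bits clear. */
bool pc_is_aligned(uint64_t pc)
{
    return (pc & 0x3) == 0;
}
```

Passing this check is necessary but not sufficient: an aligned address that is not mapped as executable memory would still make the program crash, just with a different signal.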