Arrays that are local function variables are allocated on the stack. When a programming error results in the program writing to an array index beyond the size of that array, a stack buffer overflow happens.
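
The condition described above can be expressed as a small check. This is a minimal sketch, not part of the original program: the helper name `overflow_bytes` is hypothetical, and it only computes how far a string copy would run past the end of a buffer, without performing the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes that copying src, including its
 * terminating 0 byte, would write past the end of a buffer that is
 * buf_size bytes long. Returns 0 when the string fits. */
size_t overflow_bytes(size_t buf_size, const char *src) {
    assert(src != NULL);
    size_t needed = strlen(src) + 1;   /* characters plus the 0 byte */
    return needed > buf_size ? needed - buf_size : 0;
}
```

For example, `overflow_bytes(8, "0123456789")` reports 3, because the 10 characters plus the terminating 0 byte need 11 bytes while the buffer only holds 8.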

Using a file editor, save the contents of the program shown below into a file named stack-buffer-overflow.c:

    

        
        
            #include <string.h>

__attribute__((noinline))
char f(char *src) {
    char buffer[8];
    strcpy(buffer, src);
    return buffer[2];
}

// Note that a 0 byte is automatically appended.
//                     00000000011111111112222
//                     12345678901234567890123
char many_chars[24] = "These are 24 chars. yes";
int main() {
    f(many_chars);
    return 0;
}
        
    

Compile it with the following command at your docker prompt:

    

        
        clang -g -O1 stack-buffer-overflow.c -o stack-buffer-overflow

        
    

Now, run this program:

    

        
        ./stack-buffer-overflow
Bus error

        
    

The program crashed with a bus error. What happened exactly?

Deep dive using a debugger

Let’s use the gdb debugger to figure out why the program crashed with a bus error. Start the gdb debugger:

    

        
        gdb -q ./stack-buffer-overflow
Reading symbols from ./stack-buffer-overflow...
(gdb) 

        
    

Put a breakpoint on the first instruction of function f. If you simply execute break f at the gdb prompt, gdb sets the breakpoint after the function prologue. You don’t want that, because you need to investigate what happens during the function prologue. Instead, first set a breakpoint on main and run the program, so that the addresses gdb reports are the ones used by the running program:

    

        
        (gdb) break main
Breakpoint 1 at 0x784: file stack-buffer-overflow.c, line 15.

        
    
    

        
        (gdb) run
Starting program: /armlearningpaths/stack-buffer-overflow
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at stack-buffer-overflow.c:15
15          f(many_chars);

        
    

Look at the disassembly of function f:

    

        
        (gdb) disass f
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    

This shows that the first instruction of function f is located at address 0x0000aaaaaaaa0754. Now you can explicitly put a breakpoint there:

    

        
        (gdb) break *0x0000aaaaaaaa0754
Breakpoint 2 at 0xaaaaaaaa0754: file stack-buffer-overflow.c, line 4.

        
    

Continue running the program so it runs until the start of function f, where it should hit the breakpoint on the first instruction.

    

        
        (gdb) cont
Continuing.

Breakpoint 2, f (src=0xaaaaaaab1038 <many_chars> "These are 24 chars. yes") at stack-buffer-overflow.c:4
4       char f(char *src) {

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
=> 0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    

You can see what the values of the key registers sp, x29 and x30 are before any instruction in function f has executed.

    

        
        (gdb) info registers sp x29 x30
sp             0xfffffffff5a0      0xfffffffff5a0
x29            0xfffffffff5a0      281474976708000
x30            0xaaaaaaaa0790      187649984432016

        
    

Now, step through the 3 instructions in the function prologue:

    

        
        (gdb) nexti 3
0x0000aaaaaaaa0760 in f (src=0xaaaaaaab1038 <many_chars> "These are 24 chars. yes") at stack-buffer-overflow.c:4
4       char f(char *src) {

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
=> 0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    

What have the 3 instructions in the prologue done?

  1. sub sp, sp, #0x20

    moves the stack pointer downwards by 0x20 = 32 bytes. Looking at the source code and disassembly of function f, you expect that 2 times 8 bytes = 16 bytes are needed to store x29 and x30. Furthermore, the array buffer is 8 bytes long. So, in total, 24 bytes are needed for the frame of function f.

    Why does the code reserve 32 bytes then? The Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64) specifies that the stack pointer must be aligned to a 16-byte boundary, so the compiler has no choice but to round up the minimum of 24 bytes to the next multiple of 16, which is 32.

  2. stp x29, x30, [sp, #16]

    stores the values of x29 and x30 on the stack, because both are about to be overwritten: x29 by the next instruction in the prologue, and x30 by the bl instruction that calls strcpy.

  3. add x29, sp, #0x10

    sets the frame pointer (always in register x29) for the frame of function f to be 16 bytes (0x10) higher than the stack pointer.
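
The size calculation in step 1 can be written out as code. This is a minimal sketch: the helper names `round_up` and `frame_size_of_f` are hypothetical, and the sizes are the ones derived above (two 8-byte registers plus the 8-byte array).

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
 * align must be a power of two, such as the 16-byte stack alignment. */
size_t round_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Frame size for f: two saved 8-byte registers plus the local array,
 * rounded up to the 16-byte stack alignment. */
size_t frame_size_of_f(void) {
    size_t saved_regs = 2 * 8;  /* x29 and x30 */
    size_t buffer = 8;          /* char buffer[8] */
    return round_up(saved_regs + buffer, 16);
}
```

Calling `frame_size_of_f()` gives 32, which matches the 0x20 in the sub instruction. The 8 bytes that the rounding adds are the unused part of the frame.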

Exercise 3

Draw the frame layout of function f at this point, including where the stack pointer and the frame pointer point to.

The answer to this exercise can be found in the Answers section.

Let’s print the content of the frame of function f at this point. The frame is 32 bytes long, so print it as four 64-bit values:

    

        
        (gdb) x/4gx $sp
0xfffffffff580: 0x0000aaaaaaab0dd0      0x0000fffff7ffe040
0xfffffffff590: 0x0000fffffffff5a0      0x0000aaaaaaaa0790

        
    

That looks as expected: the unused part of the frame and the variable buffer contain arbitrary bits at this point.

The saved value of x29 on the stack, 0x0000fffffffff5a0, matches the value gdb printed at the start of the function. Similarly, the saved value of x30 on the stack, 0x0000aaaaaaaa0790, matches what was printed at the start of the function.
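
The addresses in this dump follow directly from the register values and the offsets in the disassembly. The following is a minimal sketch of that arithmetic. The enum and function names are hypothetical, and the offsets are read off the instructions `sub sp, sp, #0x20`, `add x0, sp, #0x8` and `stp x29, x30, [sp, #16]`.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within f's frame, taken from the disassembly of f. */
enum {
    F_FRAME_SIZE    = 0x20,  /* sub sp, sp, #0x20          */
    F_BUFFER_OFF    = 0x08,  /* add x0, sp, #0x8           */
    F_SAVED_X29_OFF = 0x10,  /* stp x29, x30, [sp, #16]    */
    F_SAVED_X30_OFF = 0x18   /* second register of the stp */
};

/* Stack pointer after the prologue, given its value on entry to f. */
uint64_t sp_after_prologue(uint64_t sp_on_entry) {
    assert(sp_on_entry % 16 == 0);  /* sp is 16-byte aligned on entry */
    return sp_on_entry - F_FRAME_SIZE;
}

/* Address of the frame slot at the given offset from the new sp. */
uint64_t slot_address(uint64_t sp_on_entry, unsigned offset) {
    return sp_after_prologue(sp_on_entry) + offset;
}
```

With the entry value 0xfffffffff5a0 printed by gdb, `sp_after_prologue` gives 0xfffffffff580, and the slots for buffer, x29 and x30 are at 0xfffffffff588, 0xfffffffff590 and 0xfffffffff598.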

strcpy writing beyond the bounds of the buffer

In the stack layout you have drawn in exercise 3, you see that 8 bytes have been allocated on the stack for buffer. Immediately after that come the saved values of x29 (the frame pointer) and x30 (the return address), which need to be restored at the end of the function.

If you look again at the source code at the start of this section, you can see that strcpy is going to write a string of 24 bytes into buffer. Now that you know the stack layout, it is clear that the saved values of x29 and x30 on the stack will be overwritten with bytes from the copied string. Let’s observe that using the debugger.

    

        
        (gdb) nexti
6           strcpy(buffer, src);

        
    
    

        
        (gdb) nexti
0x0000aaaaaaaa0768      6           strcpy(buffer, src);

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
=> 0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    

You are now right before the call to the strcpy function. Let it execute. After that you can check if the values of the saved x29 and x30 registers on the stack get overwritten.

    

        
        (gdb) nexti
7           return buffer[2];

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
=> 0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    
    

        
        (gdb) x/4gx $sp
0xfffffffff580: 0x0000aaaaaaab0dd0      0x7261206573656854
0xfffffffff590: 0x6168632034322065      0x00736579202e7372

        
    

Indeed, the stack slots where x29 and x30 were saved have changed! Let’s print out this same memory, interpreted as ASCII characters, to see if those hexadecimal numbers correspond to the expected string in the source program.

    

        
        (gdb) x/32c $sp
0xfffffffff580: -48 '\320'      13 '\r' -85 '\253'      -86 '\252'      -86 '\252'      -86 '\252'      0 '\000'        0 '\000'
0xfffffffff588: 84 'T'  104 'h' 101 'e' 115 's' 101 'e' 32 ' '  97 'a'  114 'r'
0xfffffffff590: 101 'e' 32 ' '  50 '2'  52 '4'  32 ' '  99 'c'  104 'h' 97 'a'
0xfffffffff598: 114 'r' 115 's' 46 '.'  32 ' '  121 'y' 101 'e' 115 's' 0 '\000'

        
    

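Indeed: the 24 bytes starting at address 0xfffffffff588, which is where buffer begins (sp + 8), spell out the string from the source program, including its terminating 0 byte.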
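
The link between the 64-bit hexadecimal values and the characters is the byte order: the least significant byte of each value sits at the lowest address. The following minimal sketch performs that conversion by hand. The helper name `word_to_chars` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unpack a 64-bit value into its 8 bytes, least significant byte first,
 * which is the order in which a little-endian machine stores them in
 * memory. out must have room for 9 bytes; a 0 byte is appended so the
 * result can be used as a C string. */
void word_to_chars(uint64_t word, char out[9]) {
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (char)((word >> (8 * i)) & 0xff);
    }
    out[8] = '\0';
}
```

Applied to the three values that gdb printed, 0x7261206573656854, 0x6168632034322065 and 0x00736579202e7372, this yields "These ar", "e 24 cha" and "rs. yes".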

The next instruction that is executed, ldp x29, x30, [sp, #16], will restore the old values of x29 and x30 from the stack.

    

        
        (gdb) nexti
0x0000aaaaaaaa0770 in f (src=<optimized out>) at stack-buffer-overflow.c:7
7           return buffer[2];

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
=> 0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
   0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    
    

        
        (gdb) info registers sp x29 x30
sp             0xfffffffff580      0xfffffffff580
x29            0x6168632034322065  7018969009222721637
x30            0x736579202e7372    32481193227088754

        
    

As expected, registers x29 and x30 no longer contain the values they had at the start of the function, even though restoring those values is exactly what the ldp instruction was meant to do. Instead, they contain bytes from the copied string.

The ret instruction at the end of the function uses the value in register x30 as the address to jump to. Let’s see what happens when you execute it.
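
The corrupted register values can be reproduced without any undefined behavior by modelling the 32-byte frame as a plain byte array. This is a minimal sketch under that assumption: the type `frame_model` and both helper functions are hypothetical, and the copy stays inside the model, so nothing is actually overflowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of f's 32-byte frame as a byte array; no real stack is involved. */
typedef struct {
    unsigned char bytes[32];
} frame_model;

/* Read the 8 bytes at offset as a little-endian 64-bit value,
 * which is what ldp does for each register on little-endian AArch64. */
uint64_t load_le64(const frame_model *fr, size_t offset) {
    assert(offset + 8 <= sizeof fr->bytes);
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | fr->bytes[offset + i];
    }
    return value;
}

/* Copy src, including its 0 byte, to offset 8, where buffer lives. */
void copy_into_buffer(frame_model *fr, const char *src) {
    size_t n = strlen(src) + 1;
    assert(8 + n <= sizeof fr->bytes);  /* stay inside the model */
    memcpy(fr->bytes + 8, src, n);
}
```

After `copy_into_buffer` is called with the string "These are 24 chars. yes", `load_le64` at offsets 16 and 24 returns 0x6168632034322065 and 0x00736579202e7372, the same values that gdb shows in x29 and x30.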

    

        
        (gdb) nexti
0x0000aaaaaaaa0774      7           return buffer[2];

        
    
    

        
        (gdb) nexti
0x0000aaaaaaaa0778      7           return buffer[2];

        
    
    

        
        (gdb) disass
Dump of assembler code for function f:
   0x0000aaaaaaaa0754 <+0>:     sub     sp, sp, #0x20
   0x0000aaaaaaaa0758 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaaaaaa075c <+8>:     add     x29, sp, #0x10
   0x0000aaaaaaaa0760 <+12>:    mov     x1, x0
   0x0000aaaaaaaa0764 <+16>:    add     x0, sp, #0x8
   0x0000aaaaaaaa0768 <+20>:    bl      0xaaaaaaaa0630 <strcpy@plt>
   0x0000aaaaaaaa076c <+24>:    ldp     x29, x30, [sp, #16]
   0x0000aaaaaaaa0770 <+28>:    ldrb    w0, [sp, #10]
   0x0000aaaaaaaa0774 <+32>:    add     sp, sp, #0x20
=> 0x0000aaaaaaaa0778 <+36>:    ret
End of assembler dump.

        
    
    

        
        (gdb) nexti
0x00736579202e7372 in ?? ()

        
    
    

        
        (gdb) disass
No function contains program counter for selected frame.

        
    
    

        
        (gdb) nexti
Cannot access memory at address 0x736579202e7372

        
    

The program counter now points to 0x736579202e7372, which is the tail of the copied string ("rs. yes") rather than a valid code address. It is not even a 4-byte aligned value, which the program counter always should be. That results in a “Bus error” signal when you continue to run the program:

    

        
        (gdb) cont
Continuing.

Program received signal SIGBUS, Bus error.
0x00736579202e7372 in ?? ()
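
The alignment claim is easy to verify by hand: AArch64 instructions are 4 bytes long, so the two lowest bits of a valid program counter value are zero. The following is a minimal sketch of that check. The function names are hypothetical, and the two addresses are the ones from the gdb session in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A program counter value is 4-byte aligned when its two lowest bits
 * are zero. */
bool is_pc_aligned(uint64_t pc) {
    return (pc & 3) == 0;
}

/* Check the two return addresses seen in the gdb session. */
void check_return_addresses(void) {
    /* Original x30 on entry to f: a valid, aligned code address. */
    assert(is_pc_aligned(0x0000aaaaaaaa0790ULL));
    /* x30 after the overflow: ends in 0x72, so the low bits are 0b10. */
    assert(!is_pc_aligned(0x00736579202e7372ULL));
}
```

The check only covers alignment. An aligned value taken from the string would still not be a valid code address.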

        
    