This crackme-style reversing task consisted of an x86_64 binary whose main operations are:

- Build a table
`data`

of 128-bit constants. - Check that the input matches
`flag\{[0-9a-f]{32}\}`

. - Hex-decode the inner part of the flag to obtain a byte string
`x`

. - Initialize a 128-bit integer variable
`v`

with the value -1 (all bits set). - For each byte
`c`

in`"flag{" + x + "}"`

:- Compute
`v := data[(v&0xff) ^ c] ^ (v >> 8)`

.

- Compute
- Check that
`v`

in little-endian representation equals`x`

.

There seems to be little hope to solve this problem generally for a “random” `data`

array, hence we first focused on finding useful structure in the algorithm that generates the constants:

```
bit = 8
off = 128
val = 1
data[0] = 0
do {
lsb = val & 1
val >>= 1
if (lsb) {
val ^= 0xb595cf9c8d708e2166d545cf7cfdd4f9
}
ctr = 0
do {
data[off+ctr] = data[ctr] ^ val;
} while ((ctr += 2*off) < 0x100)
off >>= 1
} while (--bit)
```

The first part of the outer loop (shift and XOR) looks like division by $t$ in a binary finite field $\mathrm{GF}(2^{128})=\mathrm{GF}(2)[t]/\mathit{poly}$. One way to divide by $t$ is making sure (by adding the modulus) that the constant term is zero, then simply replacing each $t^i$ by $t^{i-1}$. The code above is equivalent to these steps, but performs them in reverse order, hence we need to shift the given constant and add back the lost LSB to obtain the modulus polynomial. Assuming this is true, the polynomial in use must be:

```
poly := 0xb595cf9c8d708e2166d545cf7cfdd4f9 << 1 | 1
```

Hence the value of `val`

ranges through $t^{-1}$, $t^{-2}$, $t^{-3}$, etc., which can be seen to always equal $t^{\mathtt{bit}-7}$. The second part of the loop adds this $t^{\mathtt{bit}-7}$ to all the `data[i]`

with the $(\mathtt{bit}{-}1)$th bit of `i`

set. In other words:
\[
\mathit{data}[i] = \mathrm{bits\_to\_poly}(i) / t^8
\,\text.
\]

^{(Here, $\mathrm{bits\_to\_poly}$ denotes reinterpreting an integer in $\{0,\dots,2^{128}-1\}$ as an element of $\mathrm{GF}(2^{128})$.)
}

`sage`

quickly confirms this:

```
poly = 0xb595cf9c8d708e2166d545cf7cfdd4f9 << 1 | 1
F.<t> = GF(2**128, modulus = sum(int(c)*x**i for i,c in enumerate(bin(poly)[2:][::-1])))
es = []
for l in open('data.txt').read().strip().split('\n'):
lo,hi = map(int, l.strip().replace(' ','').rstrip(',').split(','))
es.append(F.fetch_int(hi<<64 | lo))
for i in range(256):
assert es[i] == F.fetch_int(i) / t**8
```

Revisiting the flag checking algorithm with this new information, we realize that

```
v := data[(v&0xff) ^ c] ^ (v >> 8)
```

is just an addition of the flag byte followed by a division by $t^8$!
Instead of viewing this as consuming one flag byte at a time, we may equivalently XOR the flag bytes into `v`

in the beginning (hence making it bigger than 128 bits).

This point of view implies that the output of the flag processing loop is nothing more than a computation of the form \[ f(x) = (C + t^{40}\cdot x)/t^{8\cdot22} = C_1 + C_2\cdot x \]

where $x$ are the flag bits converted to a polynomial and $C$ is a constant coming from the `flag{...}`

around the input and the initialization of `v`

as -1.

The main consequence of this is: **the flag checking algorithm is an affine function over the vector space $\mathrm{GF}(2)^{128}$!**

Hence, we can recover $C_1$ by simply evaluating that function at $0$, and since $f-C_1$ is $\mathrm{GF(2)}$-linear we can also (by evaluating $f-C_1$ on a basis of $\mathrm{GF}(2)^{128}$) find a matrix $A\in\mathrm{GF(2)^{128\times128}}$ such that $f(x)=C_1+A\cdot x$.

With that description, it is easy to find a fixed point: If $f(x)=x$, then $C_1+Ax = x$, hence $(A-1)x = C_1$. This is just what the following `sage`

script does (it needs the values initialized in the first `sage`

snippet above):

```
v2n = lambda v: int(''.join(map(str, v)), 2)
n2v = lambda n: vector(GF(2), '{:0128b}'.format(n))
n2s = lambda n: struct.pack('<QQ', n&2**64-1, n>>64).encode('hex')
flagify = lambda v: 'flag{' + n2s(v2n(v)) + '}'
def compute(vecflag):
assert vecflag in GF(2)**128
hexflag = flagify(vecflag)
decflag = hexflag[:5] + hexflag[5:-1].decode('hex') + hexflag[-1:]
v = -1 & 2**128-1
assert len(decflag) == 22
for c in map(ord, decflag):
v = (v >> 8) ^^ (es[(c ^^ v) & 0xff]).integer_representation()
return n2v(v)
origin = compute(vector(GF(2),128))
A = []
for i in range(128):
v = vector(GF(2), (j == i for j in range(128)))
A.append(compute(v) - origin)
sol = (matrix(A) - 1).solve_left(origin)
flag = flagify(sol)
print('input: {:#x}'.format(v2n(sol)))
print('output: {:#x}'.format(v2n(compute(sol))))
print('--> {}'.format(compute(sol) == sol))
print(flag)
```

Running this code yields the flag:

```
$ ./pwn.sage
input: 0xa4efbaba0fa55545faf7b779c3440367
output: 0xa4efbaba0fa55545faf7b779c3440367
--> True
flag{670344c379b7f7fa4555a50fbabaefa4}
$
```

One can also compute $C$ in the equation for $f$ above explicitly and simply solve for a root of $f-x$, a linear polynomial. During the CTF, we went for the linear-algebraic approach outlined above, which is why it’s shown first, but for completeness, here’s the code for directly finding a solution (again making use of things defined above):

```
prfx = b'flag{'
tpl = prfx + b'\0'*16 + b'}'
R.<x> = F[]
f = sum(t**i for i in range(128))
f += sum(F.fetch_int(ord(c))*t**(8*i) for i,c in enumerate(tpl))
f += t**(8*len(prfx))*x
f /= t**(8*22)
for sol,_ in (f - x).roots():
print('flag{' + n2s(sol.integer_representation()) + '}')
```