This writeup explains how we solved a reversing task called moon
during the
Google CTF last weekend. In summary, the task implements a password check
using OpenGL shaders.
We’re given a Portable Executable (Download) for
x86-64
that was obviously compiled using MinGW (no
Rich header but the This program
cannot be run in DOS mode
stub used typically used by MSVC – Borland prints a
different message):
00000000: 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 |MZ..............|
00000010: b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00 |................|
00000040: 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68 |........!..L.!Th|
00000050: 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f |is program canno|
00000060: 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 |t be run in DOS |
00000070: 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 |mode....$.......|
00000080: 50 45 00 00 64 86 09 00 00 00 00 00 00 00 00 00 |PE..d...........|
In general, Portable Executables tend to come together with a lot of statically
linked-in library code, which we usually are not interested in during reversing.
As such, the most convenient way of determining the main
function in Windows
binaries that occur during CTFs is usually just quickly searching for suspicious
strings such as “Wrong”, “Congratz”, “flag”, or even the program name. Moreover,
as compilers are quite deterministic, all non-library strings will be placed
close to each other. And indeed, using the hacker tool strings
once more we
find:
➜ moon strings moon.exe| grep -iC8 moon
%.8x
glGetShaderiv:
glCompileShader:
GL_TIMEOUT_EXPIRED or GL_WAIT_FAILED
vert_xyz
vert_uv
NopeGood
SDL_Init: %s
Moon
gl3wInit()
OpenGL 4.3 not supported
basic_string::append
timeout error
Bye bye...
30c7ead97107775969be4ba00cf5578f1048ab1375113631dbb6871dbe35162b1c62e982eb6a7512f3274743fb2e55c818912779ef7a34169a838666ff3994bb4d3c6e14ba2d732f14414f2c1cb5d3844935aebbbe3fb206343a004e18a092daba02e3c0969871548ed2c372eb68d1af41152cb3b61f300e3c1a8246108010d282e16df8ae7bff6cb6314d4ad38b5f9779ef23208efe3e1b699700429eae1fa93c036e5dcbe87d32be1ecfac2452ddfdc704a00ea24fbc2161b7824a968e9da1db756712be3e7b3d3420c8f33c37dba42072a941d799ba2eebbf86191cb59aa49a80ebe0b61a79741888cb62341259f62848aad44df2b809383e09437928980f
?UUU?
glActiveProgramEXT
Hmm … We can spot “Nope” and “Good”, which probably indicate whether we were
successful, a “Bye bye” message that apparently is printed when the program is
closed and a high entropy hexadecimal string that already should catch our
attention. Additionally, we can see several instances of the letters gl
, which
hint towards OpenGL being used in the program. Let’s fire up IDA and see what is
going on inside the binary!
Searching backwards from the long hexstring in IDA, we find a function that performs all kinds of interesting static initializations. After doing some cleanup und some renaming, the Hex-Rays output looks as follows:
__int64 init_global_vars()
{
some_byte_foo_ptr = (__int64)&some_byte_foo_dat;
new_basic_string((_BYTE **)&some_byte_foo_ptr, byte_49A010, aVersion430InVec3Ve);
sub_411C40();
shader_one_ptr = (__int64)&shader_one_dat;
new_basic_string((_BYTE **)&shader_one_ptr, &aVersion430InVec3Ve[315], &aVersion430InVec3Ve[491]);
sub_411C40();
shader_two_ptr = (__int64)&shader_two_dat;
new_basic_string((_BYTE **)&shader_two_ptr, aVersion430InVec3Ve, &aVersion430InVec3Ve[315]);
sub_411C40();
letters_png_ptr = (__int64)&letters_png_dat;
new_basic_string((_BYTE **)&letters_png_ptr, &unk_49C23A, &off_49CBD0);
sub_411C40();
qword_4CA088 = 0i64;
qword_4CA080 = &byte_4CA090;
byte_4CA090 = 0;
sub_411C40();
long_hexstring_ptr = &long_hexstring_dat;
new_basic_string(
(_BYTE **)&long_hexstring_ptr,
"30c7ead97107775969be4ba00cf5578f1048ab1375113631dbb6871dbe35162b1c62e982eb6a7512f3274743fb2e55c818912779ef7a34169a83"
"8666ff3994bb4d3c6e14ba2d732f14414f2c1cb5d3844935aebbbe3fb206343a004e18a092daba02e3c0969871548ed2c372eb68d1af41152cb3"
"b61f300e3c1a8246108010d282e16df8ae7bff6cb6314d4ad38b5f9779ef23208efe3e1b699700429eae1fa93c036e5dcbe87d32be1ecfac2452"
"ddfdc704a00ea24fbc2161b7824a968e9da1db756712be3e7b3d3420c8f33c37dba42072a941d799ba2eebbf86191cb59aa49a80ebe0b61a7974"
"1888cb62341259f62848aad44df2b809383e09437928980f",
"");
return sub_411C40();
}
At the bottom, we see the contents of long_hexstring
being assigned to a
dynamically allocated std::string
. In total new_basic_string
(which we named
according to the error messages found in this binary function) is called four
times with four different arguments:
0x49A010
A string representing a program written in a C-like language:
#version 430
in vec3 vert_xyz;
in vec2 vert_uv;
out vec2 frag_uv;
void main() {
frag_uv = vert_uv;
// Rescale the scene so it's in pixels.
gl_Position = vec4(
(vert_xyz.x / 1280.0) * 2.0 - 1.0,
-((vert_xyz.y / 720.0) * 2.0 - 1.0),
vert_xyz.z / 1024.0,
1.0
);
Another program:
#version 430
in vec2 frag_uv;
out vec4 frag_colour;
uniform sampler2D tex;
void main() {
frag_colour = vec4(1.0, 1.0, 1.0, 1.0) * texture(tex, frag_uv);
Something that starts with a PNG header at address 0x49C23A
which, dumped
using binwalk
, looks like the charset used by the program:
Okay, we can clearly understand why the program needs the PNG, but what are those mysterious C-like snippets about? A brief query to our search engine of choice tells us that we are dealing with OpenGL Shading Language (abbreviated GLSL). In summary, GLSL allows for embedding shaders into a program that get compiled to ARB Assembly Language which then in turn can be uploaded to the graphics hardware for further processing. Nevertheless, the two snippets we found above are totally boring and don’t do anything that is of interest for us. Let’s try to find out what that unreadable gibberish (no. 1 from the list above) was.
Looking at the locations from where some_byte_foo_ptr
is accessed, we can spot
a simple string scrambling function (at 0x401a30
):
if ( some_byte_foo_ptr != qword_4CA128 + some_byte_foo_ptr )
{
v2 = 1i64;
v3 = (_BYTE *)(some_byte_foo_ptr + 1);
v4 = 0i64;
v5 = -561040962;
v6 = *(_BYTE *)some_byte_foo_ptr ^ 0xBE;
while ( 1 )
{
*((_BYTE *)v0 + v4) = v6;
v17 = v2;
*((_BYTE *)Memory + v2) = 0;
if ( v3 == (_BYTE *)v1 )
break;
v7 = v5;
v8 = v5;
v9 = __ROL4__(v5, 3);
v4 = v17;
v10 = (v8 >> 21) ^ (v7 << 8);
v11 = 15i64;
v5 = v10 + v9 - 0x6837E5D9;
v0 = (__int64 *)Memory;
v2 = v17 + 1;
v6 = *v3 ^ v5;
if ( Memory != &v18 )
v11 = v18;
++v3;
if ( v2 > v11 )
{
sub_4870A0(&Memory, v17, 0i64, 0i64);
v0 = (__int64 *)Memory;
}
}
}
At this point, we anticipated that unscrambling the string was not the main
point of the challenge, and therefore we just decided to use
x64dbg to dump the unscrambled data. This can easily
be done by setting a breakpoint to 0x401B30
(which is directly after the loop)
and patching up 0x401aa0
(containing a mov r14d, 0x0F
) to actually
unscramble more than just the first 15 bytes of the data. Afterwards, the
data can be dumped conveniently from memory referenced by location rsp+0x60
.
After fixing indentation and line breaks we get something that arouses interest immediately:
#version 430
layout(local_size_x=8,local_size_y=8)in;
layout(std430,binding=0) buffer shaderExchangeProtocol{
uint state[64];
uint hash[64];
uint password[32];
};
vec3 calc(uint p) {
float r = radians(p);
float c = cos(r);
float s = sin(r);
mat3 m = mat3(c, -s, 0.0, s, c, 0.0, 0.0, 0.0, 1.0);
vec3 pt = vec3(1024.0, 0.0, 0.0);
vec3 res = m*pt;
res += vec3(2048.0, 2048.0, 0.0);
return res;
}
uint extend(uint e) {
uint i;
uint r = e ^ 0x5f208c26;
for (i = 15; i < 31; i += 3){
uint f = e << i;
r ^= f;
}
return r;
}
uint hash_alpha(uint p) {
vec3 res=calc(p);
return extend(uint(res[0]));
}
uint hash_beta(uint p) {
vec3 res=calc(p);
return extend(uint(res[1]));
}
void main(){
uint idx = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * 8;
uint final;
if (state[idx] != 1){
return;
}
if ((idx & 1) == 0){
final = hash_alpha(password[idx/2]);
}else{
final = hash_beta(password[idx/2]);
}
uint i;
for (i=0; i < 32; i += 6){
final ^= idx << i;
}
uint h = 0x5a;
for (i = 0; i < 32; i++){
uint p = password[i];
uint r = (i * 3) & 7;
p = (p << r) | (p >> (8 - r));
p &= 0xff;
h ^= p;
}
final ^= (h | (h << 8) | (h << 16) | (h << 24));
hash[idx] = final;
state[idx] = 2;
memoryBarrierShared();
}
What we can see above is a simple transformation of data risiding in an array
password
. Note that despite the term hash
being used by the challenge author
(we did not rename anything in the code above), the steps performed by the
algorithm are actually reversible. Let’s discuss the above shader a bit more
in-depth. The first thing to understand is probably IO behaviour:
layout(std430,binding=0) buffer shaderExchangeProtocol{
uint state[64];
uint hash[64];
uint password[32];
};
This defines three arrays state
, hash
, and password
. As we can infer from
the main
function, password
is used for reading in data, hash
is used for
storing the result of the computation, and state
is (presumably) used to
signal whether a certain worker thread on the GPU already has finished the
calculation …
… which brings us to how processing on the GPU is done in general. The line
uint idx = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * 8;
and the fact that the hash
array stores 64 results tells us that our GPU cores
are organized in a 8 x 8 grid. The runtime assigns a (x, y) position in that
grid to each of the cores. From this grid position, the code can then calculate a
unique id, resulting in the transformation being invoked in parallel for
different input characters.
Let’s look at the actual computation steps that are eventually performed. The
first one is calc
, which involves some linear algebra magic.
vec3 calc(uint p) {
float r = radians(p);
float c = cos(r);
float s = sin(r);
mat3 m = mat3(c, -s, 0.0, s, c, 0.0, 0.0, 0.0, 1.0);
vec3 pt = vec3(1024.0, 0.0, 0.0);
vec3 res = m*pt;
res += vec3(2048.0, 2048.0, 0.0);
return res;
}
Even though this looks intimidating, the operation is quite simple: What we can
see here is that p
is used to construct a Rotation
Matrix by p
degrees in the
$(x, y)$ plane which gets applied to the vector $(1024, 0, 0)$ (the third
coordinate is left untouched). Afterwards, the result is translated (shifted) in
the $(x, y)$ plane by $(2048, 2048)$. With this knowledge, it becomes clear that
(assuming user input within ASCII range $(0, 255)$) this operation can easily be
inverted. In fact, if the python version of calc
looks like follows:
def calc(p):
r = math.radians(ord(p))
c = math.cos(r)
s = math.sin(r)
m = [[ c, -s, 0.0],
[ s, c, 0.0],
[0.0, 0.0, 1.0]]
pt = [ 1024.0, 0.0, 0.0 ]
res = np.dot(m, pt)
res += [2048.0, 2048.0, 0.0]
return res
then the reverse operation rev_calc
can be implemented by simply un-doing the
translation and then calculating the angle of the resulting vector and the
vector pointing into direction $(1024, 0, 0)$:
import math
import numpy as np
def unit_vector(vector):
return vector / np.linalg.norm(vector)
def angle_between(v1, v2):
v1_u = unit_vector(v1)
v2_u = unit_vector(v2)
return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0))
def rev_calc(res):
res = np.subtract(res, [2048.0, 2048.0, 0.0])
p = angle_between(res, [1024.0, 0.0, 0.0])
return chr(int(math.degrees(p)))
The second computation step uses more traditional fixed length integer arithmetic
uint extend(uint e) {
uint i;
uint r = e ^ 0x5f208c26;
for (i = 15; i < 31; i += 3){
uint f = e << i;
r ^= f;
}
return r;
}
Looking carefully at this function, we can see that every third of the bits 0 to
14 is xored into the upper half of r
. Since r
is the input xored by a fixed
constant, the lowest 15 bits remain untouched and the operation becomes
trivially invertible:
def extend(e):
r = e ^ 0x5f208c26
for i in range(15, 31, 3):
r ^= (e << i) & 0xffffffff
return r & 0xffffffff
def rev_extend(r):
l = ((r ^ 0x5f208c26) & ((1 << 15) - 1))
for i in range(15, 31, 3):
r ^= (l << i) & 0xffffffff
return (r ^ 0x5f208c26) & 0xffffffff
Let’s look at the main
function of the shader again:
void main(){
uint idx = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * 8;
uint final;
if (state[idx] != 1){
return;
}
if ((idx & 1) == 0){
final = hash_alpha(password[idx/2]);
}else{
final = hash_beta(password[idx/2]);
}
uint i;
for (i=0; i < 32; i += 6){
final ^= idx << i;
}
uint h = 0x5a;
for (i = 0; i < 32; i++){
uint p = password[i];
uint r = (i * 3) & 7;
p = (p << r) | (p >> (8 - r));
p &= 0xff;
h ^= p;
}
final ^= (h | (h << 8) | (h << 16) | (h << 24));
hash[idx] = final;
state[idx] = 2;
memoryBarrierShared();
}
As extend
and calc
are used by hash_alpha
and hash_beta
, we are now
fully able to invert the first part of the algorithm. Nice!
What about the remaining steps?
So far, the transformation was performed for each character independently from each other (observe how password is just accessed by means of the global id). To stop simple character wise brute force, this obviously has to change; which is what the second part of the transformation deals with:
for (i=0; i < 32; i += 6){
final ^= idx << i;
}
uint h = 0x5a;
for (i = 0; i < 32; i++){
uint p = password[i];
uint r = (i * 3) & 7;
p = (p << r) | (p >> (8 - r));
p &= 0xff;
h ^= p;
}
final ^= (h | (h << 8) | (h << 16) | (h << 24));
The first for
loop simply xors the global id idx
to the result at several
postions. The more devastating part is the second loop: All characters of the
password are accumulated into one byte value by combining a bitwise rotate with
a multiplication by $3$. The result is then promoted to a 32 bit value by simply
duplicating the lowest byte and then xoring it to the final hash value. We don’t
know the value of h
and therefore cannot reconstruct final
from before the
final xor. Damn. Or can we?
Thinking about this a second we came to the conclusion that this it can be broken by performing a known plaintext attack on the target value the thread with idx 0. But, wait, what are our target values? Remember the long hexstring? Let’s get back to the binary a second time!
This is the main
function of the binary after cleaning it up and teaching IDA
about SDL headers:
SDL_StartTextInput();
do
{
while ( SDL_PollEvent(&event) )
{
evt_type = event.type;
if ( event.type == SDL_QUIT )
goto LABEL_34;
if ( event.type - 768 > 1 )
goto LABEL_15;
if ( event.window.event == SDL_PEEKEVENT )
{
if ( event.window.data2 == 27 )
{
LABEL_34:
v21 = v30;
goto LABEL_35;
}
if ( (event.window.data2 == 0x400000BB || event.window.data2 == 8) && qword_4CA088 )
{
sub_486E60((__int64)&qword_4CA080, qword_4CA088 - 1, 1i64);
evt_type = event.type;
cmp_flag = 0;
LABEL_15:
if ( evt_type == SDL_TEXTINPUT && strlen(event.edit.text) == 1 )
{
if ( (unsigned __int64)qword_4CA088 <= 0x1F )
{
string_append((__int64)&qword_4CA080, &event.jhat.hat, 1ui64);
if ( qword_4CA088 == 32 )
goto LABEL_27;
LABEL_19:
cmp_flag = 0;
}
else
{
if ( qword_4CA088 != 32 )
goto LABEL_19;
LABEL_27:
v17 = (__int128 *)v42;
Size = 0i64;
v35 = 0;
Memory = &v35;
if ( (unsigned __int8)do_check(qword_4CA080, (unsigned int *)v42) )
{
do
{
v18 = *(_DWORD *)v17;
*(_QWORD *)Dest = 0i64;
v32 = 0;
sprintf_0(Dest, "%.8x", v18);
v19 = &Dest[strlen(Dest)];
if ( v19 - Dest > 0x7FFFFFFFFFFFFFFFi64 - Size )
sub_4921C0("basic_string::append");
string_append((__int64)&Memory, Dest, v19 - Dest);
v17 = (__int128 *)((char *)v17 + 4);
}
while ( &v43 != v17 );
v20 = (char *)Memory;
if ( Size != long_hexstring_data || Size && memcmp(Memory, long_hexstring_ptr, Size) )
cmp_flag = 1;
else
cmp_flag = 2;
}
else
{
v25 = __iob_func();
sub_41E440(&v25[2], "timeout error\n", v26);
v20 = (char *)Memory;
}
if ( v20 != &v35 )
j_free(v20);
}
}
}
}
}
v21 = 0;
LABEL_35:
v22 = SDL_GetTicks();
*(float *)&v23 = (float)((float)v22 / 1000.0) - *(float *)&dword_4CA044;
*(float *)&dword_4CA044 = (float)v22 / 1000.0;
dword_4CA040 = v23;
SDL_GL_MakeCurrent(v27, v28);
glUseProgram((unsigned int)dword_4CA0A4);
sub_402660((float *)&v36);
glUseProgram(0i64);
SDL_GL_SwapWindow(v27);
}
while ( !v21 );
puts("Bye bye...");
The long_hexstring_ptr
is used here in a memcmp
. The rest of the code is
boilerplate to read in user input, store it at 0x4ca080
and then throw it into
do_check
. This function, unsurprisingly, invokes the secreet shader with the
input the user entered (renaming the global function names based on string
analysis where the pointer values are actually obtained from GetProcAddress
):
glUseProgram((unsigned int)flagCheckProgram);
if ( !(unsigned char)glIsBuffer((unsigned int)dword_4CA0A0) )
{
memset(Dst, 0, 0x280ui64);
glGenBUffers(1i64, &dword_4CA0A0);
glBindBufferBase(37074i64, 0i64, (unsigned int)dword_4CA0A0);
glBindBuffer(37074i64, (unsigned int)dword_4CA0A0);
glBufferData(37074i64, 640i64, Dst, 35048i64);
}
// ...
glDispatchCompute(8i64, 8i64, 1i64);
LODWORD(v21) = glFenceSync(37143i64, 0i64);
v22 = v21;
if ( (glClientWaitSync(v21, 1i64, 1000000000i64) - 37147) & 0xFFFFFFFD )
// ...
Alright, no surprises here, the long_hexstring_ptr
is the result of the
computation of our 8 x 8 gpu thread grid. Let’s finish this …
Remember, we perform a known-plaintext attack on the value of h
in the
shader’s main function that is xored to the final
variable in the last step.
We assume that the password starts with a “C” (for CTF{}, yay flag format) and
do the following in our python script (remember, idx 0 skips the second last xor
loop):
import math, struct, binascii
import numpy as np
dat = binascii.unhexlify(open('magic', 'r').read().strip())
u = lambda x: struct.unpack(">I", x)[0]
grid = [ [ u(dat[j * 32 + i * 4:][:4]) for i in range(8) ] for j in range(8) ]
print(hex(extend(int(calc('C')[0] + 0)) ^ grid[0][0]))
➜ moon ./pwn.py
0x6f6f6f6f
Success! The fact that we can see four duplicated byte values tells us that for the correct password
uint h = 0x5a;
for (i = 0; i < 32; i++){
uint p = password[i];
uint r = (i * 3) & 7;
p = (p << r) | (p >> (8 - r));
p &= 0xff;
h ^= p;
}
ends up with h = 0x6f
.
This is the final solution script (with magic
being just the
long hex string that we obtained running strings
):
#!/usr/bin/env python3
import math, struct, binascii
import numpy as np
dat = binascii.unhexlify(open('magic', 'r').read().strip())
u = lambda x: struct.unpack(">I", x)[0]
grid = [ [ u(dat[j * 32 + i * 4:][:4]) for i in range(8) ] for j in range(8) ]
tgt = grid[0][0]
def unit_vector(vector):
return vector / np.linalg.norm(vector)
def angle_between(v1, v2):
v1_u = unit_vector(v1)
v2_u = unit_vector(v2)
return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0))
def extend(e):
r = e ^ 0x5f208c26
for i in range(15, 31, 3):
r ^= (e << i) & 0xffffffff
return r & 0xffffffff
def rev_extend(r):
l = ((r ^ 0x5f208c26) & ((1 << 15) - 1))
for i in range(15, 31, 3):
r ^= (l << i) & 0xffffffff
return (r ^ 0x5f208c26) & 0xffffffff
def calc(p):
r = math.radians(ord(p))
c = math.cos(r)
s = math.sin(r)
m = [[ c, -s, 0.0],
[ s, c, 0.0],
[0.0, 0.0, 1.0]]
pt = [ 1024.0, 0.0, 0.0 ]
res = np.dot(m, pt)
res += [2048.0, 2048.0, 0.0]
return res
def rev_calc(res):
res = np.subtract(res, [2048.0, 2048.0, 0.0])
p = angle_between(res, [1024.0, 0.0, 0.0])
return chr(int(math.degrees(p)))
def second(x, y):
final = extend(int(calc('C')[0] + 0)) ^ grid[0][0]
idx = x + y * 8
tgt = grid[y][x] ^ final
for i in range(0, 32, 6):
tgt ^= (idx << i) & 0xffffffff
return tgt
print(hex(extend(int(calc('C')[0] + 0)) ^ grid[0][0]))
y = 0
x = 0
for y in range(8):
for x in range(0, 8, 2):
tmp = [ rev_extend(second(x, y)), rev_extend(second(x + 1, y)), 0.0 ]
z = rev_calc([ rev_extend(second(x, y)), rev_extend(second(x + 1, y)), 0.0 ])
print(z, end = '')
print('')
Executing the script gives as (almost) the flag:
➜ moon ./pwn.py
0x6f6f6f6f
CTF{OpenGLMoonMoonG0erT0TheMoon}
Which, after some guessing gave us the correct flag: