Ghost in the Shellcode: RTFM Writeup

RTFM which usually stands for "Read the fucking manual" was used here for "Rich text for Morons". The challenge supplied an executable x86 ELF file and a data file.

First thing was to figure out that the executable was used to generate the data file. The data file included the header "RTFM", which we also found to be written to an allocated memory in the disassembly. Inside this function we also found the code that actually converts the data.

We did some testing with different input data and figured out that the output file size can be less or more than the input data. For a one byte input we got either one or 1.5 bytes output. A repeating byte in the input from the readable characters was often compressed to less than the input size. A repeating 0x01 was always encoded to repeating 1.5 bytes. On 0x00 in the input the executable stopped its execution. Similiar things happened to bytes above 0x80. Actually here is the point to see that more common/used character(sequences) are encoded to less, which hints to a huffman compression.

Anyway we thought it is easier to use a different approach instead of reversing the huffman code including the tables. So we wrote a small application using bruteforce. We put a new byte to the input and then compare if it matches more bits on the output than any other input byte. We use that in a recursive function to make sure, that it will also follow repeating bytes that get mapped to the same length as the input with one byte less before.

Our none-error-tolerant code in not-best-coding-style:

  1#include <stdio.h>
  2#include <unistd.h>
  3
  4const char *inputfile = "rtfm_input";
  5const char *outputfile = "rtfm_input.rtfm";
  6const char *originalfile = "rtfm_original.bak";
  7
  8unsigned char org[0x100000];
  9
 10void load_org ( )
 11{
 12	FILE *tmp = fopen ( originalfile, "rb" );
 13	fread ( org, 1, sizeof ( org ), tmp );
 14	fclose ( tmp );
 15}
 16
 17unsigned int compare ( unsigned char* data, unsigned int data_len )
 18{
 19	unsigned char test[0x100000];
 20	unsigned int test_length = 0;
 21	unsigned int i;
 22	unsigned char b1, b2;
 23
 24	pid_t bla;
 25
 26	FILE *tmp = fopen ( inputfile, "wb" );
 27	fwrite ( data, 1, data_len, tmp );
 28	fclose ( tmp );
 29
 30	/* save some overhead with exec */
 31	//system ( "./rtfm-67cc5dcb69df4244bcf2d573481e6d6a06b861a3 rtfm_input > /dev/null" );
 32	bla = fork();
 33	if ( bla == 0 )
 34	{
 35		int fd = open("/dev/null", 0 );
 36		dup2(fd, 1);
 37		close ( fd );
 38		execl ( "./rtfm-67cc5dcb69df4244bcf2d573481e6d6a06b861a3", "rtfm_input", "rtfm_input", NULL );
 39	}
 40	else
 41		wait ( &bla );
 42
 43	tmp = fopen ( outputfile, "rb" );
 44	test_length = fread ( test, 1, sizeof ( test ), tmp );
 45	fclose ( tmp );
 46
 47	/* now compare bitwise */
 48	for ( i = 4*8; i < test_length*8; i++ )
 49	{
 50		b1 = test[i/8] >> (7 - (i%8));
 51		b2 = org[i/8] >> (7 - (i%8));
 52
 53		if ( b1 != b2 )
 54		{
 55//			printf ( "equal bits(%u) bytes(%u)\n", i, i/8 );
 56			return i-4*8;
 57		}
 58	}
 59
 60	printf ( "all same. found: %s\n", data );
 61	exit ( 1 );
 62
 63	return 0xFFFFFFFF;
 64
 65}
 66
 67void checknextbyte ( int data_len, unsigned char *data, unsigned int maxlen, unsigned int depth  )
 68{
 69	unsigned char c, d;
 70	unsigned int tmp = 0;
 71//	if ( depth > 10 )
 72//		return;
 73
 74	for ( c = 1; c < 0x80;  c++ )
 75	{
 76		data[data_len] = c;
 77		tmp = compare ( data, data_len+1 );
 78
 79		if ( tmp > maxlen )
 80		{
 81			/* if u don't set maxlen again, it will converge to the output and find
 82			it after centuries (might be more correct way todo it) */
 83			maxlen = tmp;
 84			data[data_len+1] = 0;
 85
 86			printf ( "char %c (%d) fits, len: %d: %s\n", c, c, tmp, data );
 87			checknextbyte (  data_len+1, data, maxlen+1, depth+1  );
 88		}
 89	}
 90}
 91
 92int main ()
 93{
 94	unsigned char buf[0x200000];
 95
 96	printf ( "censored :)\n" );
 97
 98	load_org();
 99
100	checknextbyte ( 0,  buf, 0, 0 );
101
102	return 0;
103}

During bruteforcing you see the current output. Excitement! The overhead of the "Rich Text Format" was more annoying than ever before ;-) The key "Did I have Vari-good-code?" is again a nice wordplay.

The process of decompressing using bruteforce took more than one hour. Most of the CPU power was used for system calls, since we did not copypaste the encryption routine.