Tracking Input with DTrace on OS X

When reverse engineering, whether for vulnerability research or malware analysis, at some point you will need to track input data. Usually one would start from the point of entry of the input and follow the code flow from there. This can be achieved with a debugger: setting breakpoints on interesting points and slowly stepping through each break, inspecting the arguments and return values of called functions. On top of that, if our debugger of choice supports scripting, we can try to automate this process. In this post I will focus on introducing similar automated functionality with DTrace on OS X (some minor tweaks may be required for other supported platforms).

Plan of action

Our solution will be simple. We will try to mimic what we would normally do under a debugger, and to achieve this we will put DTrace probes at places that are interesting to us. For this article these are the open() and read() functions from the C library. Of course, the more you know about your target application, the better your choice of probe points will be.

To list all functions that can be probed run $ sudo dtrace -ln 'pid$target:::entry {}' -c .

Our plan of action can be summarised as:

1. Enumerate interesting functions
2. Set-up probes at points of interest
3. Dump the data
4. Look for known input

For this article we already have our targets, but if that were not the case we could, for example, trace the open() system call via the syscall provider and dump its first argument along with the user callstack. Usually this gives a nice overview of the code flow. For setting up probes we could also use the syscall provider; however, I will use the pid provider to gain better performance. Data dumping is done via tracemem() on the appropriate pointers used by read(). Finally, looking for known input is done by the machine operator: you.

DTrace does not support loops or actual if statements (probe predicates and the ternary operator do not count), which is why we cannot fully automate our script, hence the requirement for manual inspection.

This approach is somewhat similar to what Peter and Brandon did in MindshaRE a couple of years ago. However, as opposed to Peter, we do not need to manually patch any particular function; we just observe at the point of entry/return, which is similar to Detours mentioned in the comments section of his post.

Implementation

First of all we want to probe entry of the open() function along with a predicate on the file that is interesting to us:

pid$target::__open:entry
/copyinstr(arg0) == "/Users/ad/Desktop/test.mp3"/
{
self->fname = copyinstr(arg0);
self->openok = 1;
}

The only actions we are taking inside of this probe are setting thread-local variables self->fname and self->openok which we will use in our next probe:

pid$target::__open:return
/self->openok/
{
trackedfd[arg1] = 1;
printf("Opening %s with fd %#x\n", self->fname, arg1);
self->fname = 0;
self->openok = 0;
}

As you can see, the probe is set on the return of open() and we are using the self->openok variable as a condition to make sure we are in the proper open() return (execution-wise). Inside of the probe we are doing a couple of things:

  • Setting a flag for opened file descriptor inside of the global array trackedfd[] (arg1 holds return value)
  • Printing out logging information
  • Freeing variables

After this we are ready to monitor any function that makes use of marked file descriptor. In our case this function is read():

pid$target::read:entry
/trackedfd[arg0] == 1/
{
self->rfd = arg0;
self->rbuf = arg1;
self->rsz = arg2;
}

pid$target::read:return
/self->rfd/
{
printf("Reading from fd %#p to buf %#p size %#x\n", self->rfd, self->rbuf, self->rsz);
tracemem(copyin(self->rbuf, arg1), 64);
ustack(); printf("\n");
self->rfd = 0;
self->rbuf = 0;
self->rsz = 0;
}

The probe set on entry of read() should be self-explanatory by now. The probe set on return does the logging, dumps read()'s destination buffer, and displays the user-mode callstack.

As a last step we will zero out the file descriptor flag stored in the trackedfd[] array when close() is called:

pid$target::close:entry
/trackedfd[arg0] == 1/
{
trackedfd[arg0] = 0;
}

After putting it all together we get the following script:

#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

BEGIN
{
trackedfd[0] = 0;
}

pid$target::__open:entry
/copyinstr(arg0) == "/Users/ad/Desktop/test.mp3"/
{
self->fname = copyinstr(arg0);
self->openok = 1;
}

pid$target::__open:return
/self->openok/
{
trackedfd[arg1] = 1;
printf("Opening %s with fd %#x\n", self->fname, arg1);
self->fname = 0;
self->openok = 0;
}

pid$target::read:entry
/trackedfd[arg0] == 1/
{
self->rfd = arg0;
self->rbuf = arg1;
self->rsz = arg2;
}

pid$target::read:return
/self->rfd/
{
printf("Reading from fd %#p to buf %#p size %#x\n", self->rfd, self->rbuf, self->rsz);
tracemem(copyin(self->rbuf, arg1), 64);
ustack(); printf("\n");
self->rfd = 0;
self->rbuf = 0;
self->rsz = 0;
}

pid$target::close:entry
/trackedfd[arg0] == 1/
{
trackedfd[arg0] = 0;
}

You can see that I have silently added two #pragmas; you can read about them here. I have also used a BEGIN clause to initialise the global array trackedfd[].

Usage example

For a quick and simplified example of tracing I will use the VOX music player, which is freely available on the Mac App Store, so without further ado:

Wed May 13 08:24 PM ttys008 [ad@mbp ~]
$ sudo ./fileinput.d -p 31337 > VOX.trace
^C

Wed May 13 08:24 PM ttys008 [ad@mbp ~]
$ less VOX.trace

Opening /Users/ad/Desktop/test.mp3 with fd 0x15
Opening /Users/ad/Desktop/test.mp3 with fd 0x15
Reading from fd 0x15 to buf 0x111fda108 size 0x1000

0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 49 44 33 03 00 00 00 00 23 76 54 49 54 32 00 00 ID3.....#vTIT2..
10: 00 1b 00 00 00 54 72 61 76 65 6c 65 72 20 69 6e .....Traveler in
20: 20 74 68 65 20 57 6f 6e 64 65 72 6c 61 6e 64 54 the WonderlandT
30: 59 45 52 00 00 00 05 00 00 00 32 30 30 35 54 50 YER.......2005TP

libsystem_kernel.dylib`read+0x14
libbass.dylib`BASS_ErrorGetCode+0x1e1

[ ... ]
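
The last step of the plan, looking for known input, can be partly scripted outside of DTrace once the trace is on disk. Below is a minimal Python sketch, assuming the tracemem() layout shown above (a hexadecimal offset, a colon, up to 16 hex bytes, then an ASCII column). The helper names iter_dumps and find_known_input are mine and are not part of DTrace.

```python
import re

# One tracemem() row: "<hex offset>: <up to 16 hex bytes> <ASCII column>".
# The column-header row has no "<offset>:" prefix, so it never matches.
_ROW = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2}(?:\s+|$)){1,16})")


def iter_dumps(text):
    """Yield each tracemem() dump found in `text` as a bytes object."""
    current = None
    for line in text.splitlines():
        match = _ROW.match(line)
        if not match:
            continue
        if int(match.group(1), 16) == 0:
            # Offset 0 marks the start of a new dump.
            if current is not None:
                yield bytes(current)
            current = bytearray()
        if current is not None:
            current += bytes.fromhex(match.group(2))
    if current is not None:
        yield bytes(current)


def find_known_input(text, needle):
    """Return (dump_index, offset) pairs where `needle` occurs."""
    hits = []
    for index, dump in enumerate(iter_dumps(text)):
        offset = dump.find(needle)
        if offset != -1:
            hits.append((index, offset))
    return hits
```

Feeding it the contents of VOX.trace together with a string known to be in the file (for example the track title) gives the dumps worth a closer look. One caveat of this sketch: a final row shorter than 16 bytes whose ASCII column happens to start with hex-looking pairs could be misparsed.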

We seem to have successfully tracked our input, but the callstack does not look good (it seems too small). Disassembling libbass.dylib and jumping to BASS_ErrorGetCode+0x1e1 results in the following code chunk:

Bad read

This code chunk is unusual. It is not referenced from anywhere (that is why IDA fails to recognise it as a function) and it lacks a function prologue (that is why DTrace fails to display the full callstack). Most probably it is reached through a dynamic call. We can verify this assumption by inspecting the application inside of lldb:

(lldb) attach -p 31337
Process 31337 stopped
* thread #1: tid = 0x250206, 0x00007fff977ad4de libsystem_kernel.dylib`mach_msg_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff977ad4de libsystem_kernel.dylib`mach_msg_trap + 10
libsystem_kernel.dylib`mach_msg_trap:
-> 0x7fff977ad4de <+10>: ret
0x7fff977ad4df <+11>: nop

libsystem_kernel.dylib`mach_msg_overwrite_trap:
0x7fff977ad4e0 <+0>: mov r10, rcx
0x7fff977ad4e3 <+3>: mov eax, 0x1000020

Executable module set to "/Applications/VOX.app/Contents/MacOS/VOX".
Architecture set to: x86_64-apple-macosx.
(lldb) image list

[ ... ]

[230] 0x000000010186d000 /Applications/VOX.app/Contents/Frameworks/VXBass.framework/Versions/A/libbass.dylib

[ ... ]

(lldb) b 0x00000001018757ae
Breakpoint 1: where = libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, address = 0x00000001018757ae
(lldb) c
Process 31337 resuming
Process 31337 stopped
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
libbass.dylib`___lldb_unnamed_function122$$libbass.dylib:
-> 0x1018757ae <+4>: mov rax, rdx
0x1018757b1 <+7>: mov edx, esi
0x1018757b3 <+9>: mov rsi, rdi
0x1018757b6 <+12>: mov edi, eax
(lldb) x/80x $rdi
0x1087a9108: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9118: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9128: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9138: 0x00000000 0x00000000 0x00000000 0x00000000

[ ... ]

[ ... ]

[ ... ]

(lldb) c
Process 31337 resuming
Process 31337 stopped
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
libbass.dylib`___lldb_unnamed_function122$$libbass.dylib:
-> 0x1018757ae <+4>: mov rax, rdx
0x1018757b1 <+7>: mov edx, esi
0x1018757b3 <+9>: mov rsi, rdi
0x1018757b6 <+12>: mov edi, eax
(lldb) x/80x $rdi
0x1087b2199: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087b21a9: 0x55555500 0x55555555 0x55555555 0x55555555
0x1087b21b9: 0x54474154 0x65766172 0x2072656c 0x74206e69
0x1087b21c9: 0x57206568 0x65646e6f 0x6e616c72 0x00000064
0x1087b21d9: 0x73755300 0x20756d75 0x6f6b6f59 0x00006174
0x1087b21e9: 0x00000000 0x00000000 0x00000000 0x53000000
0x1087b21f9: 0x6f626d79 0x0000006c 0x00000000 0x00000000
0x1087b2209: 0x00000000 0x00000000 0x00000000 0x30303200
0x1087b2219: 0x20202035 0x20202020 0x20202020 0x20202020
0x1087b2229: 0x20202020 0x20202020 0x20202020 0x0c030020

[ ... ]

(lldb) bt
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
* frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
frame #1: 0x0000000101879559 libbass.dylib`___lldb_unnamed_function171$$libbass.dylib + 163
frame #2: 0x00007fff95f36268 libsystem_pthread.dylib`_pthread_body + 131
frame #3: 0x00007fff95f361e5 libsystem_pthread.dylib`_pthread_start + 176
frame #4: 0x00007fff95f3441d libsystem_pthread.dylib`thread_start + 13

Following the missing callstack entry leads us to a dynamic function call located at libbass+0xc556:

loc_10187951F:
mov ecx, [rbp+0F8h]
mov eax, [rbp+0FCh]
add rax, [rbp+100h]
mov edx, ecx
mov rbx, rdx
xor edx, edx
div rbx
mov eax, edx
sub ecx, edx
cmp esi, ecx
mov r12d, ecx
cmovbe r12d, esi
mov rdx, [rbp+50h]
cdqe
lea rdi, [r14+rax]
mov esi, r12d
call qword ptr [rbp+40h] ; dynamic call into read wrapper
mov ebx, eax
cmp eax, 0FFFFFFFFh
jnz short loc_1018

This seems to be the function that dispatches data reads and probably somewhere down the road parsing is taking place.
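
The libbass+0xc556 offset can be re-derived from the lldb session. The snippet below is only a worked calculation: it takes the load address from `image list` and the return address from frame #1 of the backtrace, and it relies on `call qword ptr [rbp+40h]` being encoded in three bytes (FF 55 40). The function name module_offset is made up for illustration.

```python
# Values copied from the lldb session above.
LIBBASS_BASE = 0x000000010186D000   # load address from `image list`
RETURN_ADDR = 0x0000000101879559    # frame #1 in the backtrace

# `call qword ptr [rbp+40h]` encodes as FF 55 40, i.e. three bytes, so the
# call instruction sits three bytes before the return address.
CALL_LENGTH = 3


def module_offset(runtime_address, image_base):
    """Translate a runtime address into an offset inside its module."""
    return runtime_address - image_base


return_offset = module_offset(RETURN_ADDR, LIBBASS_BASE)
call_offset = return_offset - CALL_LENGTH
print(hex(return_offset), hex(call_offset))  # prints: 0xc559 0xc556
```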

Also, you have probably noticed that the callstack from lldb, although better than the one in DTrace’s output, is still poor. This is the result of pthreads usage, which cripples our dynamic analysis.

Conclusions

We successfully found the code responsible for input entry, and from there we could start a more tailored tracing operation in order to find the code responsible for parsing the data. The real strength of DTrace lies in fast, ad-hoc analysis: we can quickly gain a lot of useful information which would otherwise require more work. DTrace has its own limitations and it is not a silver bullet (after all, we still needed lldb); however, one can often save a lot of time by utilising its power, which comes for free with any OS X installation.

OS X System Calls, MACH Edition

Just a quick post, delivered as promised.

First of all, the calling convention of MACH traps is different from that of BSD syscalls. I have decided to enumerate the end-point arguments even though in reality a MACH trap takes only one argument, which is a pointer to a structure containing the arguments as its fields. Secondly, mungers are also included as they provide some additional information. Finally, if you need an introduction to MACH programming, I have heard that nemo’s Abusing MACH on Mac OS X is a good starting point.

Feedback, ideas, bugs, et cetera — give me a shout.

OS X System Calls, BSD Edition

Last weekend I was playing with system calls on OS X and I found out that there are not many resources out there covering them with full and up-to-date information. Of course it is not a big problem: since XNU is open source you can find the syscalls you are looking for via greppin’, ctaggin’, et cetera, but it is a PITA. Also, with j00ru’s Windows system call tables and a myriad of lists for Linux syscalls, this situation is simply unacceptable. Therefore, I have spent the rest of the weekend parsing XNU sources for system calls instead of playing with them. This resulted in the following reference list:

As implied by the name, for the moment I have focused on BSD system calls, but in the near future I also plan to cover MACH traps (however, I think I will skip machine-dependent calls and diagnostic calls). When generating the BSD syscalls list I decided to build upon the syscalls.master file from XNU instead of the live-system version, /usr/include/sys/syscall.h, because it provides more information (i.e. #ifdefs). With regard to arguments, keep in mind that 64-bit OS X follows the standard AMD64 ABI calling convention; you can read about it at Mac Developer Library, and if you are curious how to use it in practice check out Dustin’s post.

Feedback, ideas, bugs, et cetera — give me a shout.

SensePost Reversing Challenge Analysis

A couple of weeks ago I stumbled upon a reversing challenge made by SensePost earlier this year. I was not able to find any public solutions to it, so I thought that it would be interesting to make one. Recently I had some spare time on my hands, so I decided to pursue this goal. Enjoy.

Initial analysis

After firing up IDA we can quickly pinpoint main() function which is located at 0x004016B0 (with base address as 0x00401000). This binary is not obfuscated nor protected in any way, hence we can immediately spot interesting calls such as fopen(), fgetc(), fread(), and strncmp(). We will follow with top-down style analysis.

The first thing that we see in main() is an fopen() call, which is checked for success at 0x004016DF; if the file was opened successfully, the application proceeds with reading individual characters via fgetc() in a loop until position 0xFF is passed or EOF is reached. These two checks look like this:

mov eax, [esp+30h+fd]
mov [esp+30h+input_ptr_loc], eax
call fgetc
cmp eax, 0FFFFFFFFh ; (fgetc() == EOF)?
jz short loc_4

cmp [esp+30h+counter], 0FFh
jle short loc_4016F7

However, this loop is redundant. It does not affect any data that is later used. Next code block looks more interesting:

mov eax, [esp+30h+fd]
mov [esp+30h+var_24], eax ; src
mov [esp+30h+var_28], 100h ; number of elements
mov [esp+30h+tmp_ptr], 1 ; size of an element
mov [esp+30h+input_ptr_loc], offset input_ptr ; dst
call fread

We can clearly see that this code reads up to 0x100 bytes from the file via fread() (this limits the size of a bytecode), then it starts preparing headers for comparison via strncmp() at 0x0040178D.
After a quick study of the surrounding code we can conclude that the header is 0x10 bytes long and it looks like this:

65 67 76 6D 62 69 6E 61 72 79 00 00 00 00 00 00

Due to the inner workings of strncmp() the header is validated only up to the first NULL byte (man strncmp).
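
The effect of strncmp() on the header check can be illustrated with a small Python model of its semantics. This is only a sketch: the function name strncmp_matches is made up for illustration, and it assumes the comparison length is the full 0x10-byte header.

```python
HEADER = bytes.fromhex("65 67 76 6D 62 69 6E 61 72 79 00 00 00 00 00 00")


def strncmp_matches(a, b, n):
    """Model of `strncmp(a, b, n) == 0` for byte strings."""
    for i in range(n):
        ca = a[i] if i < len(a) else 0
        cb = b[i] if i < len(b) else 0
        if ca != cb:
            return False
        if ca == 0:
            # Both strings ended here; later bytes are never examined.
            return True
    return True


# Bytes after the first NUL are ignored, so this header still passes.
sloppy = b"egvmbinary\x00" + b"\xAA" * 5
print(strncmp_matches(sloppy, HEADER, 0x10))  # prints: True
```

In other words, only "egvmbinary" plus the terminating zero byte matter; the remaining five header bytes are free for the bytecode author to use.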

When the header is successfully validated following block is hit:

mov [esp+30h+input_ptr_loc], offset aHeaderMatched_
call puts
call loader
mov [esp+30h+input_ptr_loc], eax
mov [esp+30h+tmp_ptr], edx
call executor
leave
retn

I have already re-named the two most important functions in this binary and their names are self-explanatory.

loader() function

This function starts at 0x004017CD and interesting things are right in the beginning:

mov eax, [ebp+input_ptr_loc]
add eax, 10h
movzx eax, byte ptr [eax]
cmp al, 10h
jbe short loc_40182D ; JMP IF ([input_ptr_loc+0x10] <= 0x10)

It loads a value from the bytecode at offset 0x10 and then checks if it is below or equal to 0x10. As can be seen by following the jump to the True node, this value happens to be the IP (instruction pointer). However, I would call it the EP (entry point) as it is more precise. So, the EP is at offset 0x10 in the bytecode and upon initialization it must be greater than 0x10.
The next node does a couple of things, but the most important part is this:

mov eax, [ebp+input_ptr_loc]
add eax, 11h
movzx eax, byte ptr [eax]
cmp al, 4Fh
jbe short loc_40187E ; JMP IF ([input_ptr_loc+0x11] <= 0x4F)

The mechanism is similar, and again by inspecting the jump to the True node we can find out what the value is, namely the SP (stack pointer). Hence, the SP value is at offset 0x11 in the bytecode and upon initialization it must be greater than 0x4F.

Once the bytecode has been validated and loaded, the user has the ability to change both IP and SP to arbitrary values via some trickery.

Code block at 0x00401845 does nothing interesting (it prints out some information about the stack (SP and first byte)).
Final block of this function looks like this:

mov eax, [ebp+input_ptr_loc]
movzx eax, byte ptr [eax+12h] ; [input_ptr_loc+0x12] happens to be LC (loop counter)
mov byte ptr [ebp+tmp_ptr+2], al
mov byte ptr [ebp+tmp_ptr+3], 0
mov [esp+28h+string], offset aThreadContextI ; "\n\tThread context initialized, execution"...
call puts
mov eax, [ebp+input_ptr_loc] ; eax holds input_ptr_loc
mov edx, [ebp+tmp_ptr] ; edx holds DWORD with \x00+LC+SP+IP
leave
retn

Apart from printing out thread context information it does two important things. First, it loads one additional value from a bytecode at offset 0x12; this field was recognized later as LC (loop counter) and its existence is optional (required when a bytecode is using loops; can be crafted at runtime and SMC example takes advantage of that). Second, when returning it loads pointer to a bytecode in memory into EAX and it loads IP, SP, LC, 0x00 as DWORD value into EDX.

All internal values (IP/SP/LC) are one BYTE long and a bytecode is at most 0x100 bytes in size. This makes sense: even if more data were loaded, the VM is limited by the size of its registers.
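
The checks performed by loader() can be summarised in a short Python sketch. It is a model of the behaviour described above and not a decompilation: the name load_context is hypothetical, raising ValueError on a failed check is my assumption (what the real loader does in that case is not covered here), and so is the zero padding of short inputs. The packed value follows the byte order given in the listing comments (IP in the low byte, then SP, then LC, then 0x00).

```python
EP_OFFSET, SP_OFFSET, LC_OFFSET = 0x10, 0x11, 0x12
MAX_BYTECODE = 0x100


def load_context(bytecode):
    """Return (ip, sp, lc, packed) for a bytecode image.

    Raising ValueError on a failed check is an assumption of this sketch.
    """
    # Only the first 0x100 bytes are read; short inputs are zero padded
    # here (assumption about the contents of the static buffer).
    image = bytes(bytecode[:MAX_BYTECODE]).ljust(MAX_BYTECODE, b"\x00")
    ip = image[EP_OFFSET]
    sp = image[SP_OFFSET]
    lc = image[LC_OFFSET]
    if ip <= 0x10:
        raise ValueError("entry point must be above 0x10")
    if sp <= 0x4F:
        raise ValueError("stack pointer must be above 0x4F")
    # EDX on return: IP in the low byte, then SP, then LC, then 0x00.
    packed = ip | (sp << 8) | (lc << 16)
    return ip, sp, lc, packed
```

For example, a bytecode with 0x13, 0x90 and 0x02 at offsets 0x10 to 0x12 yields the context DWORD 0x00029013.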

executor() function

This function starts at 0x004018B8. However, first things first:

mov [esp+30h+input_ptr_loc], eax
mov [esp+30h+tmp_ptr], edx
call executor

That is why EAX and EDX were important when returning from loader() as both of them are used in executor() (passed as arguments).

Now, I will not go through each block of executor() function as most of them are easy enough to follow. I will focus on enumerating opcodes and describing associated code.

First interesting block is at the beginning of an opcode fetching loop and it looks like this:

mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+opcode_ptr]
movzx eax, al
add eax, edx
movzx eax, byte ptr [eax] ; eax = [input_ptr_loc+opcode_ptr]
test al, al
jnz loc_4018C4

Opcode is being fetched from a location pointed by input_ptr_loc+opcode_ptr (input_ptr_loc is always equal to static address and opcode_ptr is equal to the EP on first iteration) and it is loaded into EAX and checked against 0x0 (which by itself happens to be our first valid opcode). From there you should be able to enumerate all opcodes by just following the conditions. They are as follows:

0x0 opcode = break/exit execution
0x1 opcode = stack pointer decrement
0x2 opcode = initialize loop (set loop_flag for 0x3 opcode and decrement LC)
0x3 opcode = takes care of jumping for freshly initialized loop
0x4 opcode = obfuscate current byte on stack by inverting it (byte on stack = NOT(byte))
0x5 opcode = obfuscate current byte on stack with opcode_ptr+1 (the byte following the currently processed opcode), addition method
0x6 opcode = obfuscate current byte on stack with opcode_ptr+1 (the byte following the currently processed opcode), subtraction method
0x9 opcode = debugging flag
0xB opcode = print char

After each opcode (apart from opcodes 0x0 and 0x9) IP is incremented by the following code:

movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al

Instruction set details

Opcode 0x0

This opcode is responsible for termination. Length of this instruction is one byte (opcode).

mov [esp+28h+var_28], 0Ah
call putchar
add esp, 24h
pop ebx
pop ebp
retn

Opcode 0x1

This opcode is responsible for decrementing SP. Length of this instruction is one byte (opcode).

stack_pointer_decrement:
movzx eax, [ebp+stack_pointer_loc]
sub eax, 1
mov [ebp+stack_pointer_loc], al
jmp loc_401BE3

SP is decremented, hence the stack grows backward. This is an important notion of this VM.

Opcode 0x2

This opcode is responsible for loop initializing. Length of this instruction is one byte (opcode).
There is an additional check made before initialization which verifies whether LC is set or not:

mov eax, [ebp+input_ptr_loc]
movzx eax, byte ptr [eax+12h]
mov [ebp+loop_counter], al
movzx eax, [ebp+loop_counter]
test al, al
jz short reset_loop_flag

If LC is set then it is decremented and loop_flag is set:

loop_init:
movzx eax, [ebp+loop_counter]
sub eax, 1
mov [ebp+loop_counter], al
mov eax, [ebp+input_ptr_loc]
lea edx, [eax+12h]
movzx eax, [ebp+loop_counter]
mov [edx], al ; [input_ptr_loc+0x12] = loop_counter-1
mov [ebp+loop_flag], 1 ; set loop_flag
jmp loc_401BE3

If LC is not set then loop_flag is NULLed and execution continues.

Opcode 0x3

This opcode is responsible for jumping. Length of this instruction is two bytes (opcode, new IP).
Because of an additional check it cannot be used as a stand-alone jump instruction (loop_flag needs to be set):

movzx eax, [ebp+loop_flag]
test al, al
jz short skip_next_opcode

If loop_flag is set then new IP is initialized:

set_opcode_ptr:
mov eax, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, 1
add eax, edx
movzx eax, byte ptr [eax] ; eax = [input_ptr_loc+(opcode_ptr+1)]
sub eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr = [input_ptr_loc+(opcode_ptr+1)]-1
jmp loc_401BE3

If loop_flag is not set it skips second byte with an additional IP increment:

skip_next_opcode:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al
jmp loc_401BE3

Opcodes 0x2 and 0x3 are mutually connected. One cannot be used without the other.

Opcode 0x4

This opcode is responsible for obfuscation. Length of this instruction is one byte (opcode).
Byte currently pointed by SP is inverted:

obfuscate_stack_byte:
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+stack_pointer_loc]
not edx
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = NOT [input_ptr_loc+stack_pointer_loc]
jmp loc_401BE3

Opcode 0x5

This opcode is responsible for obfuscation. Length of this instruction is two bytes (opcode, byte to add).
Byte currently pointed by SP is modified with a second byte of the instruction via addition:

obfuscate_stack_add:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr points to byte to add
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx ecx, byte ptr [edx] ; ecx = [input_ptr_loc+stack_pointer_loc]
mov ebx, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, ebx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+opcode_ptr]
add edx, ecx ; edx = [input_ptr_loc+opcode_ptr] + [input_ptr_loc+stack_pointer_loc]
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = [input_ptr_loc+opcode_ptr] + [input_ptr_loc+stack_pointer_loc]
jmp loc_401BE3

Opcode 0x6

This opcode is responsible for obfuscation. Length of this instruction is two bytes (opcode, byte to subtract).
Byte currently pointed by SP is modified with a second byte of the instruction via subtraction:

obfuscate_stack_sub:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr points to byte to subtract
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx ecx, byte ptr [edx] ; ecx = [input_ptr_loc+stack_pointer_loc]
mov ebx, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, ebx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+opcode_ptr]
sub ecx, edx
mov edx, ecx ; edx = [input_ptr_loc+stack_pointer_loc] - [input_ptr_loc+opcode_ptr]
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = [input_ptr_loc+stack_pointer_loc] - [input_ptr_loc+opcode_ptr]
jmp loc_401BE3

Opcode 0x9

This opcode is responsible for debugging. Length of this instruction is one byte (opcode).

Opcode 0xB

This opcode is responsible for printing. Length of this instruction is one byte (opcode).
Byte currently pointed by SP is printed out via putchar():

print_char:
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx ; [input_ptr_loc+stack_pointer_loc]
movzx eax, byte ptr [eax]
movzx eax, al
mov [esp+28h+var_28], eax
call putchar
jmp short loc_401BE3
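
To tie the instruction set together, here is a compact Python model of the executor loop, written from the descriptions above rather than from the binary itself. Several points are assumptions and are marked as such in the comments: opcode 0x9 and any unlisted opcode are treated as one-byte no-ops that still advance IP, all pointer arithmetic wraps at 8 bits, and the max_steps guard exists only in this sketch. The function name run is hypothetical.

```python
LC_OFFSET = 0x12  # the loop counter lives inside the bytecode itself


def run(mem, ip, sp, max_steps=100_000):
    """Execute the bytecode in `mem` (a bytearray of 0x100 bytes).

    Returns the bytes the VM would have printed. `max_steps` is a safety
    net of this sketch, not something the original binary has.
    """
    out = bytearray()
    loop_flag = False
    for _ in range(max_steps):
        op = mem[ip]
        if op == 0x0:                       # exit, after printing '\n'
            out.append(0x0A)
            return bytes(out)
        if op == 0x1:                       # SP decrement
            sp = (sp - 1) & 0xFF
        elif op == 0x2:                     # loop init
            loop_flag = mem[LC_OFFSET] != 0
            if loop_flag:
                mem[LC_OFFSET] -= 1
        elif op == 0x3:                     # loop jump
            ip = (ip + 1) & 0xFF            # step onto the operand
            if loop_flag:
                # target - 1, so the common increment lands on the target
                ip = (mem[ip] - 1) & 0xFF
        elif op == 0x4:                     # NOT the byte at SP
            mem[sp] = ~mem[sp] & 0xFF
        elif op == 0x5:                     # add operand to the byte at SP
            ip = (ip + 1) & 0xFF
            mem[sp] = (mem[sp] + mem[ip]) & 0xFF
        elif op == 0x6:                     # subtract operand from byte at SP
            ip = (ip + 1) & 0xFF
            mem[sp] = (mem[sp] - mem[ip]) & 0xFF
        elif op == 0xB:                     # print the byte at SP
            out.append(mem[sp])
        # 0x9 and unlisted opcodes: one-byte no-ops here (assumption).
        ip = (ip + 1) & 0xFF                # common IP increment
    raise RuntimeError("step limit reached")
```

A four-byte program such as 05 41 0B 00 placed at the entry point adds 0x41 to the (zeroed) byte at SP, prints it and exits, so the model returns b"A\n".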

Bytecode analysis

I will limit myself to my own bytecode. As specified in task #3 it should print my name and it should do so via SMC (self modifying code).

Useful facts:

  • IP moves forward while SP moves backwards
  • We can do underflow on SP
  • We can do arbitrary jumps with loop opcodes
  • We can do arbitrary writes with obfuscation opcodes

My bytecode is doing all that:

Bytecode

The header is marked by a dark grey area. The first part of the light grey area marks the initial preparation of SP up to the point where it meets with IP (remember, SP goes backward and IP goes forward). From this point onwards we are able to modify past bytes with obfuscation opcodes, and that is exactly what we are doing inside of the first part of the violet area (decrypting the "dyjakan"+0xA string). The second part of the light grey area is for proper padding. The opcode inside of the orange area is responsible for inserting the break instruction. The second part of the violet area is responsible for setting up the code for printing. After that there is a green area which initializes loop opcodes at the beginning of the new code (a loop that will rewind our SP). And finally the red area sets up LC and jumps into the new code at offset 0x13.

C:\Users\ad\Desktop\reverseme>EvilGroupVM.exe dyjakan

Header matched. The binary is being loaded to runtime memory
and thread context will be initialized

Instruction pointer initialized to offset 0x13 and value 0x1

Stack pointer initialized to offest 0x90 and value 0x5

Thread context initialized, execution begins

dyjakan

C:\Users\ad\Desktop\reverseme>

For fun and no profit

In conjunction with the above analysis, I have also written a simple disassembler for this VM's instruction set.
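
For readers who want to experiment, a linear disassembler for this instruction set fits in a few lines of Python. The sketch below is not the tool mentioned above; it is a fresh illustration in which the mnemonics are invented, and only the opcode values and instruction lengths are taken from the analysis.

```python
# Mnemonics are invented for this sketch; only the opcode values and
# instruction lengths come from the analysis above.
OPCODES = {
    0x0: ("exit", 1),
    0x1: ("dec_sp", 1),
    0x2: ("loop_init", 1),
    0x3: ("loop_jmp", 2),
    0x4: ("not_byte", 1),
    0x5: ("add_byte", 2),
    0x6: ("sub_byte", 2),
    0x9: ("debug", 1),
    0xB: ("print_char", 1),
}


def disassemble(image, start, end):
    """Linearly decode image[start:end] into a list of text lines."""
    lines = []
    ip = start
    while ip < end:
        name, length = OPCODES.get(image[ip], ("db", 1))
        if name == "db":
            # Unlisted byte values are shown as raw data.
            text = f"db 0x{image[ip]:02x}"
        elif length == 2 and ip + 1 < end:
            text = f"{name} 0x{image[ip + 1]:02x}"
        else:
            text = name
        lines.append(f"0x{ip:02x}: {text}")
        ip += length
    return lines
```

Keep in mind that a linear sweep only shows the bytecode as it sits in the file; with self-modifying code such as the example above, the instructions that actually execute are produced at runtime.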

As a final note, thanks to SensePost and @Zarrasvand for creating this challenge. Double thanks to @Zarrasvand for proofreading.

Why You Should Use Radamsa

During my time at Secunia I’ve seen a lot of fuzzing results published either publicly or privately (via SVCRP). What struck me at the time was that most of them were made via random bit flip. While this approach is certainly the easiest and fastest to implement and execute, there are other ways to mutate data. One of them is to use Radamsa.

They say that a picture is worth a thousand words, hence we will make a comparison between random bit flip and radamsa with images and our eyes as parsers (a more scientific, and thus more correct, approach would be to collect and compare code coverage data).

Below you can see the results of a random bit flip approach on this seed file (1-to-256 changes of 1/2/4 byte(s) size):

Bitflip output

The images are broken in a chaotic fashion. Additionally they all seem to be quite similar.
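
For reference, a random bit flip mutator of this kind takes only a few lines. The Python sketch below is one possible reading of "1-to-256 changes of 1/2/4 byte(s) size", not the exact tool used to produce the grid: each change picks a 1, 2 or 4 byte span at a random offset and XORs it with a random non-zero mask. The function name flip_mutate and its parameters are hypothetical.

```python
import random


def flip_mutate(data, rng, changes=None):
    """Return a copy of `data` with random 1/2/4-byte spans corrupted."""
    buf = bytearray(data)
    widths = [w for w in (1, 2, 4) if w <= len(buf)]
    if not widths:
        return bytes(buf)
    if changes is None:
        changes = rng.randint(1, 256)       # 1-to-256 changes per sample
    for _ in range(changes):
        width = rng.choice(widths)
        start = rng.randrange(len(buf) - width + 1)
        for i in range(start, start + width):
            # XOR with a non-zero mask flips at least one bit in the byte.
            buf[i] ^= rng.randrange(1, 256)
    return bytes(buf)


seed_file = bytes(range(256)) * 4            # stand-in for a real seed file
mutated = flip_mutate(seed_file, random.Random(1234))
```

Note that the mutator is completely blind to the structure of the file format, which is consistent with the chaotic breakage seen in the grid.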

Now, using radamsa we can get a somewhat different set of mutations:

Radamsa output

This time images are less broken and mutations seem to be less chaotic. Also we can observe more structural variations (e.g. re-ordered chunks).

We can clearly spot the differences between these two approaches and so can parsers, hence next time when designing your fuzzing operation you should think about incorporating radamsa as one of your mutation engines.

To craft the grids I’ve used ImageMagick’s montage tool (hence we basically tested how ImageMagick’s parser sees things).