SensePost Reversing Challenge Analysis

Couple of weeks ago I have stumbled upon a reversing challenge made by SensePost earlier this year. I was not able to find any public solutions to it so I thought that it would be interesting to make one. Recently I had some spare time on my hands so I have decided to follow this goal. Enjoy.

Initial analysis

After firing up IDA we can quickly pinpoint main() function which is located at 0x004016B0 (with base address as 0x00401000). This binary is not obfuscated nor protected in any way, hence we can immediately spot interesting calls such as fopen(), fgetc(), fread(), and strncmp(). We will follow with top-down style analysis.

First thing that we see in main() is fopen() call which is checked for success at 0x004016DF; if file opening operation was successful application proceeds with reading individual characters via fgetc() in a loop until position 0xFF is passed or EOF is reached. These two checks look like this:

mov eax, [esp+30h+fd]
mov [esp+30h+input_ptr_loc], eax
call fgetc
cmp eax, 0FFFFFFFFh ; (fgetc() == EOF)?
jz short loc_4

cmp [esp+30h+counter], 0FFh
jle short loc_4016F7

However, this loop is redundant. It does not affect any data that is later used. Next code block looks more interesting:

mov eax, [esp+30h+fd]
mov [esp+30h+var_24], eax ; src
mov [esp+30h+var_28], 100h ; number of elements
mov [esp+30h+tmp_ptr], 1 ; size of an element
mov [esp+30h+input_ptr_loc], offset input_ptr ; dst
call fread

We can clearly see that this code reads up 0x100 bytes from the file via fread() (this limits the size for a bytecode), then it starts preparing headers for comparison via strncmp() at 0x0040178D.
After a quick study of surrounding code we can conclude that the header is 0x10 bytes long and it looks like this:

65 67 76 6D 62 69 6E 61 72 79 00 00 00 00 00 00

Due to the inner workings of strncmp() the header is validated only up to the first NULL byte (man strncmp).

When the header is successfully validated following block is hit:

mov [esp+30h+input_ptr_loc], offset aHeaderMatched_
call puts
call loader
mov [esp+30h+input_ptr_loc], eax
mov [esp+30h+tmp_ptr], edx
call executor
leave
retn

I have already re-named the two most important functions in this binary and their names are self-explanatory.

loader() function

This function starts at 0x004017CD and interesting things are right in the beginning:

mov eax, [ebp+input_ptr_loc]
add eax, 10h
movzx eax, byte ptr [eax]
cmp al, 10h
jbe short loc_40182D ; JMP IF ([input_ptr_loc+0x10] <= 0x10)

It loads a value from a bytecode at offset 0x10 and then checks if it is below or equal to 0x10. As can be seen in a jump to True node, this happens to be IP (instruction pointer). However, I would call it EP (entry point) as it is more precise. So, EP is at offset 0x10 in a bytecode and upon initialization it cannot be less than nor equal to 0x10.
Next node does couple of things but the most important part is this:

mov eax, [ebp+input_ptr_loc]
add eax, 11h
movzx eax, byte ptr [eax]
cmp al, 4Fh
jbe short loc_40187E ; JMP IF ([input_ptr_loc+0x11] <= 0x4F)

Similar mechanism and again when inspecting jump to True node we can find out what it is, namely SP (stack pointer). Hence, SP value is at offset 0x11 in a bytecode and upon initialization it cannot be less than nor equal to 0x4F.

Once the bytecode was validated and loaded, the user has the ability to change both IP and SP to arbitrary values via some trickery.

Code block at 0x00401845 does nothing interesting (it prints out some information about the stack (SP and first byte)).
Final block of this function looks like this:

mov eax, [ebp+input_ptr_loc]
movzx eax, byte ptr [eax+12h] ; [input_ptr_loc+0x12] happens to be LC (loop counter)
mov byte ptr [ebp+tmp_ptr+2], al
mov byte ptr [ebp+tmp_ptr+3], 0
mov [esp+28h+string], offset aThreadContextI ; "\n\tThread context initialized, execution"...
call puts
mov eax, [ebp+input_ptr_loc] ; eax holds input_ptr_loc
mov edx, [ebp+tmp_ptr] ; edx holds DWORD with \x00+LC+SP+IP
leave
retn

Apart from printing out thread context information it does two important things. First, it loads one additional value from a bytecode at offset 0x12; this field was recognized later as LC (loop counter) and its existence is optional (required when a bytecode is using loops; can be crafted at runtime and SMC example takes advantage of that). Second, when returning it loads pointer to a bytecode in memory into EAX and it loads IP, SP, LC, 0x00 as DWORD value into EDX.

All internal values (IP/SP/LC) are BYTE long and a bytecode is at most 0x100 in size. This makes sense, even if more data would be loaded the VM is limited with the registers size.

executor() function

This function starts at 0x004018B8. However, first things first:

mov [esp+30h+input_ptr_loc], eax
mov [esp+30h+tmp_ptr], edx
call executor

That is why EAX and EDX were important when returning from loader() as both of them are used in executor() (passed as arguments).

Now, I will not go through each block of executor() function as most of them are easy enough to follow. I will focus on enumerating opcodes and describing associated code.

First interesting block is at the beginning of an opcode fetching loop and it looks like this:

mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+opcode_ptr]
movzx eax, al
add eax, edx
movzx eax, byte ptr [eax] ; eax = [input_ptr_loc+opcode_ptr]
test al, al
jnz loc_4018C4

Opcode is being fetched from a location pointed by input_ptr_loc+opcode_ptr (input_ptr_loc is always equal to static address and opcode_ptr is equal to the EP on first iteration) and it is loaded into EAX and checked against 0x0 (which by itself happens to be our first valid opcode). From there you should be able to enumerate all opcodes by just following the conditions. They are as follows:

0x0 opcode = break/exit execution
0x1 opcode = stack pointer decrement
0x2 opcode = initialize loop (set loop_flag for 0x3 opcode and decrement LC)
0x3 opcode = takes care of jumping for freshly initialized loop
0x4 opcode = obfuscate current byte on stack with inversed current byte (byte on stack = NOT(byte))
0x5 opcode = obfuscate current byte on stack with opcode_ptr+1 (next byte to currently processed opcode), addition method
0x6 opcode = obfuscate current byte on stack with opcode_ptr+1 (next byte to currently processed opcode), substraction method
0xB opcode = print char
0x9 opcode = debugging flag

After each opcode (apart from opcodes 0x0 and 0x9) IP is incremented by the following code:

movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al

Instruction set details

Opcode 0x0

This opcode is responsible for termination. Length of this instruction is one byte (opcode).

mov [esp+28h+var_28], 0Ah
call putchar
add esp, 24h
pop ebx
pop ebp
retn

Opcode 0x1

This opcode is responsible for decrementing SP. Length of this instruction is one byte (opcode).

stack_pointer_decrement:
movzx eax, [ebp+stack_pointer_loc]
sub eax, 1
mov [ebp+stack_pointer_loc], al
jmp loc_401BE3

SP is decremented hence the stack goes backward. This is important notion of this VM.

Opcode 0x2

This opcode is responsible for loop initializing. Length of this instruction is one byte (opcode).
There is an additional check made before initialization which verifies whether LC is set or not:

mov eax, [ebp+input_ptr_loc]
movzx eax, byte ptr [eax+12h]
mov [ebp+loop_counter], al
movzx eax, [ebp+loop_counter]
test al, al
jz short reset_loop_flag

If LC is set then it is decremented and loop_flag is set:

loop_init:
movzx eax, [ebp+loop_counter]
sub eax, 1
mov [ebp+loop_counter], al
mov eax, [ebp+input_ptr_loc]
lea edx, [eax+12h]
movzx eax, [ebp+loop_counter]
mov [edx], al ; [input_ptr_loc+0x12] = loop_counter-1
mov [ebp+loop_flag], 1 ; set loop_flag
jmp loc_401BE3

If LC is not set then loop_flag is NULLed and execution continues.

Opcode 0x3

This opcode is responsible for jumping. Length of this instruction is two bytes (opcode, new IP).
Because of an additional check it cannot be used as a stand-alone jump instruction (loop_flag needs to be set):

movzx eax, [ebp+loop_flag]
test al, al
jz short skip_next_opcode

If loop_flag is set then new IP is initialized:

set_opcode_ptr:
mov eax, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, 1
add eax, edx
movzx eax, byte ptr [eax] ; eax = [input_ptr_loc+(opcode_ptr+1)]
sub eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr = [input_ptr_loc+(opcode_ptr+1)]-1
jmp loc_401BE3

If loop_flag is not set it skips second byte with an additional IP increment:

skip_next_opcode:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al
jmp loc_401BE3

Opcodes 0x2 and 0x3 are mutually connected. One cannot be used without the other.

Opcode 0x4

This opcode is responsible for obfuscation. Length of this instruction is one byte (opcode).
Byte currently pointed by SP is inverted:

obfuscate_stack_byte:
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+stack_pointer_loc]
not edx
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = NOT [input_ptr_loc+stack_pointer_loc]
jmp loc_401BE3

Opcode 0x5

This opcode is responsible for obfuscation. Length of this instruction is two bytes (opcode, byte to add).
Byte currently pointed by SP is modified with a second byte of the instruction via addition:

obfuscate_stack_add:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr points to byte to add
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx ecx, byte ptr [edx] ; ecx = [input_ptr_loc+stack_pointer_loc]
mov ebx, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, ebx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+opcode_ptr]
add edx, ecx ; edx = [input_ptr_loc+opcode_ptr] + [input_ptr_loc+stack_pointer_loc]
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = [input_ptr_loc+opcode_ptr] + [input_ptr_loc+stack_pointer_loc]
jmp loc_401BE3

Opcode 0x6

This opcode is responsible for obfuscation. Length of this instruction is two bytes (opcode, byte to subtract).
Byte currently pointed by SP is modified with a second byte of the instruction via subtraction:

obfuscate_stack_sub:
movzx eax, [ebp+opcode_ptr]
add eax, 1
mov [ebp+opcode_ptr], al ; opcode_ptr points to byte to subtract
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx
mov ecx, [ebp+input_ptr_loc]
movzx edx, [ebp+stack_pointer_loc]
movzx edx, dl
add edx, ecx
movzx ecx, byte ptr [edx] ; ecx = [input_ptr_loc+stack_pointer_loc]
mov ebx, [ebp+input_ptr_loc]
movzx edx, [ebp+opcode_ptr]
movzx edx, dl
add edx, ebx
movzx edx, byte ptr [edx] ; edx = [input_ptr_loc+opcode_ptr]
sub ecx, edx
mov edx, ecx ; edx = [input_ptr_loc+stack_pointer_loc] - [input_ptr_loc+opcode_ptr]
mov [eax], dl ; [input_ptr_loc+stack_pointer_loc] = [input_ptr_loc+stack_pointer_loc] - [input_ptr_loc+opcode_ptr]
jmp loc_401BE3

Opcode 0x9

This opcode is responsible for debugging. Length of this instruction is one byte (opcode).

Opcode 0xB

This opcode is responsible for printing. Length of this instruction is one byte (opcode).
Byte currently pointed by SP is printed out via putchar():

print_char:
mov edx, [ebp+input_ptr_loc]
movzx eax, [ebp+stack_pointer_loc]
movzx eax, al
add eax, edx ; [input_ptr_loc+stack_pointer_loc]
movzx eax, byte ptr [eax]
movzx eax, al
mov [esp+28h+var_28], eax
call putchar
jmp short loc_401BE3

Bytecode analysis

I will limit myself to my own bytecode. As specified in task #3 it should print my name and it should do so via SMC (self modifying code).

Useful facts:

  • IP moves forward while SP moves backwards
  • We can do underflow on SP
  • We can do arbitrary jumps with loop opcodes
  • We can do arbitrary writes with obfuscation opcodes

My bytecode is doing all that:

Bytecode

Header is marked by a dark grey area. First part of a light grey area marks initial preparation of SP up to the point where it meets with IP (remember SP goes backward and IP goes forward). From this point onwards we are able to modify past bytes with obfuscation opcodes and that is exactly what we are doing inside of the first part of a violet area (decrypting "dyjakan"+0xA string). Second part of a light grey area is for proper padding. Opcode inside of an orange area is responsible for inserting break instruction. Second part of a violet area is responsible for setting up the code for printing. After that there is a green area which is initializing loop opcodes at the beginning of the new code (loop that will rewind our SP). And finally a red area is setting up CL and jumping into new code at offset 0x13.

C:\Users\ad\Desktop\reverseme>EvilGroupVM.exe dyjakan

Header matched. The binary is being loaded to runtime memory
and thread context will be initialized

Instruction pointer initialized to offset 0x13 and value 0x1

Stack pointer initialized to offest 0x90 and value 0x5

Thread context initialized, execution begins

dyjakan

C:\Users\ad\Desktop\reverseme>

For fun and no profit

In conjunction with the above analysis, I have also wrote a simple disassembler for this VM's instruction set.

As a final note, thanks to SensePost and @Zarrasvand for creating this challenge. Double thanks to @Zarrasvand for proofreading.