18.08.2015

Knowledge Fragment: Fobber Inline String Decryption

In the other blog post on Fobber, I have demonstrated how to batch decrypt function code, which left us with IDA recognizing a fair amount of code opposed to only a handful of functions:

After function decryption, IDA recognizes some code already.
However, we can see that there is still a lot of "red" remaining, meaning that functions have not been recognized as nicely as we would like it.

The reason for this is that Fobber uses another technique which we might call "inline string decryption".
It looks like this:

Fobber inline calls at 0x950326 and 0x950345, pushing the crypted string address to the stack which is then consumed as argument by decryptString()
We can see two calls to decryptString(), and both of them are preceded by a collection of bytes prior to which a call happens that seemingly "jumps" over them.
The effect of a call is that it pushes its return address to the stack - in our case resulting in the absolute address of the encrypted string directly following this call being pushed to the stack. From a coder's perspective, this "calling over the encrypted string" is an elegant way to save a couple bytes, while from an analysts perspective, this really screws with IDA. :)

Let's look at how the strings are decrypted:

Fobber's string decryption function.
Again, the rather simple single-byte xor-loop jumps the eye.

However, the interesting part is how parameters are loaded.

Thus, let me explain the effects of instructions one-by-one:

[...]
mov   edi, [ebp+0Ch]       | move pointer where decrypted string will be put to EDI
mov   esi, [ebp+8]         | move pointer of encrypted string to ESI
lodsw                      | load two bytes (word) from ESI
movzx ecx, al              | put the lower byte into ecx and zero extend -> this is our len
lodsb                      | load another byte from ESI (first byte of encrypted string)
xor   al, ah               | xor string byte with upper byte loaded by lodsw (our key)
xor   al, cl               | xor string byte with number of remaining chars (CL)
stosb                      | store decrypted byte
loop  loc_953878           | decrement ECX and repeat as long as >0.
[...]


 Running this as exmaple for the first encrypted string as shown in the first picture:

                        crypted string len
                        |  key
                        |  |
remaining len           |  |  07 06 05 04 03 02 01            
                        07 B2 C0 C6 DB DB DE DE B3
xor key (B2)            -- -- 72 74 69 69 6c 6c 01
xor remaining len       -- -- 75 72 6c 6d 6f 6e 00
ASCII                   -- --  u  r  l  m  o  n --


So the first decrypted string here resolves nicely to "urlmon".
Let's automate for all strings again.


Decrypt All The Strings


First, we locate the string decryption function.
This time we can use regex r"\xE8....\x55\x89\xe5\x60.{8,16}\x30.\x30" which again gives a unique hit.

This time, we first locate all calls to this function, like in the post on function decryption. For this we can use the regex r"\xE8" again to find all potential "call rel_offset" instructions.
We apply the same address math and check if the call destination (calculated as: image_base + call_origin + relative_call_offset + 5) is equal to the address of our string decryption function.
In this case, we can store the call_origin as a candidate for string decryption.

Next, we run again over all calls and check if a call to one of these string decryption candidates happens - this is very likely one of the "calling over encrypted strings" locations as explained earlier. This could probably have been solved differently but it worked for me.
Next we extract and decrypt the string, then patch it again in the binary.
I also change the "call over encrypted string" to a jump (first byte 0xE8->0xE9) because IDA likes this more and will not create wrongly detected functions later on.

Code:

#!/usr/bin/env python
import re
import struct

def decrypt_string(crypted_string):
    decrypted = ""
    size = ord(crypted_string[0])
    key = ord(crypted_string[1])
    remaining_chars = len(crypted_string[2:])
    index = 0
    while remaining_chars > 0:
        decrypted += chr(ord(crypted_string[2 + index]) ^ remaining_chars ^ key)
        remaining_chars -= 1
        index += 1
    return decrypted + "\x00\x00"

def replace_bytes(buf, offset, bytes):
    return buf[:offset] + bytes + buf[offset + len(bytes):]

def decrypt_all_strings(binary, image_base):
    # locate decryption function
    decrypt_string_offset = re.search(r"\xE8....\x55\x89\xe5\x60.{8,16}\x30.\x30", binary).start()

    # locate calls to decryption function
    regex_call = r"\xe8"
    calls_to_decrypt_string = []
    for match in re.finditer(regex_call, binary):
        call_origin = match.start()
        packed_call = binary[call_origin + 1:call_origin + 1 + 4]
        rel_call = struct.unpack("I", packed_call)[0]
        call_destination = (image_base + call_origin + rel_call + 5) & 0xFFFFFFFF
        if call_destination == image_base + decrypt_string_offset:
            calls_to_decrypt_string.append(image_base + call_origin)

    # identify calls to these string decryption candidates
    for match in re.finditer(regex_call, binary):
        call_origin = match.start()
        packed_call = binary[call_origin + 1:call_origin + 1 + 4]
        rel_call = struct.unpack("I", packed_call)[0]
        call_destination = (image_base + call_origin + rel_call + 5) & 0xFFFFFFFF

        if call_destination in calls_to_decrypt_string:

            # decrypt string and fix in the binary
            crypted_string = binary[call_origin + 0x5:call_destination -  image_base]
            decrypted_string = decrypt_string(crypted_string)
            binary = replace_bytes(binary, call_origin, "\xE9")
            binary = replace_bytes(binary, call_origin + 0x5, decrypted_string)
            print "0x%x: %s" % (image_base + call_origin, decrypted_string)

     return binary

[...]




Load in IDA and see the result:
Decryption of all "call over encrypted string" locations.
Yay.



Conclusion

This blog post detailed how Fobber uses encrypted inline strings, which sadly is also a big deal to IDA, misclassifying a bunch of calls.

sample used:
  md5: 49974f869f8f5d32620685bc1818c957
  sha256: 93508580e84d3291f55a1f2cb15f27666238add9831fd20736a3c5e6a73a2cb4

Repository with memdump + extraction code


Knowledge Fragment: Unwrapping Fobber

About two weeks ago I came across an interesting sample using an interesting anti-analysis pattern.
The anti-analysis technique can be best described as "runtime-only code decryption". This means prior to execution of a function, the code is decrypted, then executed and finally encrypted again, but with a different key.
Malwarebytes has already published an analysis on this family they called "Fobber".
However, in this blog post I wanted to share how to "unwrap" the sample's encrypted functions for easier analysis. There is also another blog post detailing how to work around the string encryption.

The sample and code related to this blog post can be found on bitbucket.

Fobber's Function Encryption Scheme


First off, let's have a look how Fobber looks in memory, visualized by IDA's function analysis:

IDA's first view on Fobber.


IDA only recognizes a handful of functions. Among these is the actual code decryption routine, as well as some code handling translating relevant addresses of the position independent code into absolute offsets.

Next, a closer look at how the on-demand decryption/encryption of functions works:

The Fobber-encrypted function sub_95112A, starting with call to decryptFunctionCode.

We can see that function sub_95112A starts with a call to what I renamed "decryptFunctionCode":

Fobber's on-demand decryption code for functions, revealing the parameter offsets neccessary for decryption.


This function does not make use of the stack, thus it is surrounded by a simple pushad/popad prologue/epilogue. We can see that some references are made relative to the return address (initially put into esi by copying from [esp+20h]):
  • Field [esi-7] contains a flag indicating whether or not the function is already decrypted. 
  • Field [esi-8h] contains the single byte key for encryption, while 
  • field [esi-Ah] contains the length of the encrypted function, stored xor'ed with 0x461F.

The actual cryptXorCode takes those values as parameters and then loops over the encrypted function body, xor'ing with the current key and then updating the key by rotating 3bit and adding 0x53.

Function for decrypting one function, given the neccessary parameters.


After decryption, our function makes a lot more sense and we can see the default function prologue (push ebp; mov ebp, esp) among other things.

The decrypted equivalent of function sub_95112A, revealing some "real" code.


Also note the parameters:
  • 0x951125 - key: 0x7B
  • 0x951126 - length: 0x4629^0x461F -> 0x36 bytes
  • 0x951128 - encryption flag: 0x01

So far so good. Now let's decrypt all of those functions automatically.

Decrypt All The Things


First, we want to find our decryption function. For all Fobber samples I looked at, the regex r"\x60\x8B.\x24\x20\x66" was delivering unique results for locating the decryption function.

Next, we want to find all calls to this decryption function. For this we can use the regex r"\xE8" to find all potential "call rel_offset" instructions.
Then we just need to do some address math and check if the call destination (calculated as: image_base + call_origin + relative_call_offset + 5) is equal to the address of our decryption function.
Should this be the case, we can extract the parameters as described above and decrypt the code.

We then only need to exchange the respective bytes in our binary with the decrypted bytes. In the following code I also set the decryption flag and fix the function ending with a "retn" (0xC3) instruction to ease IDA's job of identifying functions afterwards. Otherwise, rinse/repeat until all functions are decrypted.

Code:

#!/usr/bin/env python
import re
import struct

def decrypt(buf, key):
    decrypted = ""
    for char in buf:
        decrypted += chr(ord(char) ^ key)
        # rotate 3 bits
        key = ((key >> 3) | (key << (8 - 3))) & 0xFF
        key = (key + 0x53) & 0xFF
    return decrypted

def replace_bytes(buf, offset, bytes):
    return buf[:offset] + bytes + buf[offset + len(bytes):]

def decrypt_all(binary, image_base):

    # locate decryption function
    decrypt_function_offset = re.search(r"\x60\x8B.\x24\x20\x66", binary).start()

    # locate all calls to decryption function

    regex_call = r"\xe8(?P<rel_call>.{4})"
    for match in re.finditer(regex_call, binary):
        call_origin = match.start()
        packed_call = binary[call_origin + 1:call_origin + 1 + 4]
        rel_call = struct.unpack("I", packed_call)[0]
        call_destination = (image_base + call_origin + rel_call + 5) & 0xFFFFFFFF
        if call_destination == image_base + decrypt_function_offset:

            # decrypt function and replace/fix

            decrypted_flag = ord(binary[call_origin - 0x2])
            if decrypted_flag == 0x0:
                key = ord(binary[call_origin - 0x3])
                size = struct.unpack("H", binary[call_origin - 0x5:call_origin - 0x3])[0] ^ 0x461F
                buf = binary[call_origin + 0x5:call_origin + 0x5 + size]
                decrypted_function = decrypt(buf, key)
                binary = replace_bytes(binary, call_origin + 0x5, decrypted_function)
                binary = replace_bytes(binary, call_origin + len(decrypted_function), "\xC3")
                binary = replace_bytes(binary, call_origin - 0x2, "\x01")

    return binary


[...]

IDA likes this this already better:

IDA's view on a code-decrypted Fobber sample.


However, we are not quite done yet, as IDA still barfs on a couple of functions.

Conclusion

After decrypting all functions, we can already start analyzing the sample effectively.
But we are not quite done yet, and the second post looks closer at the inline usage of encrypted strings.


sample used:
  md5: 49974f869f8f5d32620685bc1818c957
  sha256: 93508580e84d3291f55a1f2cb15f27666238add9831fd20736a3c5e6a73a2cb4

Repository with memdump + extraction code