Toward the end of July, FireEye released a set of CTF-style challenges for their second annual FlareOn challenge. These specific challenges are related to reverse engineering and typically showcase obfuscation techniques employed by real pieces of malware. The goal for these challenges is to discover the password/email address that will allow you to advance to the next round. While working on the FlareOn challenges this year, I came across this intriguing write up of a 2014 FlareOn challenge. Since reading that solution, I have wanted to try to solve a CTF problem by using nothing but the instruction count of an execution attempt.
The thought of solving a CTF challenge purely by analyzing a program's instruction count seemed like a much easier solution than staring at assembly code until your head hurts. After the CSAW CTF ended, I decided I would try to use Intel's Pin tool to solve the 500 point reverse engineering question, wyvern. As a side note, this is not how our team managed to solve this challenge during CSAW; rather, one of our members did a lot of reversing (major kudos to Michael F. for that). After finishing my solution, this is definitely one of the first things I will try with similar problems in the future.
For those who were not a part of CSAW, wyvern was a 64-bit ELF that, when ran, would ask for a secret phrase (password) and would then either tell you the input was wrong, or presumably it would present you with the flag.
I began my adventure by first downloading Intel's Pin tool. (You can find a download here.) After playing around a little bit, I got a basic idea of how to use it. Pin can perform many different types of dynamic analysis on various programs, but for my case, I was wanting to use Pin to run a program and analyze how many instructions were executed by that program. My goal was to take the number of instructions executed by wyvern and interpret that in such a way that I could correctly determine the correct length of the phrase wyvern was expecting and then begin intelligently brute-forcing the individual characters of the secret phrase. To those of you who are interested, this is called a side-channel attack.
I started off by trying a single ?z' as input, and I would tack on another ?z' to the end of the input string until one of the executions produced a significant change in the total instruction count. My program claims the average number of instructions executed was 1,411,355, but an input of length 28 causes 1,424,423 instructions to be run. The fact nearly 13,000 more instructions were executed can let us safely assume this is the correct length of the secret phrase. (Because my team had already solved this challenge, I knew this was the correct length, thus solidifying my faith in Pin). At this point, without having to look at any assembly code, I could safely assume the secret phrase was 28 characters long. What I really wanted was to figure out what each character should be by using this same method.
Taking the same concept described above, I began trying to solve for the first character in our secret phrase. In order to speed up my program, I ordered characters based upon the frequency they appear in English and put common leet-speak characters next to their substitutes, much like what was described in the aforementioned blog. I did make a slight change and put the ?z' at the beginning and end of my list so that it could be used as a baseline for the number of instructions executed. (I figured it would be unlikely that a ?z' would occur, and even if it did, the average would have leveled out, and the ?z' at the end of the list should differ enough from the average to be caught by my program.)
My algorithm roughly went as follows. I took my answer string and appended the next character in my frequency list to it. I then filled the rest of the string with a's until I had a string of length 28. I then ran this through Pin and saw if the number of instructions executed was significantly higher than my baseline. If it was, I assumed that was the correct character and began looking for the next character. If the instruction count was not larger than then baseline, I calculated the average number of instructions executed and used that in my following tests.
When I finished scripting this in Python, I was left with a program I feel can be easily modified to solve similar challenges in the future. During CSAW, several hours were spent analyzing and solving the program wyvern, but this relatively simple script will determine the correct argument length and find the correct key in approximately 16-20 minutes. I thought this was a really intriguing example of how someone could determine so much about a program by only looking at the number of instructions executed. I'm also not complaining about staring at less assembly!