When starting a reverse engineering class I usually ask:
- How many of you have ever done any reverse engineering?
And usually, all the hands in the room are raised, so I continue and ask:
- So, if I give you a simple binary, you’ll be able to tell me which compiler I used within ten minutes just by looking at the disassembly, not using any tools?
In response, nine out of ten hands drop. When I ask the owner of the remaining hand whether he could take a simple binary and give me back a C file that, when compiled, matches the original binary bit for bit, he retreats too.
I have been thinking about why this happens. Looking back, I started learning reverse engineering by reading the Lena151 tutorials. I thought they were awesome until Daeken told me that was an awful approach to learning reverse engineering.
At first I didn’t understand why they were so bad. After all, Lena’s tutorials had taught me how to crack my first piece of software. But since Daeken is such an experienced reverse engineer, I took his advice without question and started writing C programs that I reverse-engineered statically instead.
A while back, a friend of mine who wanted to get started with reverse engineering told me he had downloaded the Lena151 tutorials and was about to begin. I told him he shouldn’t and gave him the exact same advice I got from Daeken some years back, because I really agree with him. But I couldn’t explain to my friend why he should do as I said.
I’ve come up with an analogy that I think anyone getting into reverse engineering should read and understand. If you do this and follow the advice, you won’t be one of those whose raised hand drops as soon as the follow-up question is asked. Your hand will stay raised.
The analogy
Pretend I give you a car without a brand and tell you to reverse engineer it for me. One way you could do this is by sitting in the car and pressing different buttons while documenting what happens. You could take it for a ride, maybe fill the tank and document how far you get, how fast it can go and how good the brakes are. You open the front or the back and examine what changes while you press buttons or pedals. You crawl under it and repeat what you’ve done to document and examine it from another angle.
I bet that, at the end of the day, you’ll have a pretty good grasp of the limits of that car. You could probably even write a specification for somebody who would like to build a car with the same limits and benefits.
So, have you reverse engineered it? Hell no!
- Let’s dig deeper, you tell yourself. You find a manual for the car and read it. Then you google the engine model and learn some basic electronics. Great, you now know how to hot-wire the car. Finally you google the lock type and find out how to break it. Voilà, you know exactly how to steal the car.
And yet, you still haven’t reverse engineered anything. You’ve done simple behavior analysis and behavior modification at most. This is what the Lena151 tutorials are all about: behavior analysis and behavior modification, or as we hackers call it, “cracking”.
What you actually need to do to reverse engineer the car is to be able to tell me how the electronics in it work and how they are put together. You need to be able to tell me why they are put together the way they are and how a change to one part will impact the entire system. You need to be able to tell me which factory or brand the engine comes from by analyzing the electronics in it. When you have enough information to tell me how to build a factory that will turn out cars like that, with the exact same engine and the exact same electronics, you’ve actually reverse engineered the car.
Translated to the world of software, this means you need to understand the following:
x86 assembly (electronics and wires in the car analogy)
How operating systems work and how they manage memory (the engine of the car)
The compilation process from C code to assembly (this is equivalent to knowing how a car factory assembles a car)
The life of a binary (equivalent to everything that happens in the car from turning the key to switching it off)
Then you’ll need to know the file format of binaries on your target system, but that is minor knowledge that you’ll be able to pick up quickly once you know the rest.
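To give a feel for how small the first step into a file format is, here is a minimal sketch in C that guesses a binary’s container format from its first bytes. The function name identify_format and the labels it returns are made up for illustration. It assumes only two well-known signatures: ELF files start with the bytes 0x7F 'E' 'L' 'F', and Windows executables start with the DOS header signature "MZ".

```c
#include <stddef.h>
#include <string.h>

/*
 * Guess the container format of a binary from its leading "magic" bytes.
 * identify_format is a hypothetical helper, not part of any real tool.
 * It checks signatures only and does not validate the rest of the header.
 */
const char *identify_format(const unsigned char *buf, size_t len)
{
    /* ELF identification starts with 0x7F followed by the letters E, L, F. */
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, sizeof elf_magic) == 0) {
        return "ELF";
    }

    /* Windows executables begin with a DOS header whose signature is "MZ". */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z') {
        return "MZ/PE";
    }

    return "unknown";
}
```

The "MZ" check only proves that a DOS header is present. A stricter check would follow the offset stored at position 0x3C in that header and confirm the "PE\0\0" signature there, which is exactly the kind of detail you pick up quickly once the rest is in place.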
See the difference between reversing and cracking? Cracking is about understanding the features of the target, while reversing is about re-creating the process that created the target.
This is why Daeken’s advice works. When you compile your own C code and examine it in IDA Pro, you take your first steps toward learning how compilers create binaries and why. The Lena151 tutorials do not contain a single line about compiler technology or how operating systems manage memory.
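As a rough idea of what such a practice file could look like, here is a small sketch in C. The file name sample.c and the function names are invented for this example. Each function is built around one construct (a counted loop, a dense switch, a pointer walk) so you can look for its shape in the disassembly. The file deliberately has no main, so it is meant to be compiled to an object file.

```c
/* sample.c: a tiny practice target for static analysis (hypothetical names). */
#include <stddef.h>

/* A counted loop: look for the accumulator and the compare-and-branch. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/*
 * A dense switch: depending on compiler and flags this may become a
 * jump table, a lookup table, or a chain of compares.
 */
int classify(int code)
{
    switch (code) {
    case 0:  return 10;
    case 1:  return 20;
    case 2:  return 30;
    case 3:  return 40;
    default: return -1;
    }
}

/* A pointer walk over a NUL-terminated string: look for the byte loads. */
size_t count_char(const char *s, char c)
{
    size_t hits = 0;
    while (*s != '\0') {
        if (*s == c) {
            hits++;
        }
        s++;
    }
    return hits;
}
```

One way to use it, assuming GCC (any C compiler will do): build it once with `gcc -O0 -c sample.c` and once with `gcc -O2 -c sample.c`, load both object files into your disassembler, and try to explain every difference between the two listings. Repeating that with a second compiler is a first step toward answering the “which compiler did I use” question.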
Most people, whether they learned “reverse engineering” from Lena’s tutorials or on their own, confuse reverse engineering with behavior analysis and behavior modification. The latter are great skills to have and essential for software professionals. They just are not the same as reverse engineering. I think that is the reason why so many raised hands drop when asked to reverse engineer a simple binary.