Saturday, 20 October 2012

Why the Lena151 tutorials won't teach you reverse engineering - and what you should do instead

When starting a reverse engineering class I usually ask:
- How many of you have ever done any reverse engineering?

And usually, all the hands in the room are raised, so I continue and ask:
- So, if I give you a simple binary you’ll be able to tell me which compiler I used within ten minutes just by looking at the disassembly, not using any tools?

In response, nine out of ten hands drop. When I ask the one remaining person whether he can take a simple binary and give me back a C file that, when compiled, matches the original binary bit for bit, he retreats too.

I have been thinking about why this happens. Looking back, I started learning reverse engineering by reading the Lena151 tutorials myself. I thought they were awesome until Daeken told me they were an awful way to learn reverse engineering.

At first I didn’t understand why they were so bad. After all, Lena’s tutorials had taught me how to crack my first piece of software. But since Daeken is such an experienced reverse engineer, I took his advice without question and started writing C programs that I then reverse engineered statically instead.

A while back, a friend of mine who wants to get going on reverse engineering told me he had downloaded the Lena151 tutorials and was about to get started. I told him he shouldn’t and gave him the exact same advice I got from Daeken some years back, because I really agree with him. But I couldn’t explain to my friend why he should do as I said.

I’ve come up with an analogy that I think anyone getting into reverse engineering should read and understand. If you do, and you follow the advice, you won’t be one of those people whose hand drops as soon as the follow-up question is asked. Your hand will stay raised.

The analogy

Pretend I give you a car without a brand and tell you to reverse engineer it for me. One way you could do this is by sitting in the car and pressing different buttons while documenting what happens. You could take it for a ride, maybe fill the tank and document how far you get, how fast it can go and how good the brakes are. You open the hood or the trunk and watch what changes while you press buttons or pedals. You crawl under it and repeat what you’ve done to document and examine it from another angle.
I bet that, at the end of the day, you’ll have a pretty good grasp of the limits of that car. You could probably even write a specification for somebody who would like to build a car with the same limits and benefits.

So, have you reverse engineered it? Hell no!

- Let’s dig deeper, you tell yourself. You find a manual for the car and read it. Then you google for the engine model and learn some basic electronics. Great, you now know how to hot-wire the car. Finally you google the lock type and find out how to break it. Voila, you know exactly how to steal the car.

And yet, you still haven’t reverse engineered anything. You’ve done simple behavior analysis and behavior modification at most. This is what the Lena151 tutorials are all about: behavior analysis and behavior modification, or as we hackers call it, “cracking”.

To actually reverse engineer the car, you need to be able to tell me how the electronics in it work and how they are put together. You need to be able to tell me why they are put together the way they are and how a change to one part will impact the entire system. You need to be able to tell me which factory or brand the engine is from by analyzing the electronics in it. When you have enough information to tell me how to build a factory that will turn out cars like that, with the exact same engine and the exact same electronics, you’ve actually reverse engineered the car.

Translated to the world of software, this means you need to understand the following:

x86 assembly (electronics and wires in the car analogy)
How operating systems work and how they manage memory (the engine of the car)
The compilation process from C code to assembly (this is equivalent to knowing how a car factory assembles a car)
The life of a binary (equivalent to everything that happens in the car from the key-switch to the off-switch)

Then you’ll need to know the file format of binaries in your targeted system, but that is minor knowledge that you’ll be able to pick up quickly once you know the rest.
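
To give a feel for how small that first step is, here is a minimal sketch that tells two common formats apart by their leading magic bytes. ELF files start with the bytes 0x7F 'E' 'L' 'F' and Windows PE files start with the 'MZ' DOS header; the function and enum names are made up for this example, and a real loader checks far more than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, chosen only for this illustration. */
enum binary_format {
    FORMAT_UNKNOWN,
    FORMAT_ELF,  /* Linux and most Unix-like systems */
    FORMAT_MZ    /* DOS header that Windows PE files begin with */
};

/* Guess the container format from the first bytes of a file.
   Only the magic number is checked; nothing else is validated. */
enum binary_format identify_format(const unsigned char *data, size_t length)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (length >= sizeof elf_magic &&
        memcmp(data, elf_magic, sizeof elf_magic) == 0) {
        return FORMAT_ELF;
    }
    if (length >= 2 && data[0] == 'M' && data[1] == 'Z') {
        return FORMAT_MZ;
    }
    return FORMAT_UNKNOWN;
}
```

Everything after the magic bytes (headers, sections, imports) is where the real format knowledge lives, but it builds on the same idea of reading fixed fields at known offsets.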

See the difference between cracking and reversing? Cracking is about understanding the features of the target, while reversing is about re-creating the process that created the target.

This is why Daeken’s advice works. When you compile your own C code and examine the result in IDA Pro, you take your first steps toward learning how compilers create binaries and why. The Lena151 tutorials do not contain a single line about compiler technology or how operating systems manage memory.
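
If you want somewhere to start, here is the kind of practice file I mean. This is only a sketch: the function names are made up, and any small C code you write yourself will do just as well. The point is that each function contains one construct (a loop, a pair of branches) that you can then hunt for in the disassembly.

```c
#include <stddef.h>

/* Hypothetical practice targets: each function isolates one construct
   so its compiled form is easy to find in a disassembler. */

/* A counted loop: look for the counter, the compare and the
   backward branch that the compiler emits for it. */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Two simple branches: depending on compiler and optimization level
   these may become conditional jumps or conditional moves. */
int clamp_to_byte(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}
```

Build it once without optimization and once with it, for example `gcc -O0 -c practice.c` and `gcc -O2 -c practice.c` (the file name is just a placeholder), and compare the two listings. Then try another compiler and see what changes. Add a `main` that calls the functions if you want a full executable to load.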

Most people, whether they learned “reverse engineering” from Lena’s tutorials or on their own, confuse reverse engineering with behavior analysis and behavior modification. The latter are great skills to have and essential for software professionals. They just are not the same as reverse engineering. I think that is the reason why so many raised hands drop when asked to reverse engineer a simple binary.