Via Big Mess O’ Wires.
Want to know more about how assembly code works, and how Windows executable programs are put together? I thought it would be fun to write a “hello world” program in C, and then examine it with some common reversing tools, to get a better understanding of what’s happening under the hood. To keep things interesting, the example program generates a simple secret code from a name that’s entered, instead of being a true “hello world” that only prints a fixed message. Follow along with me, and we’ll look at the disassembled program listing to reverse engineer the secret code algorithm, just like super 1337 haxors!
The example program for Windows runs in a console window, and is a 32-bit text-only application. It was written in plain C, and compiled in release mode with Microsoft Visual Studio Express 2012 for Windows. The C runtime library was Microsoft’s multi-threaded DLL version. In an effort to produce assembled code that was small and easy to understand, I turned off all the advanced compiler and linker options that I could.
Instead of “hello world”, I should have called the example “hello bloat”, because the 18 lines of C code resulted in a 6144 byte executable program. Huh? If you estimate that each line of C code might compile into 3 or so CPU instructions, each of which is an average of 4 bytes, then you might have expected a total executable size of about 200 bytes. If you predicted that there’s also some type of executable header, and maybe some extra code to handle interfacing with the C runtime DLL, and things like string literals and other constants, then you might have expected a total size of 400 or 500 bytes, but 6144 is hard to explain. Let’s look at what fills all those bytes later, and start by examining the heart of the program where the secret code algorithm lies.