Many thanks to Volodya for helping me translate this article into English.
It is well known that to call yourself a programmer among friends you have to write a program that somehow outputs "Hello, World!" on the screen (and that alone is sufficient :). On Windows this is now fairly simple. Open good old Notepad, enter 'MsgBox "Hello, World!"', save it as a file with the ".vbs" extension (e.g., 'Hello.vbs'), then launch the file with a double click. Those who have Word from MS Office XP installed can use a more sophisticated variant:
Set w = CreateObject("Word.Application")
w.Visible = True
Set rng = w.Documents.Add.Range(0,0)
With rng
    .InsertBefore "Hello, World!"
    .ParagraphFormat.Alignment = 1 ' 1 = centered
    With .Font
        .Name = "Arial"
        .Size = 48
        .Italic = True
        .Color = 200
    End With
End With
That has little to do with the operating system itself, though. Is it possible to create a real Windows application without any development tools, on a perfectly ordinary computer?
Having started up a newly installed Windows XP Pro and entered 'debug' at the command line, I was surprised to see a hyphen in the console window: the familiar prompt of the old DOS debugger. This curious thing is the tool we need. Besides it, we will need some knowledge of the PE file layout and of how such files are loaded into memory.
Win32 executables use the PE file layout. Like the old DOS EXE, a PE file consists of a header and the image of the program to run. The program image is composed of one or more objects, or sections, sometimes misnamed segments. They relate neither to the old segmentation model nor to objects in the sense used in programming languages, so it is better to call the parts of the program image "sections".
The division into sections exists to optimize memory management in Windows. That is why the sizes of loaded sections have to be multiples of the memory page size (usually 4 KB) and the sections must be properly aligned. The size of a section as written to disk must be aligned to the size of a "file page", which is a multiple of the disk sector size (512 bytes); this, too, is aimed at optimizing loading.
The program image is loaded into memory starting at some base load address, which is indicated in the file header and must be aligned on a 64 KB boundary. The common base load address for EXEs is 400000h.
To be more concrete, let's design our PE file right away. We will take the minimal possible alignment for the sections, 4096 (1000h) bytes, and for the file, 512 (200h) bytes. Our program's image will consist of only one section that contains data, code and some housekeeping import tables. Thus the file will fit in 1 KB (200h bytes of headers plus 200h bytes of section data) and the memory image of the program will occupy 8 KB (2 pages). The first page, at the load address 400000h, will contain the PE file header, and the first (and only) section of our file will be at offset 1000h.
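The sizes quoted above follow from rounding each part up to its alignment. Here is a minimal Python sketch of that arithmetic. The raw sizes passed in (160h bytes of headers, E0h bytes of section content) are illustrative assumptions rather than figures from the article, and `align_up` and `layout` are hypothetical helper names.

```python
SECTION_ALIGNMENT = 0x1000  # memory page size chosen in this article
FILE_ALIGNMENT = 0x200      # disk sector size chosen in this article


def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout(header_bytes: int, section_bytes: int) -> dict:
    """Return file and memory sizes for an image with a single section."""
    headers_in_file = align_up(header_bytes, FILE_ALIGNMENT)
    section_in_file = align_up(section_bytes, FILE_ALIGNMENT)
    section_rva = align_up(header_bytes, SECTION_ALIGNMENT)
    section_in_memory = align_up(section_bytes, SECTION_ALIGNMENT)
    return {
        "file_size": headers_in_file + section_in_file,
        "section_file_offset": headers_in_file,
        "section_rva": section_rva,
        "image_size": section_rva + section_in_memory,
    }


# Illustrative raw sizes (assumptions): both parts are smaller than one sector.
sizes = layout(header_bytes=0x160, section_bytes=0xE0)
print({name: hex(value) for name, value in sizes.items()})
```

Any header and section content that each fit in one sector give the same result: a 400h-byte file and a 2000h-byte image.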
So, let's create a framework. Enter "debug" at the command line. Clear the first 1024 bytes (400h; debug uses hexadecimal numbers) of memory by filling them with zeros:
f 0 400 0
Here we will "assemble" our application. Let's use one more temporary memory area at offset 1000h, so that the offsets we type match the offsets in the loaded image and we do not get confused while entering addresses by hand:
f 1000 1200 0
As with the VBS script, the task of our application is to display the "Hello, World!" message, so we have to use the Win32 API function MessageBoxA located in the system module USER32.dll. The application then terminates by calling one more API function, ExitProcess, located in KERNEL32.dll. Thus we must import these two functions into our application.
To import functions, the system first maps the required DLLs into the process's address space. Then the addresses of these functions must be stored in a special location, the Import Address Table (IAT). The system does this automatically, but we must provide a set of housekeeping import tables for that purpose.
We will place the import address tables at the very beginning of the section. Each table is a sequence of 4-byte (DWORD) fields that the Windows loader fills with the addresses of the imported functions. The functions must be arranged in a certain order (discussed shortly). Each IAT contains data related to one module (DLL); the end of the table is marked by a zero-filled field. Multiple tables can follow one another when necessary. The addresses in the IAT are filled in by Windows, but before loading the IAT's fields must be identical to the fields of the corresponding lookup table; otherwise the loader will report an error.
So we need two IATs: one for USER32.dll and one for KERNEL32.dll. Only one function is imported from each module, so each table takes 8 bytes (4 bytes for the address and 4 bytes for the terminating zero field). The first IAT will be at offset 1000h relative to the base load address and the second IAT at offset 1008h. We will enter them later.
For now we proceed to the data. The function MessageBoxA takes the addresses of two strings among its arguments. The first address points to the message text and the second to the message title. We align the addresses on a paragraph (16-byte) boundary; this is not mandatory and is done only for convenience. We put the ASCII string 'VBScript' at offset 1010h (the message title will be similar to that of the VBS script):
We put the 'Hello, World!' string at offset 1020h (leaving spare room after it in case we want to change the message later):
db "Hello, World!"
The names of the modules being imported, 'USER32.dll' and 'KERNEL32.dll', will be put at offsets 1040h and 1050h respectively; they will be referred to by the Import Directory Table:
We must also provide the names of the imported functions, but they use peculiar strings: the first two bytes hold a hint for the loader, and the name itself follows the hint. The hint is an index into the Export Name Pointer Table of the DLL, where the loader expects to find the required function name. If the name is not at the indicated position, the loader searches all the entries of the Export Name Table, which of course takes more time. In our case we will have to manage without hints: leave zeros there. Put the string 'MessageBoxA' at offset 1060h and the string 'ExitProcess' at offset 1070h (remember that function names, unlike DLL names, are case sensitive):
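The hint/name layout described above can be sketched in Python as follows. `hint_name` is a hypothetical helper introduced only for illustration; the terminating null byte is part of the entry.

```python
import struct


def hint_name(name: str, hint: int = 0) -> bytes:
    """Build a hint/name entry: 16-bit hint, ASCII name, terminating null."""
    return struct.pack("<H", hint) + name.encode("ascii") + b"\x00"


entry = hint_name("MessageBoxA")
print(entry.hex(" "))
print(len(entry))
```

With a zero hint the entry for either function name is 14 bytes long, so it fits in the 16 bytes between 1060h and 1070h.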
Since we are done with the text strings it is time to verify the entered data. Debug has 'd' command to dump memory locations:
If the data is correct, the entered text should be displayed on the right side of the console window. If there is a mistake, enter the data again at the same offsets.
Next we put the lookup tables starting at offset 1080h. Like the IAT, an Import Lookup Table is a sequence of 32-bit (DWORD) values terminated by a zero-filled field, and it relates to the functions of a single module. Each field specifies how to look up an exported function in the DLL: by ordinal or by name. In the latter case the field contains the offset of the hint/name string for the required function. We have offsets 1060h for 'MessageBoxA' and 1070h for 'ExitProcess' (unfortunately debug does not recognize 32-bit numbers, so we have to enter them as pairs of 16-bit numbers; remember, however, the reversed byte order on the PC):
Since the IAT must be identical to the lookup table until binding, we can now go back and fill in the fields we left empty earlier:
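To see what "pairs of 16-bit numbers" and "reversed byte order" mean for these tables, here is a small Python sketch. The helper names `dword_as_words` and `lookup_table` are hypothetical; the offsets are the ones chosen in the text.

```python
import struct


def dword_as_words(value: int) -> tuple:
    """Split a 32-bit value into (low word, high word); the low word comes first."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def lookup_table(hint_name_offset: int) -> bytes:
    """One-function lookup table: the entry followed by a zero terminator."""
    return struct.pack("<II", hint_name_offset, 0)


user32_table = lookup_table(0x1060)    # 'MessageBoxA'
kernel32_table = lookup_table(0x1070)  # 'ExitProcess'

# Before loading, each IAT holds the same bytes as its lookup table.
iat = user32_table + kernel32_table

print(dword_as_words(0x1060))
print(user32_table.hex(" "))
```

The same 16 bytes therefore appear twice in the section: once at offset 1000h (the two IATs) and once at offset 1080h (the two lookup tables).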
Now we have reached the main table, the Import Directory Table. It ties together all the data prepared earlier. Each entry of this table contains five 4-byte (DWORD) fields and relates to a single imported module (DLL). The first field contains the offset (relative to the base load address) of the Import Lookup Table for the given DLL; the second and third are not used and are zero-filled; the fourth contains the offset of the DLL name string, and the fifth the offset of the corresponding IAT. The number of entries is equal to the number of imported modules plus one more entry with all fields zero-filled, which marks the end of the directory table. So our table has 3 entries (for USER32.dll, for KERNEL32.dll and one empty entry). The table is at offset 1090h and its size is 3x5x4=60 (3Ch) bytes:
1080h (the offset of the first lookup table)
0 and 0 (two empty fields)
1040h (the offset of the 'USER32.dll' string)
1000h (the offset of the first IAT)
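Similarly, we fill in the second entry, for KERNEL32.dll: 1088h (the second lookup table, which follows the 8-byte first one), two empty fields, 1050h (the 'KERNEL32.dll' string) and 1008h (the second IAT).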
The next 20 bytes, the terminating empty entry, are left zero-filled.
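The whole Import Directory Table can be sketched in Python like this. `directory_entry` is a hypothetical helper, and the second lookup table offset (1088h) is an assumption that follows from the first table occupying 8 bytes at 1080h.

```python
import struct


def directory_entry(lookup_rva: int, name_rva: int, iat_rva: int) -> bytes:
    """Five DWORDs: lookup table, two unused fields, DLL name, IAT."""
    return struct.pack("<5I", lookup_rva, 0, 0, name_rva, iat_rva)


import_directory = (
    directory_entry(0x1080, 0x1040, 0x1000)    # USER32.dll
    + directory_entry(0x1088, 0x1050, 0x1008)  # KERNEL32.dll (1088h assumed)
    + bytes(20)                                # terminating empty entry
)

print(hex(len(import_directory)))
print(import_directory[:20].hex(" "))
```

The length comes out as 3Ch, the value later entered as the import table size in the header.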
Now we only need to enter the code. The MessageBoxA function takes 4 DWORD parameters: the handle of the owner window (we have none, i.e. 0), the address of the message, the address of the title and the style of the message box (a numeric value; here it is 0). The parameters are passed on the stack in reverse order, i.e. the last parameter is pushed first. So we would have the following assembly code:
push 0 ; message box style
push offset title ; here - 401010h
push offset message ; here - 401020h
push 0 ; window handle (none)
call dword ptr [IAT] ; address of MessageBoxA, stored at 401000h
Take into account that linear addresses must be pushed onto the stack, not offsets, so we add the base load address (400000h) to the string offsets 1020h and 1010h, obtaining 401020h and 401010h respectively, and to the IAT offset of MessageBoxA, obtaining 401000h. Since debug does not handle 32-bit operands, we have to do the work on our own (remember the reversed byte order):
ExitProcess (whose address is located in the second IAT, at linear address 401008h) takes only one parameter, the exit code (here it is 0):
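As an illustration of what the hand-assembled bytes might look like, here is a Python sketch that encodes the two calls. It assumes the short `push imm8` form (6Ah) for the zero arguments, `push imm32` (68h) for the addresses and the indirect `call dword ptr [address]` form (FFh 15h); this is one valid encoding and not necessarily byte-for-byte what was typed into debug. The helper names are hypothetical.

```python
import struct

IMAGE_BASE = 0x400000


def push_imm8(value: int) -> bytes:
    """push imm8 (opcode 6Ah)."""
    return bytes([0x6A, value])


def push_imm32(value: int) -> bytes:
    """push imm32 (opcode 68h), operand stored low byte first."""
    return b"\x68" + struct.pack("<I", value)


def call_indirect(address: int) -> bytes:
    """call dword ptr [address] (opcode FFh 15h)."""
    return b"\xFF\x15" + struct.pack("<I", address)


code = b"".join([
    push_imm8(0),                        # message box style
    push_imm32(IMAGE_BASE + 0x1010),     # title
    push_imm32(IMAGE_BASE + 0x1020),     # message
    push_imm8(0),                        # window handle (none)
    call_indirect(IMAGE_BASE + 0x1000),  # MessageBoxA via the first IAT
    push_imm8(0),                        # exit code
    call_indirect(IMAGE_BASE + 0x1008),  # ExitProcess via the second IAT
])

print(code.hex(" "))
```

Note how 401010h appears in the byte stream as 10 10 40 00: that is the reversed byte order the text keeps warning about.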
That is all for the program. We now have the "memory image" (at offset 1000h); it must be moved to its place in the file (at offset 200h):
m 1000 1200 200
Now only the header remains to be filled in. The PE file header may be divided into an "old" one and a "new" one. The "old" header in turn consists of a slightly modified DOS EXE header and an optional DOS stub, which usually prints "This program cannot be run in DOS mode" when the file is launched under DOS; any other DOS program may be there instead. The DOS header can hold a program identifier and a manufacturer name at offset 20h relative to the beginning of the file, but this field is nearly always left empty. More essential is another field of the DOS header: offset 3Ch must hold a 32-bit pointer to the PE header.
The only thing we must keep in the DOS header is the MS-DOS EXE file signature (ASCII characters 'MZ'):
We omit the DOS stub altogether, so the PE header will immediately follow the 4-byte pointer at offset 3Ch, i.e. it will start at offset 40h. We put this number in as the value of the pointer:
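The resulting "old" header is just 40h bytes with two meaningful fields. A minimal Python sketch of it:

```python
import struct

PE_HEADER_OFFSET = 0x40  # the PE header starts right after the pointer at 3Ch

dos_header = bytearray(PE_HEADER_OFFSET)  # zero-filled
dos_header[0:2] = b"MZ"                   # MS-DOS EXE signature
struct.pack_into("<I", dos_header, 0x3C, PE_HEADER_OFFSET)

print(bytes(dos_header[0x3C:]).hex(" "))
```

Everything between the signature and the pointer stays zero, since no DOS stub is provided.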
The "new" header consists of the PE header itself and of the section table. The PE header in turn is divided into a COFF file header and an optional header. At the end of the latter there is a table of data directories. Each data directory is represented by two DWORD values: the first contains the offset of some housekeeping table relative to the base load address, and the second contains the size of that table. If a table is not used, the corresponding data directory is filled with zeros. The following table lists only those PE header fields that are mandatory for launching an executable:
Offsets are relative to the start of the PE header.
Offset  Size  Field
COFF file header
00h  4  Signature: ASCII characters 'PE' followed by two null bytes
04h  2  CPU type (usually 14Ch for i386)
06h  2  The number of sections in the image
14h  2  The size of the optional header; usually E0h
16h  2  Flags; for Win32 applications this is generally 10Fh
Optional header
18h  2  "Magic" number 10Bh
28h  4  Entry point (an offset relative to the image base)
34h  4  Image base address (for EXEs this is generally 400000h)
38h  4  Alignment of sections in memory (system page size, 4 KB = 1000h) - Section alignment
3Ch  4  Alignment of sections in the file (a multiple of 512 (200h) bytes) - File alignment
40h  2  Major version number of the required OS; generally 4
48h  2  Major version number of the subsystem; generally 4
50h  4  The size of the image including all headers; must be a multiple of Section alignment
54h  4  The combined size of all headers ("old" and "new"); a multiple of File alignment
5Ch  2  Subsystem (2 - GUI, 3 - console)
74h  4  The number of data directories (generally 10h)
80h  4  Import table offset
84h  4  Import table size
Note. To launch our little application it is sufficient to fill in the fields indicated in the table (this was verified on three versions of Windows: 98 SE, 2000 Server and XP Pro). However, more complicated applications may also require filling in the 4-byte (DWORD) fields at offsets 60h (size of stack to reserve), 64h (size of stack to commit), 68h (size of heap to reserve) and 6Ch (size of heap to commit).
The section table immediately follows the PE header and describes the sections of the program image. It maps the sections stored on disk to their places in memory. The number of entries in the section table is equal to the number of sections in the program image, as given in the PE header field at offset 6. Each section table entry has the following format:
Offsets are relative to the start of the entry.
Offset  Size  Field
00h  8  Arbitrary section name (used in linking). It is a null-padded eight-byte ASCII string.
08h  4  The size of the section when loaded into memory.
0Ch  4  Offset of the section in memory relative to the image base.
10h  4  The size of the section data on disk, a multiple of the file alignment.
14h  4  Offset of the section data on disk, a multiple of the file alignment.
18h  12  These fields are used only in object files.
24h  4  Section flags. The most common ones are: 20h - executable code; 40h - initialized data; 80h - uninitialized data; 20000000h - section can be executed; 40000000h - section can be read; 80000000h - section can be written to.
Let's finish creating our header. The PE signature is at offset 40h:
The CPU is i386, the number of sections is 1:
The size of optional header is 0E0h, then the program flags and the magic number:
Program's entry point:
The image base address (400000h; we enter in reversed order), the alignment in memory - 1000h, in file - 200h:
The OS version is 4 and the subsystem version is 4; the gaps between them are filled with zeros:
The image size in memory (including headers) - 2000h, the header size in a file - 200h, subsystem 2:
The number of data directories is 10h:
We use only one data directory: the import table offset - 1090h, the size - 3Ch. Other entries are left empty (zero-filled):
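The header fields entered above can be summarized in a Python sketch that writes each value at its offset from the PE signature. The field offsets are those of the published PE layout. The entry point passed in at the bottom (10D0h) is a purely illustrative assumption, and `build_pe_header` is a hypothetical helper name.

```python
import struct

PE_HEADER_SIZE = 0x18 + 0xE0  # signature + COFF header, then optional header


def build_pe_header(entry_point_rva: int) -> bytes:
    """Fill in only the fields the article enters; everything else stays zero."""
    header = bytearray(PE_HEADER_SIZE)
    header[0:4] = b"PE\x00\x00"
    fields = [
        # (offset from the PE signature, struct format, value)
        (0x04, "<H", 0x014C),    # CPU type: i386
        (0x06, "<H", 1),         # number of sections
        (0x14, "<H", 0x00E0),    # size of the optional header
        (0x16, "<H", 0x010F),    # flags
        (0x18, "<H", 0x010B),    # magic number
        (0x28, "<I", entry_point_rva),
        (0x34, "<I", 0x400000),  # image base
        (0x38, "<I", 0x1000),    # section alignment
        (0x3C, "<I", 0x200),     # file alignment
        (0x40, "<H", 4),         # major OS version
        (0x48, "<H", 4),         # major subsystem version
        (0x50, "<I", 0x2000),    # image size including headers
        (0x54, "<I", 0x200),     # size of all headers in the file
        (0x5C, "<H", 2),         # subsystem: GUI
        (0x74, "<I", 0x10),      # number of data directories
        (0x80, "<I", 0x1090),    # import table offset
        (0x84, "<I", 0x3C),      # import table size
    ]
    for offset, fmt, value in fields:
        struct.pack_into(fmt, header, offset, value)
    return bytes(header)


# 10D0h is an illustrative entry point (assumption), not a value from the text.
pe_header = build_pe_header(entry_point_rva=0x10D0)
print(hex(len(pe_header)))
```

Since the PE header starts at file offset 40h, each field lands at 40h plus the offset listed here; the image base, for example, ends up at file offset 74h.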
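The section table immediately follows the optional header, at offset 138h (40h + 18h + E0h), and has only one entry. No name is provided, so the 8-byte name field stays zero-filled and the first values we enter are at offset 140h. The section size in memory is 1000h starting at offset 1000h; the size in the file is 200h bytes starting at offset 200h: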
Finally, the flags: the section contains executable code and can be executed, read and written to. The sum of all the flags (in this case identical to their bitwise OR) is E0000020h (it must be written in reversed byte order):
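The single section table entry, including the combined flags, can be sketched in Python as one 40-byte record. The constant names are hypothetical labels for the flag values listed in the text.

```python
import struct

CONTAINS_CODE = 0x00000020
CAN_EXECUTE = 0x20000000
CAN_READ = 0x40000000
CAN_WRITE = 0x80000000

flags = CONTAINS_CODE | CAN_EXECUTE | CAN_READ | CAN_WRITE

section_entry = struct.pack(
    "<8sIIII12sI",
    b"",      # no section name; struct pads it to 8 null bytes
    0x1000,   # size in memory
    0x1000,   # offset in memory, relative to the image base
    0x200,    # size of the section data in the file
    0x200,    # offset of the section data in the file
    b"",      # 12 bytes used only in object files, left zero
    flags,
)

print(hex(flags))
print(section_entry[-4:].hex(" "))
```

The last four bytes come out as 20 00 00 E0, which is E0000020h in reversed byte order.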
Debug can save files only in the COM format, in which the first 100h bytes are skipped. Therefore we must move the whole image (400h bytes) up in memory by 100h bytes:
m 0 400 100
Next we name our file; debug cannot save files with the .exe extension, so we save the file with the .bin extension and rename it to .exe afterwards.
The number of bytes to save must be entered in the CX register, and writing is done with the 'w' command:
To exit debug, use the 'q' command. But carefully verify all the entered data before closing it. If the numbers in the tables are arranged wrongly, an attempt to launch the application can produce the message "<Program name> is not a Win32 application" or even crash the system altogether.
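As an aid to that verification, the first few header fields can also be checked mechanically. The following Python sketch is only a partial sanity check built on the layout described in this article, not a full validator; `check_pe_header` is a hypothetical helper, and the sample bytes at the bottom are synthetic.

```python
import struct


def check_pe_header(data: bytes) -> list:
    """Return a list of problems found in the leading header fields."""
    problems = []
    if data[:2] != b"MZ":
        problems.append("missing 'MZ' signature at offset 0")
    if len(data) < 0x40:
        problems.append("file is shorter than the DOS header")
        return problems
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        problems.append(f"no 'PE' signature at offset {pe_offset:X}h")
        return problems
    if len(data) < pe_offset + 0x18:
        problems.append("file ends inside the COFF header")
        return problems
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    if machine != 0x14C:
        problems.append(f"unexpected CPU type {machine:X}h")
    if sections < 1:
        problems.append("the section count is zero")
    return problems


# Synthetic sample: only the fields checked above are filled in.
sample = bytearray(0x200)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x40)
sample[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", sample, 0x44, 0x14C, 1)
print(check_pe_header(bytes(sample)))
```

To check the real file, its contents would be read with `open(path, "rb").read()` and passed to the same function.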
Congratulations: you have created a fully-fledged Win32 application not even in assembly but in binary!