On my quest to build an operating system, I am first reading Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron of Carnegie Mellon University. The book is dedicated to explaining how modern computers actually work. I should note that some of these topics are pretty hefty, so don't worry if you find them confusing. So do I.
In this post we are going to walk through the compilation system of a basic hello world program.
A quick note
Before we get started, I should note that this post is based on a simple C program called hello.c. If you want to set up your own project, watch this YouTube video HERE
Programs are translated by other programs into different forms
So normally we just hit the compile button and hope for the best, but what is really going on when we hit compile?
The code in hello.c is called source code, and through a series of steps it eventually gets compiled down into instructions that our computer can understand. But before we go any further, let's have a little history lesson on the C programming language.
Brief history of C
C was developed from 1969 to 1973 by Dennis Ritchie of Bell Laboratories. Dennis Ritchie is on record saying that "C is quirky, flawed, and an enormous success." How was it able to be successful if it has flaws?
Well, C was developed from the beginning as the system programming language for Unix. Almost all of the Unix kernel and its supporting tools and libraries were written in C. Unix became really popular with universities in the 70s, and that meant C became popular too. Also, compared to other languages at the time, C was small and simple, which made it easy to learn.
However, C has its downsides. It has no support for abstractions like classes, objects, and exceptions, and its pointers are a common source of confusion and errors. More modern languages like Java and C++ address these issues.
Now, as we mentioned earlier, our code is written as human-readable source code, but we need to get it into machine-readable instructions. This process occurs in 4 phases.
I want to quickly say that when we talk about these phases, remember that they are not physical objects. Each one is just a program written to transform our code from one form to the next.
1) Preprocessing Phase
The preprocessor modifies the original C program according to the directives that begin with the # character. For example, we use #include <stdio.h>, which tells the preprocessor to read the contents of the system header file stdio.h and paste them directly into our program text. The result is another C program, typically with a .i suffix. So in this phase our hello.c program is turned into another file called hello.i.
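To make the "paste it in" idea a bit more concrete, here is a minimal sketch of two directives at work. The macro GREETING and the function greeting() are names I made up for illustration; they are not part of the original hello.c.

```c
#include <stdio.h>  /* directive: replaced by the contents of stdio.h */

/* directive: every later use of GREETING is replaced by this string */
#define GREETING "Hello, World\n"

/* Hypothetical helper, not part of the original hello.c.
   After preprocessing, the body reads: return "Hello, World\n"; */
const char *greeting(void)
{
    return GREETING;
}
```

If you are using GCC, running gcc -E on a file like this should stop after the preprocessing phase and show you the expanded text, with GREETING already replaced and the contents of stdio.h pasted in at the top.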
2) Compilation Phase
The compiler translates hello.i into the text file hello.s, which contains an assembly-language program. Assembly language is useful because it gives compilers for different high-level languages a common output language. For example, C compilers and Fortran compilers can both generate output in the same assembly language.
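Here is a rough sketch of what this phase does to a tiny function. The function add() is a made-up example, not part of hello.c, and the assembly in the comment is only what I would expect from GCC on x86-64 Linux. Your compiler may well produce something different.

```c
/* Hypothetical example function; it is not part of hello.c. */
int add(int a, int b)
{
    return a + b;
}

/*
 * Assuming GCC on x86-64 Linux, "gcc -S -O1 add.c" writes add.s.
 * The body of add in that file is likely to resemble:
 *
 *     leal    (%rdi,%rsi), %eax
 *     ret
 *
 * The exact instructions depend on the compiler, its version,
 * the optimization level, and the target processor.
 */
```

The point is that the .s file is still plain text we can read, just at a much lower level than C.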
3) Assembly Phase
The assembler takes hello.s and translates it into machine-language instructions (machine code). It then packages them into a relocatable object program and stores the result in the file hello.o. This file is binary: its bytes encode machine-language instructions rather than characters. If we opened it in a text editor, it would look like gibberish to us.
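If you want to peek at that gibberish in a more readable way, one option is to print the first few bytes of the file as hexadecimal numbers. This is just a sketch. The name dump_bytes is my own, and it assumes the object file is sitting in the current directory.

```c
#include <stdio.h>

/* Hypothetical helper: prints up to `count` bytes of the file at `path`
   as two-digit hex values. Returns the number of bytes printed,
   or -1 if the file could not be opened. */
int dump_bytes(const char *path, int count)
{
    FILE *f = fopen(path, "rb");  /* "rb": read raw bytes, not text */
    if (f == NULL)
        return -1;

    int printed = 0;
    int c;
    while (printed < count && (c = fgetc(f)) != EOF) {
        printf("%02x ", (unsigned)c);
        printed++;
    }
    printf("\n");
    fclose(f);
    return printed;
}
```

On Linux, where object files use the ELF format, calling dump_bytes("hello.o", 4) would typically print 7f 45 4c 46. Other systems use different object file formats, so the bytes will differ there.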
4) Linker Phase
Notice that our hello program calls printf, which is part of the standard C library provided with every C compiler. However, printf actually lives in a separate precompiled object file called printf.o. For our code to work, that file needs to be merged with our hello.o program. This is what the linker does: it is a program that merges everything together into one file. The result is an executable object file (or simply an executable). This file is now ready to be loaded into memory and executed by the system.
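One way to convince yourself that printf really lives outside our file is to skip the #include and declare printf by hand. The sketch below does that. The function say_hello() is a made-up name, not part of the original hello.c.

```c
/* No #include here. We only promise the compiler that a function
   with this signature exists somewhere. */
int printf(const char *format, ...);

/* Hypothetical helper, not part of the original hello.c.
   The call below ends up in the object file as an unresolved
   reference to printf. The linker fills it in from the
   standard C library. */
int say_hello(void)
{
    /* printf returns the number of characters it printed */
    return printf("Hello, World\n");
}
```

The compiler is happy with just the declaration, because it only needs to know how to call printf. Finding the actual machine code for printf is the linker's job.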
Actually Running the file
When we compile hello.c, the system goes through the 4 compilation phases and leaves us with an executable file on disk. When we run that executable, the system copies its code and data from disk storage (where the file is stored) into main memory. Main memory is just temporary storage that holds a program and the data it manipulates while the processor (CPU) executes it. Once our program is loaded into main memory, the processor begins executing the machine-language instructions in our hello program's main function, and we get Hello, World printed to the console.
Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.