Working of a Compiler

Compilation: source code ==> relocatable object code (binaries)

After compilation, three further steps turn the object code into a running program:

  1. Linking: many relocatable binaries (modules plus libraries) ==> one relocatable binary (with all external references satisfied)
  2. Loading: relocatable ==> absolute binary (with all code and data references bound to the addresses occupied in memory)
  3. Execution: control is transferred to the first instruction of the program
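The loading step can be sketched in a few lines of C. This is a minimal illustration under assumed conditions: a made-up relocatable format in which the program is an array of words whose addresses are stored as offsets from 0, plus a relocation list naming the words that hold addresses. The type `Image` and the function `relocate` are hypothetical; real formats such as ELF or PE carry far more information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocatable image: every address stored in `words`
   is an offset from the start of the module (as if loaded at 0). */
typedef struct {
    uint32_t     *words;    /* code and data words                  */
    size_t        nwords;
    const size_t *relocs;   /* indices of words that hold addresses */
    size_t        nrelocs;
} Image;

/* Bind the image to absolute addresses by adding the load address
   to every word listed in the relocation table. */
void relocate(Image *img, uint32_t load_base)
{
    for (size_t i = 0; i < img->nrelocs; i++) {
        size_t idx = img->relocs[i];
        if (idx < img->nwords)          /* skip malformed entries */
            img->words[idx] += load_base;
    }
}
```

With this format, a word holding offset 0x10 becomes 0x4010 when the image is loaded at 0x4000, while words not listed in the relocation table (plain constants, for example) are left untouched.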

Figure: Stages from Source to Executable

Figure: Working of a compiler - flowchart

At compile time (CT), absolute addresses of variables and statement labels are not known.
This article explains the compiling process in detail.  A compiler for a
language generally passes its input through several distinct stages.

These are:


1.  Preprocessing
During the preprocessing stage, comments, macros, and directives are
processed. Comments are removed from the source file.  This greatly simplifies the later stages.
 
If the language supports macros, the macros are replaced with the equivalent
text.
For example, C and C++ support macros using the #define directive.  So if a
macro were defined for pi as:
#define PI 3.1415927
then whenever the preprocessor encountered the identifier PI, it would replace
it with 3.1415927 and process the resulting text.
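A small C sketch of the PI macro follows. `circle_area`, `pi_text`, and the helper macros `STR` and `XSTR` are illustrative names, not part of any library. The two-level stringizing trick makes the substituted text visible from inside the program.

```c
#define PI 3.1415927

/* Two-level stringizing: XSTR(PI) first expands PI, then turns the
   result into a string literal, exposing the substituted text. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* After preprocessing, the body below reads
       return 3.1415927 * r * r;
   and this comment is gone as well. */
double circle_area(double r)
{
    return PI * r * r;
}

const char *pi_text(void)
{
    return XSTR(PI);     /* the string "3.1415927" */
}
```

With GCC or Clang, `gcc -E file.c` (or `clang -E file.c`) prints the preprocessed source, so the replacement of PI and the removal of comments can also be inspected directly.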
  
The backslash (\) also gets special treatment in C and C++.  A backslash at
the very end of a line joins that line to the next one before preprocessing
directives are processed, which is how long macro definitions can span
several lines.  Escape sequences inside character and string literals, such
as \t for a tab, are converted to the characters they stand for in a later
translation phase, after macro expansion but still before syntactical
analysis.
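The effect of backslash handling is visible from a compiled program. In this sketch the macro `GREETING` and both functions are hypothetical names; the code shows that a backslash at the end of a line continues a macro definition, and that the two source characters `\t` end up as a single tab character in the string.

```c
#include <stddef.h>

/* The backslash at the end of the #define line splices the next line
   onto it, so the macro body is the single literal "Hello,\tworld". */
#define GREETING \
    "Hello,\tworld"

/* In the source, \t is two characters; in the compiled string it is
   one tab character, so the string holds 12 characters, not 13. */
size_t greeting_length(void)
{
    return sizeof(GREETING) - 1;    /* subtract the terminating '\0' */
}

int greeting_has_tab(void)
{
    return (GREETING)[6] == '\t';   /* "Hello," occupies indices 0-5 */
}
```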
 
 
2.  Lexical analysis is the process of breaking down the source files into
keywords, constants, identifiers, operators, and other simple tokens.  A
token is the smallest unit of text that has meaning in the language.
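A lexer can be sketched as a loop that skips whitespace and then groups characters by their first character: letters start identifiers or keywords, digits start numeric constants, and anything else is treated here as a one-character operator. The `Token` type, the `next_token` function, and the five-word keyword list are assumptions for illustration; a real C lexer also handles multi-character operators such as `>=`, string literals, and much more.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_END } TokKind;

typedef struct {
    TokKind kind;
    char    text[32];   /* lexeme, truncated if longer (sketch only) */
} Token;

/* A small, hypothetical keyword subset; C has many more. */
static const char *const keywords[] = { "if", "else", "int", "float", "return" };

static int is_keyword(const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Read one token starting at *p and advance *p past it. */
Token next_token(const char **p)
{
    Token t = { TOK_END, "" };
    const char *s = *p;
    size_t n = 0;

    while (isspace((unsigned char)*s))       /* whitespace separates tokens */
        s++;

    if (*s == '\0') {
        *p = s;
        return t;
    }

    if (isalpha((unsigned char)*s) || *s == '_') {      /* identifier or keyword */
        while (isalnum((unsigned char)*s) || *s == '_') {
            if (n < sizeof t.text - 1) t.text[n++] = *s;
            s++;
        }
        t.text[n] = '\0';
        t.kind = is_keyword(t.text) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {             /* numeric constant */
        while (isdigit((unsigned char)*s) || *s == '.') {
            if (n < sizeof t.text - 1) t.text[n++] = *s;
            s++;
        }
        t.text[n] = '\0';
        t.kind = TOK_NUMBER;
    } else {                                  /* one-character operator */
        t.text[0] = *s++;
        t.text[1] = '\0';
        t.kind = TOK_OP;
    }
    *p = s;
    return t;
}
```

Fed `if (count > 10) total = total + 1.5;`, this sketch yields 12 tokens: the keyword `if`, the identifiers `count` and `total`, the constants `10` and `1.5`, and the single-character operators and punctuators in between.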
  
 
3. Syntactical analysis is the process of combining the tokens into
well-formed expressions, statements, and programs.  Each language has
specific rules about the structure of a program, called the grammar or
syntax.  Just like English grammar, it specifies how things may be put
together.  In English, a simple sentence can be built from a subject, a
verb, and a complement.
 
In C or C++ an if statement is:
if ( expression ) statement
 
The syntactical analysis checks that the syntax is correct, but doesn't
check whether the program makes sense.  In English, the subject could be
"Pants", the verb "are", and the complement "a kind of car".  This yields
"Pants are a kind of car.", which is a grammatical sentence but doesn't make
much sense.
 
In C or C++, a constant can be used in an expression, so the declaration:
float x = "This is red"++;
 
is syntactically valid, but doesn't make sense: a string literal cannot be
incremented, and a float cannot be initialized with a string.
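Syntax checking is often done with a recursive-descent parser, one function per grammar rule. The sketch below assumes a tiny arithmetic grammar (given in the first comment) rather than C's real grammar; the names `Parser` and `is_well_formed` are hypothetical.

```c
#include <ctype.h>

/* Assumed grammar for this sketch:
     expr   := term   { ('+' | '-') term }
     term   := factor { ('*' | '/') factor }
     factor := NUMBER | IDENTIFIER | '(' expr ')'            */

typedef struct { const char *p; } Parser;   /* cursor into the input */

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static int parse_expr(Parser *ps);   /* forward declaration: the grammar is recursive */

static int parse_factor(Parser *ps)
{
    skip_spaces(ps);
    if (isdigit((unsigned char)*ps->p)) {
        while (isdigit((unsigned char)*ps->p)) ps->p++;
        return 1;
    }
    if (isalpha((unsigned char)*ps->p) || *ps->p == '_') {
        while (isalnum((unsigned char)*ps->p) || *ps->p == '_') ps->p++;
        return 1;
    }
    if (*ps->p == '(') {
        ps->p++;
        if (!parse_expr(ps)) return 0;
        skip_spaces(ps);
        if (*ps->p != ')') return 0;    /* unbalanced parenthesis */
        ps->p++;
        return 1;
    }
    return 0;                            /* no valid factor here */
}

static int parse_term(Parser *ps)
{
    if (!parse_factor(ps)) return 0;
    for (;;) {
        skip_spaces(ps);
        if (*ps->p != '*' && *ps->p != '/') return 1;
        ps->p++;
        if (!parse_factor(ps)) return 0;
    }
}

static int parse_expr(Parser *ps)
{
    if (!parse_term(ps)) return 0;
    for (;;) {
        skip_spaces(ps);
        if (*ps->p != '+' && *ps->p != '-') return 1;
        ps->p++;
        if (!parse_term(ps)) return 0;
    }
}

/* Well-formed only if the whole input is exactly one expression. */
int is_well_formed(const char *src)
{
    Parser ps = { src };
    if (!parse_expr(&ps)) return 0;
    skip_spaces(&ps);
    return *ps.p == '\0';
}
```

`is_well_formed("pants / car")` returns 1 even though the expression is meaningless, while `is_well_formed("a + * b")` returns 0. The checker looks only at structure, not at what the names mean.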
 
 
4. Semantic analysis is the process of examining the types and values of the
statements used to make sure they make sense.  During the semantic
analysis, the types, values, and other required information about statements
are recorded, checked, and transformed as appropriate to make sure the
program makes sense.
 
For C/C++ in the line:
float x = "This is red"++;
 
semantic analysis would reveal that the types do not match and cannot be
made to match, so the statement would be rejected and an error reported.
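The checks described above can be modelled with a toy type checker. Everything here is a hypothetical simplification: only four types exist, pointers are left out entirely, and the `Type` enum and the `check_*` functions are illustrative names, not part of any real compiler.

```c
/* Hypothetical, much-simplified type model (no pointers, no structs). */
typedef enum { T_INT, T_FLOAT, T_DOUBLE, T_STRING, T_ERROR } Type;

static int is_arithmetic(Type t)
{
    return t == T_INT || t == T_FLOAT || t == T_DOUBLE;
}

/* Result type of `a + b`: both operands must be arithmetic, and the
   wider type wins (int < float < double), mirroring C's usual
   arithmetic conversions for these three types. */
Type check_add(Type a, Type b)
{
    if (!is_arithmetic(a) || !is_arithmetic(b))
        return T_ERROR;
    return a > b ? a : b;   /* relies on the enum order above */
}

/* In this simplified model only arithmetic operands may be incremented;
   a string literal is rejected. */
Type check_increment(Type operand)
{
    return is_arithmetic(operand) ? operand : T_ERROR;
}

/* Initializing a variable of type `target` with a value of type `value`:
   arithmetic values may be converted, anything else is rejected. */
int check_init(Type target, Type value)
{
    return is_arithmetic(target) && is_arithmetic(value);
}
```

For `float x = "This is red"++;`, `check_increment(T_STRING)` already yields `T_ERROR`, so the declaration is rejected. For the `float y = 5 + 3.0;` case discussed next, `check_add(T_INT, T_DOUBLE)` gives `T_DOUBLE` and `check_init(T_FLOAT, T_DOUBLE)` accepts the conversion.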
 
In the statement:
 
float y = 5 + 3.0;
 
by contrast, semantic analysis would find that 5 is an int and 3.0 is a
double, and that the language rules allow the int to be converted to a
double.  The addition is therefore valid: 5 is converted to 5.0 and the sum
has type double.  The compiler would then note that y is a float, so it
converts the double result, 8.0, to a float before initializing y.
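C11 added `_Generic`, which selects an expression based on the static type of its controlling operand, so a C program can report the type the compiler assigned to `5 + 3.0`. The macro `TYPE_NAME` and the two functions are hypothetical helpers; this needs a C11 compiler, and C++ has no `_Generic`.

```c
/* _Generic picks a string from the static type of its operand, which
   exposes the type chosen during semantic analysis. The operand itself
   is not evaluated. */
#define TYPE_NAME(e) _Generic((e), \
    int:     "int",                \
    float:   "float",              \
    double:  "double",             \
    default: "other")

const char *type_of_sum(void)
{
    return TYPE_NAME(5 + 3.0);   /* int 5 is converted, so "double" */
}

float init_y(void)
{
    float y = 5 + 3.0;           /* double 8.0 converted to float */
    return y;
}
```

Here `type_of_sum()` returns "double", and `init_y()` returns 8.0f, the double sum converted to float.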
 
 
5. Intermediate code generation
Depending on the compiler, this step may be skipped, and the program may
instead be translated directly into the target language (usually machine
object code).  If this step is implemented, the compiler designers also
design a machine-independent language of their own that is close to machine
language and easily translated into the machine language of many different
computers.
 
The purpose of this step is to allow the compiler writers to support
different target computers and different languages with a minimum of effort:
with a shared intermediate language, supporting N languages on M machines
takes N front ends plus M back ends rather than N × M separate compilers.
The part of the compiler that processes the source files, analyzes the
language, and generates the intermediate code is called the front end, while
the part that optimizes the intermediate code and converts it into the
target language is called the back end.
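Three-address code is one widely used form of intermediate code, in which each instruction applies at most one operator and stores the result in a temporary. The sketch assumes that form (the text above does not name a specific one), and the `Node` and `Emitter` types and the `generate` function are hypothetical. It shows the front end flattening an expression tree into machine-independent instructions; a back end would later map each line onto real instructions for a particular computer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: a leaf names a variable,
   an inner node applies a binary operator to two subtrees. */
typedef struct Node {
    char op;                        /* '+', '-', '*', '/' or 0 for a leaf */
    const char *name;               /* variable name when op == 0         */
    const struct Node *lhs, *rhs;
} Node;

typedef struct {
    char text[256];                 /* accumulated intermediate code      */
    int  next_temp;                 /* counter for temporaries t1, t2...  */
} Emitter;

/* Emit code for `n` and write the name holding its value into `result`. */
static void gen(const Node *n, Emitter *em, char *result, size_t cap)
{
    if (n->op == 0) {               /* a leaf needs no instruction */
        snprintf(result, cap, "%s", n->name);
        return;
    }
    char l[16], r[16];
    gen(n->lhs, em, l, sizeof l);   /* operands first ...           */
    gen(n->rhs, em, r, sizeof r);
    snprintf(result, cap, "t%d", ++em->next_temp);
    size_t used = strlen(em->text); /* ... then one instruction here */
    snprintf(em->text + used, sizeof em->text - used,
             "%s = %s %c %s\n", result, l, n->op, r);
}

void generate(const Node *root, Emitter *em)
{
    char result[16];
    em->text[0] = '\0';
    em->next_temp = 0;
    gen(root, em, result, sizeof result);
}
```

For `a + b * c`, `generate` produces the two lines `t1 = b * c` and `t2 = a + t1`.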
 
 
6. Code optimization
During this stage the generated code is analyzed and improved for
efficiency.  The compiler looks for improvements to the intermediate code
that couldn't be made earlier.  For example, Pascal does not allow pointer
arithmetic, while machine languages compute and step through addresses
freely.  When accessing array elements in a loop, stepping a pointer
through the array is usually cheaper than recomputing each element's
address from its index, so the code optimizer may detect this case and use
pointers internally.
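The transformation can be illustrated at the source level in C, although an optimizer performs it on intermediate code, not on source text. Both functions are hypothetical examples: the first indexes the array, which implies computing an address from the index on every iteration; the second is the pointer-stepping form an optimizer may derive, a technique usually called strength reduction.

```c
#include <stddef.h>

/* Indexed form: each a[i] implies computing the address a + i * sizeof(int). */
long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer form an optimizer may produce internally: the multiply is
   replaced by advancing a pointer one element per iteration. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Whether the pointer form is actually faster depends on the target machine; modern optimizing compilers typically perform this rewrite themselves, so writing the pointer version by hand is rarely necessary.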
 
 
7. Code generation
Finally, after the intermediate code has been generated and optimized, the
compiler generates code for the specific target language.  Almost always
this is machine code for a particular target machine.
 
Usually, however, it is not the final machine code but object code, which
contains all the instructions but not all of the final memory addresses.
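Code generation can be sketched with a hypothetical stack machine standing in for real hardware. The `Instr` format, the three opcodes, and the `codegen` and `run` functions are assumptions for illustration; the tree passed to `codegen` stands in for the optimized intermediate code. A real code generator would emit instructions for an actual processor into an object file, leaving some addresses for the linker.

```c
#include <stddef.h>

/* Hypothetical stack-machine instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL } Opcode;
typedef struct { Opcode op; int arg; } Instr;

/* Expression tree: a leaf holds a constant; inner nodes hold '+' or '*'. */
typedef struct Expr {
    char kind;                      /* 'n' for a number, '+' or '*' */
    int  value;                     /* used when kind == 'n'        */
    const struct Expr *lhs, *rhs;
} Expr;

/* Post-order walk: operands first, then the operator, which is the order
   a stack machine needs. `out` must have room for one instruction per
   tree node. Returns the new instruction count. */
size_t codegen(const Expr *e, Instr *out, size_t n)
{
    if (e->kind == 'n') {
        out[n] = (Instr){ OP_PUSH, e->value };
        return n + 1;
    }
    n = codegen(e->lhs, out, n);
    n = codegen(e->rhs, out, n);
    out[n] = (Instr){ e->kind == '+' ? OP_ADD : OP_MUL, 0 };
    return n + 1;
}

/* Tiny interpreter used only to check the generated code; it assumes
   well-formed, non-empty code whose stack depth stays below 64. */
int run(const Instr *code, size_t len)
{
    int stack[64];
    size_t sp = 0;
    for (size_t i = 0; i < len; i++) {
        switch (code[i].op) {
        case OP_PUSH: stack[sp++] = code[i].arg; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        }
    }
    return stack[sp - 1];
}
```

For 2 + 3 * 4, `codegen` emits PUSH 2, PUSH 3, PUSH 4, MUL, ADD, and `run` evaluates that sequence to 14.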
 
A subsequent program, called a linker, is used to combine several different
object code files into the final executable program, satisfying the external
references between them.
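The linker's core job, giving each module an address and filling in references to symbols defined elsewhere, can be sketched as two passes. The `Module`, `SymbolDef`, and `Fixup` types and the `link_modules` function are a hypothetical, greatly simplified format; real object formats such as ELF or COFF record much more. Addresses here are word offsets from 0, so the result is still relocatable and a loader would add the final load address.

```c
#include <string.h>

/* Hypothetical object-module format. */
typedef struct { const char *name; unsigned offset; } SymbolDef;
typedef struct { unsigned word; const char *name; } Fixup;  /* word to patch, symbol needed */

typedef struct {
    unsigned        *words;    /* the module's code/data words        */
    unsigned         nwords;
    const SymbolDef *defs;     /* symbols this module defines         */
    unsigned         ndefs;
    const Fixup     *fixups;   /* references this module needs filled */
    unsigned         nfixups;
    unsigned         base;     /* set by link_modules(): module start */
} Module;

/* Look a symbol up across all modules; returns 1 and sets *addr if found. */
static int resolve(const Module *m, unsigned nmods, const char *name, unsigned *addr)
{
    for (unsigned i = 0; i < nmods; i++)
        for (unsigned j = 0; j < m[i].ndefs; j++)
            if (strcmp(m[i].defs[j].name, name) == 0) {
                *addr = m[i].base + m[i].defs[j].offset;
                return 1;
            }
    return 0;
}

/* Pass 1 lays modules out one after another; pass 2 patches every
   reference (fixup word indices are assumed to be in range).
   Returns the number of unresolved references; 0 means success. */
unsigned link_modules(Module *m, unsigned nmods)
{
    unsigned next = 0, unresolved = 0;
    for (unsigned i = 0; i < nmods; i++) {
        m[i].base = next;
        next += m[i].nwords;
    }
    for (unsigned i = 0; i < nmods; i++)
        for (unsigned k = 0; k < m[i].nfixups; k++) {
            unsigned addr;
            if (resolve(m, nmods, m[i].fixups[k].name, &addr))
                m[i].words[m[i].fixups[k].word] = addr;
            else
                unresolved++;
        }
    return unresolved;
}
```

If module A (3 words) refers to `helper`, defined at offset 1 of module B, then B is placed at word 3 and A's reference is patched to 4. A reference to a symbol that no module defines is counted as unresolved, which corresponds to the familiar "undefined reference" link error.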