A compiler is a program that translates source code into object code to be understood by a specific central processing unit (CPU). The act of translating source code into object code is known as compilation.
What is compilation?
Compilation typically refers to translating source code from a high-level programming language (such as C++) into a low-level language (such as machine code) to create an executable program. Conversely, converting a low-level language back into a high-level language is called decompilation.
Phases of a compiler
A compiler works in phases, which keeps its design modular and helps ensure that source input is transformed correctly into target output. The phases are as follows:
1. Lexical Analyzer
The lexical analyzer is also called a scanner. It converts the sequence of characters in the source code into a series of tokens, such as identifiers, numbers, and operators. Tokens are defined by regular expressions that the lexical analyzer understands. The lexical analyzer also reports lexical errors and discards comments and whitespace.
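As a rough illustration of regular-expression-driven scanning, the sketch below tokenizes a tiny expression language. The token names, the `#` comment syntax, and the `tokenize` function are assumptions made for this example and do not come from any particular compiler.

```python
import re

# Hypothetical token definitions for a tiny expression language.
# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("COMMENT", r"#[^\n]*"),  # assumed comment syntax
    ("SKIP", r"\s+"),         # whitespace, including newlines
    ("MISMATCH", r"."),       # any other character is a lexical error
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Turn a source string into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SKIP"):
            continue  # comments and whitespace are discarded
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {text!r} at offset {match.start()}"
            )
        tokens.append((kind, text))
    return tokens
```

Calling `tokenize("x = 3 + 42 # note")` yields five tokens and drops both the whitespace and the trailing comment, while a stray character such as `$` raises a `SyntaxError`.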
2. Syntax Analyzer
The syntax analyzer, also called a parser, takes the tokens one by one and uses a context-free grammar to construct a parse tree. Building the tree checks whether the token sequence conforms to the grammar of the language. A syntax error is reported if the input is not in accordance with the grammar.
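One common way to apply a context-free grammar is recursive descent, where each grammar rule becomes a function. The sketch below assumes a small arithmetic grammar (`expr`, `term`, `factor`) chosen for illustration, takes a plain list of token strings, and returns a simplified tree of nested tuples rather than a full parse tree.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar (illustrative only):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit() or tok.isidentifier():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For the tokens `["1", "+", "2", "*", "3"]` the result is `("+", "1", ("*", "2", "3"))`, which reflects the higher precedence of multiplication. Input that does not fit the grammar, such as `["1", "+"]`, raises a `SyntaxError`.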
3. Semantic Analyzer
The semantic analyzer verifies that the parse tree constructed by the syntax analyzer is meaningful under the rules of the language. It performs type checking, label checking, and control-flow checking.
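Type checking can be sketched as a recursive walk over the tree that computes a type for every node. The example below is a simplified illustration: the tree shape (nested tuples), the type names, the `symbols` table, and the promotion rule for mixing `int` and `float` are all assumptions, not the rules of any specific language.

```python
def check_types(node, symbols):
    """Return the type name of an expression tree or raise an error.

    `symbols` is a hypothetical symbol table mapping identifiers to type names.
    """
    if isinstance(node, bool):  # checked first because bool is a subclass of int
        return "bool"
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):   # identifiers are represented as strings
        if node not in symbols:
            raise NameError(f"undeclared identifier {node!r}")
        return symbols[node]

    op, left, right = node
    left_type = check_types(left, symbols)
    right_type = check_types(right, symbols)
    if op in ("+", "-", "*", "/"):
        numeric = ("int", "float")
        if left_type not in numeric or right_type not in numeric:
            raise TypeError(
                f"operator {op!r} does not accept {left_type} and {right_type}"
            )
        # Assumed rule: mixing int and float promotes the result to float.
        return "float" if "float" in (left_type, right_type) else "int"
    raise TypeError(f"unknown operator {op!r}")
```

With `symbols = {"x": "int"}`, the tree `("+", 1, ("*", "x", 2.5))` checks as `float`, while an undeclared identifier raises `NameError` and a non-numeric operand raises `TypeError`.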
4. Intermediate Code Generator
The intermediate code generator produces intermediate code, a machine-independent representation that sits between the source language and the target machine. The last two phases, which are platform dependent, convert the intermediate code into machine language.
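Three-address code is one widely used style of intermediate code, in which each instruction has at most one operator and stores its result in a temporary. The sketch below lowers a nested-tuple expression tree into that form. The temporary naming scheme (`t1`, `t2`, ...) and the `to_three_address` function are assumptions for illustration.

```python
def to_three_address(tree):
    """Lower an expression tree to a list of three-address instructions.

    Returns (instructions, name_holding_the_result).
    """
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)          # leaves are used directly as operands
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"          # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result
```

For the tree `("+", "a", ("*", "b", "2"))` this produces the two instructions `t1 = b * 2` and `t2 = a + t1`, with the final value in `t2`. Nothing in the output refers to a particular CPU, which is the point of an intermediate form.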
5. Code Optimizer
The code optimizer transforms the code so that it consumes fewer resources and runs faster, without altering the meaning of the code being transformed.
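Constant folding is one simple, meaning-preserving optimization: subexpressions made only of constants are evaluated at compile time. The sketch below applies it to a nested-tuple expression tree. The tree shape and the decision to fold only integer `+`, `-`, and `*` (leaving division alone to avoid divide-by-zero at compile time) are assumptions of this example.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node                      # constants and identifiers are unchanged
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in FOLDABLE:
        return FOLDABLE[op](left, right)  # both sides known: compute now
    return (op, left, right)
```

The tree `("+", "x", ("*", 2, 3))` becomes `("+", "x", 6)`: the program computes the same value for every `x`, but does one less multiplication at run time.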
6. Target Code Generator
This is the final stage of compilation. The target code generator emits code that a machine can understand, and it also performs register allocation and instruction selection. The output depends on the type of assembler. The optimized code is converted into machine code, which forms the input to the linker and loader.
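Instruction selection and register allocation can be sketched for an invented register machine. Everything about the target here is hypothetical: the mnemonics (`LOAD`, `LOADI`, `ADD`, ...), the unlimited supply of registers `r0`, `r1`, ..., and the naive stack-like allocation strategy. A real code generator targets an actual instruction set and a fixed number of registers.

```python
# Hypothetical mapping from source operators to target mnemonics.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(tree):
    """Emit assembly-like text for an expression tree.

    Returns (instructions, register_holding_the_result).
    """
    instructions = []
    next_reg = 0

    def gen(node):
        nonlocal next_reg
        if not isinstance(node, tuple):
            reg = f"r{next_reg}"      # allocate the next free register
            next_reg += 1
            # Instruction selection: immediates and variables load differently.
            opcode = "LOADI" if isinstance(node, int) else "LOAD"
            instructions.append(f"{opcode} {reg}, {node}")
            return reg
        op, left, right = node
        left_reg = gen(left)
        right_reg = gen(right)
        instructions.append(f"{MNEMONICS[op]} {left_reg}, {left_reg}, {right_reg}")
        next_reg -= 1                 # the right operand's register is free again
        return left_reg

    result = gen(tree)
    return instructions, result
```

For `("+", "a", ("*", "b", 2))` the sketch emits `LOAD r0, a`, `LOAD r1, b`, `LOADI r2, 2`, `MUL r1, r1, r2`, and `ADD r0, r0, r1`, leaving the result in `r0`.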
Types of compilers
There are many types of compilers:
- Cross compiler: Creates code for a platform other than the one on which the compiler is running. The compiled program runs on a computer that has a different operating system or CPU from the compiler’s host machine.
- Source-to-source compiler: Also known as a transcompiler, it translates source code written in one programming language into source code of another programming language.
- Bytecode compilers: Used for languages such as Python and Java, these compile source code to the assembly language of a theoretical (virtual) machine. Some implementations of Prolog, a language used in artificial intelligence (AI) and computational linguistics, work the same way. Such languages are used for various applications such as theorem proving, automated planning, and type systems.
- Hardware compilers: Also called synthesis tools, these take a hardware description language as input and produce a hardware configuration as output. Typically, hardware compilers target low-level computer hardware such as a structured application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
- Binary compilers: These compilers accept binary files as input, apply optimization and transformations, and provide an output in the form of executable binaries. Binary files contain binary data that can be interpreted by a CPU. The binary compiler creates executable files targeting a run-time (execution) environment.
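To make the bytecode compiler entry in the list above concrete, the sketch below uses CPython's built-in `compile()` function and the standard `dis` module to compile a one-line program to bytecode, list its instructions, and run it. The source string and variable names are made up for the example, and the exact instructions printed vary between Python versions.

```python
import dis

# A one-line program, made up for this example.
source = "result = (2 + 3) * x"

# compile() translates source text into a code object for the Python virtual machine.
code_object = compile(source, "<example>", "exec")

# The raw bytecode is stored as a bytes object on the code object.
print(type(code_object.co_code))

# dis.dis() prints a human-readable listing of the bytecode instructions.
dis.dis(code_object)

# The virtual machine executes the code object; x is supplied at run time.
namespace = {"x": 4}
exec(code_object, namespace)
print(namespace["result"])
```

The bytecode is not tied to a physical CPU. It runs wherever a compatible Python virtual machine is available.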
Compiler vs. Interpreter
An interpreter is a type of computer program that translates and executes high-level program statements. Both interpreters and compilers translate high-level programs into a form the machine can execute; however, interpreters do so while the program is running, whereas compilers do so before the program is run.
Further, languages such as C and C++ are typically compiled, while interpreters are often favored in web environments that require fast load times.
Another key difference between the two is that compiled code generally runs significantly faster than interpreted code. This is because the compiler translates the whole program into machine code at once, before the program runs. An interpreter, on the other hand, converts each program statement, one by one, into machine code as the program runs. In other words, a compiler processes an entire program at once, whereas an interpreter processes it one statement at a time.
The advantage of interpreters is that they are easy to use; the disadvantage is that code execution can take longer. Also, interpreted programs can run only on machines that have a corresponding interpreter.
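The difference between translating once and translating on every run can be sketched on a toy scale. In the example below, `interpret` re-examines the expression tree each time it is evaluated, while `compile_tree` walks the tree once and returns a reusable Python function. This is only an analogy under stated assumptions: the "compiled" output is a chain of Python closures, not machine code, and the tree shape and function names are invented for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(node, env):
    """Evaluate the tree directly, inspecting every node on every call."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, env), interpret(right, env))
    if isinstance(node, str):
        return env[node]              # identifier lookup
    return node                       # numeric constant


def compile_tree(node):
    """Translate the tree once into a function that can be called repeatedly."""
    if isinstance(node, tuple):
        op, left, right = node
        fn = OPS[op]
        left_fn, right_fn = compile_tree(left), compile_tree(right)
        return lambda env: fn(left_fn(env), right_fn(env))
    if isinstance(node, str):
        return lambda env: env[node]
    return lambda env: node


tree = ("+", "x", ("*", 2, 3))
run = compile_tree(tree)              # translation happens here, once

print(interpret(tree, {"x": 1}))      # analysis and execution interleaved
print(run({"x": 1}))                  # execution only
print(run({"x": 10}))                 # reuses the earlier translation
```

Both paths compute the same values. The difference is when the work of analyzing the program's structure takes place.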
Common compiler software
Open64: Open64 is a free, open-source optimizing compiler for the Itanium and x86-64 microprocessor architectures. It was first released in the early 2000s under the GNU General Public License (GPL).
Free Pascal Compiler: This compiler is used for Pascal and Object Pascal, two closely related programming languages. It is free software released under the GNU General Public License (GPL).
GNAT: GNAT is a free compiler for the Ada programming language and is part of the GNU Compiler Collection (GCC).
This definition was reviewed and updated by Ali Azhar in January 2022.