This content originally appeared on DEV Community and was authored by Mujahida Joynab
A compiler works in several stages to transform high-level source code into efficient machine code. Each stage has its own role:
1. Lexical Analyzer (Scanner)
- Input: High-level source code, read as a stream of characters (already preprocessed, in languages that have a preprocessor)
- Process: Breaks the input into meaningful sequences called tokens (e.g., keywords, identifiers, operators).
- Output: String (sequence) of tokens
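To make the scanner's job concrete, here is a minimal sketch of a regex-based tokenizer. The token categories, the keyword list, and the `tokenize` function are assumptions made up for illustration; a real language defines its own.

```python
import re

# Hypothetical token categories for a tiny illustrative language.
# Order matters: KEYWORD is tried before IDENT so that "int" is not an identifier.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:int|return|if|else)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("PUNCT", r"[();]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("int x = a + 42;"))
```

Each pair records the token's category and the exact text it came from, which is the information the parser needs next.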
2. Syntax Analyzer (Parser)
- Input: String of tokens from the lexical analyzer
- Process: Checks whether tokens follow the correct grammar of the language.
- Output: Parse tree (many compilers build a more compact abstract syntax tree, or AST, instead)
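As an illustration of grammar checking, the sketch below is a small recursive-descent parser for arithmetic expressions. The grammar, the `(kind, text)` token format, and the nested-tuple tree shape are all assumptions chosen for brevity. It also builds a compact tree of operators and operands rather than a full parse tree with one node per grammar rule.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple tree.

    Assumed grammar:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, text = advance()
        if kind in ("NUMBER", "IDENT"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

tokens = [("IDENT", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("IDENT", "b")]
print(parse(tokens))
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition, so `a + 2 * b` groups as `a + (2 * b)`. A token sequence that does not fit the grammar, such as `a +` with nothing after it, raises `SyntaxError`.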
3. Semantic Analyzer
- Input: Parse tree
- Process: Ensures the program has semantic correctness (e.g., type checking, variable declarations, scope rules).
- Output: Semantically verified (annotated) parse tree, with type and scope information attached
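A rough sketch of two of these checks, declaration lookup and type checking, is shown below. The tree shape (nested tuples), the `symbols` dictionary acting as a symbol table, and the rule that both operands must have the same type are simplifying assumptions, not how any particular compiler does it.

```python
def check(node, symbols):
    """Return the type of an expression tree, or raise on a semantic error."""
    kind = node[0]
    if kind == "NUMBER":
        return "int"
    if kind == "STRING":
        return "str"
    if kind == "IDENT":
        name = node[1]
        if name not in symbols:
            raise NameError(f"undeclared variable {name!r}")
        return symbols[name]
    # Otherwise the node is (operator, left, right).
    op, left, right = node
    left_type = check(left, symbols)
    right_type = check(right, symbols)
    if left_type != right_type:
        raise TypeError(f"operator {op!r} applied to {left_type} and {right_type}")
    return left_type

symbols = {"a": "int"}  # hypothetical symbol table: variable name -> declared type
print(check(("+", ("IDENT", "a"), ("NUMBER", "2")), symbols))
```

Note that an expression such as `a + "hello"` is grammatically valid, so the parser accepts it; it is this stage that rejects it.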
4. Intermediate Code Generator
- Input: Validated parse tree with semantic meaning
- Process: Converts into an intermediate code that is easier to optimize and translate into machine code.
- Output: Intermediate code, often Three-Address Code (TAC)
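Three-address code is one common intermediate form: each instruction has at most one operator and stores its result in a named temporary. The sketch below flattens an expression tree into such instructions. The nested-tuple tree format and the `t1`, `t2`, ... naming scheme are assumptions for illustration.

```python
def to_tac(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if node[0] in ("NUMBER", "IDENT"):
            return node[1]  # leaves are used directly as operands
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"  # fresh temporary for this sub-expression
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

tree = ("+", ("IDENT", "a"), ("*", ("NUMBER", "2"), ("IDENT", "b")))
code, result = to_tac(tree)
print("\n".join(code))
```

For `a + 2 * b` this yields `t1 = 2 * b` followed by `t2 = a + t1`, with `t2` holding the final value.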
5. Code Optimizer
- Input: Intermediate code
- Process: Improves efficiency without changing meaning (e.g., removing redundant code, improving memory use).
- Output: Optimized intermediate code
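Constant folding is one simple example of such an improvement: operations whose operands are already known are computed at compile time instead of at run time. The sketch below assumes instructions are `(dest, left, op, right)` tuples, that every temporary is assigned only once, and that a folded temporary is only ever used by later instructions in the same list. Real optimizers work under far more careful rules.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold operations whose operands are known integer constants."""
    known = {}  # temporaries whose value is known at compile time
    optimized = []
    for dest, left, op, right in code:
        # Substitute any operand that an earlier fold already computed.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPS[op](left, right)  # computed now, no instruction emitted
        else:
            known.pop(dest, None)
            optimized.append((dest, left, op, right))
    return optimized

code = [("t1", 2, "*", 3), ("t2", "a", "+", "t1")]
print(fold_constants(code))
```

Here `t1 = 2 * 3` disappears entirely and the remaining instruction becomes `t2 = a + 6`, which computes the same value with one fewer operation.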
6. Target Code Generator
- Input: Optimized intermediate code
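- Process: Translates the intermediate code into code for the target machine, choosing instructions and assigning registers and memory locations.
- Output: Target machine code (assembly or object code, which an assembler and linker then turn into an executable)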
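The final translation can be sketched as a mapping from each intermediate instruction to a few machine instructions. The `LOAD`, `ADD`, `STORE` mnemonics and the single register `R0` below belong to a made-up machine, not a real instruction set, and the `(dest, left, op, right)` instruction format is likewise an assumption.

```python
# Mnemonics for a hypothetical register machine (not a real instruction set).
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(code):
    """Translate (dest, left, op, right) instructions into assembly-like text."""
    lines = []
    for dest, left, op, right in code:
        lines.append(f"LOAD R0, {left}")               # bring the first operand into R0
        lines.append(f"{MNEMONICS[op]} R0, {right}")   # apply the operator in place
        lines.append(f"STORE {dest}, R0")              # write the result back
    return lines

print("\n".join(generate([("t2", "a", "+", 6)])))
```

A real code generator would also decide which values stay in registers between instructions instead of storing and reloading every result, which this sketch does not attempt.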
In short:
High-Level Code → Tokens → Parse Tree → Annotated Parse Tree → TAC → Optimized TAC → Machine Code