Compiler Design - Syntax Analysis (2023)

Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the basic concepts used in the construction of a parser.

We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. However, a lexical analyzer cannot check the syntax of a given sentence, because regular expressions are limited: they cannot check balanced tokens, such as nested parentheses. Therefore, this phase uses a context-free grammar (CFG), which is recognized by push-down automata.

CFG, on the other hand, is a superset of Regular Grammar.

It implies that every Regular Grammar is also context-free, but there exist some problems that are beyond the scope of Regular Grammar. CFG is a helpful tool in describing the syntax of programming languages.
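To make the balancing limitation concrete, here is a minimal sketch of a check that needs unbounded memory, which is what a push-down automaton adds over a finite automaton. The function name `is_balanced` is an illustrative choice, and a single counter stands in for the stack because only one kind of bracket is tracked.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0  # plays the role of the push-down automaton's stack height
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with no matching '('
                return False
    return depth == 0  # every '(' must have been closed


print(is_balanced("((a+b)*c)"))  # True
print(is_balanced("(a+b))("))    # False
```

The counter can grow without bound as the nesting gets deeper, which is exactly what a fixed, finite set of states cannot track.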

Context-Free Grammar

In this section, we will first see the definition of context-free grammar and introduce terminologies used in parsing technology.

A context-free grammar has four components:

  • A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar.

  • A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed.

  • A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.

  • One of the non-terminals is designated as the start symbol (S), from which the derivation begins.

The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start symbol) by the right side of a production, for that non-terminal.


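Example

We take the problem of the palindrome language, which cannot be described by means of a regular expression. That is, L = { w | w = w^R }, where w^R is the reverse of w, is not a regular language. But it can be described by means of a CFG, as illustrated below:

G = ( V, Σ, P, S )

Where:

V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ℇ | Z → 0Q0 | N → 1Q1 }
S = Q

This grammar describes the palindrome language, with strings such as 1001, 11100111, 00100, 1010101, and 11111. The productions Q → 0 and Q → 1 supply the middle symbol of odd-length palindromes such as 00100; without them, only even-length palindromes could be derived.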
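As a rough illustration, the palindrome grammar can be turned directly into a recursive recognizer. The sketch below assumes the productions Q → 0Q0 | 1Q1 | 0 | 1 | ℇ (the non-terminals Z and N folded into Q, and the single-symbol productions included so that odd-length palindromes are covered); the function name `derives_q` is hypothetical.

```python
def derives_q(s: str) -> bool:
    """Return True if Q can derive the string s over the alphabet {0, 1}."""
    if any(ch not in "01" for ch in s):
        return False  # symbol outside the terminal set
    if len(s) <= 1:
        # Q -> ℇ, Q -> 0, or Q -> 1
        return True
    # Q -> 0Q0 or Q -> 1Q1: the outer symbols must match
    # and the middle part must again be derivable from Q.
    return s[0] == s[-1] and derives_q(s[1:-1])


for word in ["1001", "11100111", "00100", "1010101", "11111", "10"]:
    print(word, derives_q(word))
```

Each recursive call mirrors one application of a production, so the depth of the recursion equals the number of nested Q → 0Q0 or Q → 1Q1 steps in the derivation.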

Syntax Analyzers

A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.

Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use error recovery strategies, which we will learn later in this tutorial.


Derivation

A derivation is a sequence of production rule applications that produces the input string from the start symbol. During parsing, we take two decisions for some sentential form of input:

  • Deciding the non-terminal which is to be replaced.
  • Deciding the production rule by which the non-terminal will be replaced.

To decide which non-terminal is to be replaced with a production rule, we have two options.

Left-most Derivation

If the sentential form of an input is scanned and replaced from left to right, so that the left-most non-terminal is always replaced first, it is called left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation

If we scan and replace the input with production rules from right to left, so that the right-most non-terminal is always replaced first, it is known as right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.


Example

Production rules:

E → E + E
E → E * E
E → id

Input string: id + id * id

The left-most derivation is:

E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

Notice that the left-most side non-terminal is always processed first.

The right-most derivation is:

E → E + E
E → E + E * E
E → E + E * id
E → E + id * id
E → id + id * id
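The two derivations can be replayed mechanically. The following is a small sketch, not a parser: the helper `derive` (a hypothetical name) is handed the right-hand sides to apply, in order, and always rewrites either the left-most or the right-most non-terminal. It does not check that each right-hand side is a real production of the grammar.

```python
NONTERMINALS = {"E"}


def derive(start, rhs_choices, leftmost=True):
    """Apply each right-hand side to the left-most (or right-most)
    non-terminal and return the list of sentential forms."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in rhs_choices:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        target = positions[0] if leftmost else positions[-1]
        form[target:target + 1] = rhs  # replace one non-terminal in place
        steps.append(" ".join(form))
    return steps


left = derive("E", [["E", "*", "E"], ["E", "+", "E"], ["id"], ["id"], ["id"]])
right = derive("E", [["E", "+", "E"], ["E", "*", "E"], ["id"], ["id"], ["id"]],
               leftmost=False)

print("\n".join(left))
print()
print("\n".join(right))
```

Both runs end in id + id * id, but the intermediate sentential forms differ because a different non-terminal is chosen at each step.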

Parse Tree

A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this by an example from the last topic.

We take the left-most derivation of a + b * c, where each of a, b, and c is matched by the token id.

The left-most derivation is:

E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

Step 1:

E → E * E

Step 2:

E → E + E * E

Step 3:

E → id + E * E

Step 4:

E → id + id * E

Step 5:

E → id + id * id

In a parse tree:

  • All leaf nodes are terminals.
  • All interior nodes are non-terminals.
  • Reading the leaf nodes from left to right gives the original input string.

A parse tree depicts associativity and precedence of operators. The deepest sub-tree is traversed first, therefore the operator in that sub-tree gets precedence over the operator which is in the parent nodes.
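As an illustration, the parse tree built by the five steps above can be written down as a small data structure. The `Node` class and the helpers `leaves` and `height` are assumptions made for this sketch, not part of any particular compiler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the leaf symbols from left to right."""
    if not node.children:
        return [node.symbol]
    collected = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


def height(node: Node) -> int:
    """Number of edges on the longest path from node down to a leaf."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)


# Tree for the derivation that starts with E -> E * E and then
# expands the left E with E -> E + E.
plus = Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])])
tree = Node("E", [plus, Node("*"), Node("E", [Node("id")])])

print(" ".join(leaves(tree)))  # id + id * id
print(height(tree))            # 3
```

In this particular tree the + sub-tree sits deeper than the * node, so it groups the input as (id + id) * id. Starting the derivation with E → E + E instead would give a different tree for the same string.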


Ambiguity

A grammar G is said to be ambiguous if it has more than one parse tree (equivalently, more than one left-most derivation or more than one right-most derivation) for at least one string.

Example

E → E + E
E → E – E
E → id

For the string id + id – id, the above grammar generates two parse trees: one groups the string as (id + id) – id, and the other groups it as id + (id – id).

A language is said to be inherently ambiguous if every grammar that generates it is ambiguous. Ambiguity in a grammar is not good for compiler construction. No general method can detect and remove ambiguity automatically, but it can be removed either by rewriting the grammar without ambiguity, or by setting and following associativity and precedence constraints.
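One way to see the ambiguity is to count parse trees. The sketch below assumes the grammar E → E + E | E - E | id (with an ASCII minus sign) and tokens given as a list; the name `count_parse_trees` is hypothetical. It tries every operator as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E + E | E - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "-"):
                # tokens[mid] is the operator at the root of this sub-tree
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "-", "id"]))  # 2
print(count_parse_trees(["id", "+", "id"]))             # 1
```

A count above 1 for any input is enough to show that the grammar is ambiguous. With more operators, the count grows quickly.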


Associativity

If an operand has operators on both sides, the side on which the operator takes this operand is decided by the associativity of those operators. If the operation is left-associative, the operand is taken by the left operator; if the operation is right-associative, the right operator takes the operand.

Example

Operations such as addition, multiplication, subtraction, and division are left-associative. If the expression contains:

id op id op id

it will be evaluated as:

(id op id) op id

For example, (id + id) + id

Operations like Exponentiation are right associative, i.e., the order of evaluation in the same expression will be:

id op (id op id)

For example, id ^ (id ^ id)


Precedence

If two different operators share a common operand, the precedence of operators decides which will take the operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting precedence among operators, this problem can be easily removed. As in the previous example, mathematically * (multiplication) has precedence over + (addition), so the expression 2+3*4 will always be interpreted as:

2 + (3 * 4)

These methods decrease the chances of ambiguity in a language or its grammar.
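A common way to apply associativity and precedence constraints in a hand-written parser is precedence climbing. The sketch below is one possible arrangement, not the method used by any specific compiler: the operator table, the use of integer literals in place of id, and the function names are all assumptions for illustration.

```python
import operator
import re

# operator -> (precedence, associativity, function); higher binds tighter
OPERATORS = {
    "+": (1, "left", operator.add),
    "-": (1, "left", operator.sub),
    "*": (2, "left", operator.mul),
    "/": (2, "left", operator.truediv),
    "^": (3, "right", operator.pow),
}


def tokenize(text):
    return re.findall(r"\d+|[-+*/^()]", text)


def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return int(tok)


def parse(tokens, min_prec=1):
    """Consume tokens from the front and return a nested tuple tree."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in OPERATORS and OPERATORS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc, _ = OPERATORS[op]
        # Left-associative: the right operand may only contain tighter operators.
        # Right-associative: it may contain operators of the same precedence.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, lhs, rhs = tree
    return OPERATORS[op][2](evaluate(lhs), evaluate(rhs))


print(parse(tokenize("2+3*4")))  # ('+', 2, ('*', 3, 4))
print(parse(tokenize("8-3-2")))  # ('-', ('-', 8, 3), 2)
print(parse(tokenize("2^3^2")))  # ('^', 2, ('^', 3, 2))
```

The `next_min` value is where associativity is enforced, and the comparison against `min_prec` is where precedence is enforced. Each input therefore gets exactly one tree.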

Left Recursion

A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’ itself as the left-most symbol. Left-recursive grammar is considered to be a problematic situation for top-down parsers. Top-down parsers start parsing from the Start symbol, which in itself is non-terminal. So, when the parser encounters the same non-terminal in its derivation, it becomes hard for it to judge when to stop parsing the left non-terminal and it goes into an infinite loop.


Example

(1) A => Aα | β

(2) S => Aα | β
    A => Sd

(1) is an example of immediate left recursion, where A is any non-terminal symbol, α and β represent strings of terminals and/or non-terminals, and β does not start with A.

(2) is an example of indirect-left recursion.

A top-down parser will first parse the A, which in-turn will yield a string consisting of A itself and the parser may go into a loop forever.

Removal of Left Recursion

One way to remove left recursion is to use the following technique:

The production

A => Aα | β

is converted into the following productions

A => βA'
A' => αA' | ε

This does not impact the strings derived from the grammar, but it removes immediate left recursion.
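The transformation can be sketched in a few lines. The representation used here is an assumption for illustration: each right-hand side is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe. The function name is hypothetical.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> Aα | β as A -> βA' and A' -> αA' | ε.

    productions is a list of right-hand sides (lists of symbols);
    the empty list stands for ε. Returns a dict of the new rules.
    """
    # α parts: what follows the leading A in each left-recursive alternative
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # β parts: alternatives that do not start with A
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | ε
print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

The rewritten rules are right-recursive, so a top-down parser consumes at least one symbol of β before it can re-enter the recursion.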

The second method is to use the following algorithm, which should eliminate all direct and indirect left recursion.

START
Arrange non-terminals in some order like A1, A2, A3, …, An
for each i from 1 to n {
    for each j from 1 to i-1 {
        replace each production of form Ai ⟹ Aj𝜸
        with Ai ⟹ δ1𝜸 | δ2𝜸 | δ3𝜸 | … | δk𝜸
        where Aj ⟹ δ1 | δ2 | … | δk are current Aj productions
    }
    eliminate immediate left-recursion among the Ai productions
}
END


Example

The production set

S => Aα | β
A => Sd

after applying the above algorithm, should become

S => Aα | β
A => Aαd | βd

and then, remove immediate left recursion using the first technique.

A => βdA'
A' => αdA' | ε

Now none of the productions has either direct or indirect left recursion.
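The algorithm can be sketched as follows. The grammar representation is an assumption for this example: a dict from each non-terminal to a list of right-hand sides, each a list of symbols, with the empty list standing for ε. The dict's insertion order plays the role of the ordering A1, A2, …, An, and the function name is hypothetical. In the usage example the terminals a and b stand in for α and β.

```python
def eliminate_left_recursion(grammar):
    """Remove direct and indirect left recursion from grammar."""
    order = list(grammar)  # A1, A2, ..., An
    result = {nt: [list(rhs) for rhs in rules] for nt, rules in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for rhs in result[a_i]:
                if rhs and rhs[0] == a_j:
                    # Ai -> Aj γ becomes Ai -> δ γ for every current Aj -> δ
                    expanded.extend(delta + rhs[1:] for delta in result[a_j])
                else:
                    expanded.append(rhs)
            result[a_i] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [rhs[1:] for rhs in result[a_i] if rhs and rhs[0] == a_i]
        betas = [rhs for rhs in result[a_i] if not rhs or rhs[0] != a_i]
        if alphas:
            new_nt = a_i + "'"
            result[a_i] = [beta + [new_nt] for beta in betas]
            result[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return result


grammar = {
    "S": [["A", "a"], ["b"]],
    "A": [["S", "d"]],
}
for head, rules in eliminate_left_recursion(grammar).items():
    print(head, "=>", " | ".join(" ".join(rhs) or "ε" for rhs in rules))
```

For this input the result is S => A a | b, A => b d A', and A' => a d A' | ε, which mirrors the worked example above.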

Left Factoring

If more than one production rule of a grammar has a common prefix string, then the top-down parser cannot make a choice as to which of the productions it should take to parse the string in hand.

Example

If a top-down parser encounters a production like

A ⟹ αβ | α𝜸 | …

then it cannot determine which production to follow to parse the string, as both productions start with the same terminal (or non-terminal). To remove this confusion, we use a technique called left factoring.

Left factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make one production for each common prefix, and the rest of the derivation is added by new productions.

Example

The above productions can be written as

A => αA'
A' => β | 𝜸 | …

Now the parser has only one production per prefix, which makes it easier to take decisions.
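A single pass of left factoring can be sketched as below. The representation is an assumption: right-hand sides are lists of symbols and the empty list stands for ε. The function names are hypothetical, and the sketch factors only one level; the new non-terminals may need factoring again if their alternatives still share a prefix.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, productions):
    """Factor alternatives of nt that start with the same symbol."""
    groups = {}
    for rhs in productions:
        key = rhs[0] if rhs else None  # group by first symbol
        groups.setdefault(key, []).append(rhs)
    result = {nt: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new_nt = nt + "'" * counter
        result[nt].append(prefix + [new_nt])                   # A  -> α A'
        result[new_nt] = [rhs[len(prefix):] for rhs in group]  # A' -> β | γ
    return result


rules = [["if", "E", "then", "S"],
         ["if", "E", "then", "S", "else", "S"],
         ["other"]]
print(left_factor("S", rules))
```

Here the two if-alternatives collapse into S → if E then S S', with S' → ε | else S, so the parser can postpone the decision until it has seen whether an else follows.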

First and Follow Sets

An important part of parser table construction is to create FIRST and FOLLOW sets. These sets tell us which terminals can appear at particular positions in a derivation. They are used to create the parsing table, where an entry T[A, t] = α records the decision to replace the non-terminal A with the production α when the next input terminal is t.

First Set

This set is created to know what terminal symbol is derived in the first position by a non-terminal. For example,

α → t β

That is α derives t (terminal) in the very first position. So, t ∈ FIRST(α).

Algorithm for calculating First set

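Look at the definition of the FIRST(α) set:

  • if α is a terminal, then FIRST(α) = { α }.
  • if α is a non-terminal and α → ℇ is a production, then ℇ is in FIRST(α).
  • if α is a non-terminal and α → 𝜸1 𝜸2 𝜸3 … 𝜸n is a production, then every terminal t in FIRST(𝜸1) is in FIRST(α); a terminal t in FIRST(𝜸i) for i > 1 is in FIRST(α) only if 𝜸1 … 𝜸i-1 can all derive ℇ. If every 𝜸i can derive ℇ, then ℇ is in FIRST(α).

The FIRST set can be seen as:

FIRST(α) = { t | α ⇒* t β } ∪ { ℇ | α ⇒* ℇ }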
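The rules for FIRST can be applied repeatedly until no set changes. The sketch below assumes a grammar stored as a dict from non-terminal to a list of right-hand sides (lists of symbols, with the empty list standing for ℇ); any symbol that is not a key of the dict is treated as a terminal. The function name and the example expression grammar are choices made for illustration.

```python
EPSILON = "ε"


def compute_first(grammar):
    """Return a dict mapping each non-terminal to its FIRST set."""
    first = {nt: set() for nt in grammar}

    def first_of_sequence(symbols):
        result = set()
        for sym in symbols:
            sym_first = first[sym] if sym in grammar else {sym}
            result |= sym_first - {EPSILON}
            if EPSILON not in sym_first:
                return result  # this symbol cannot vanish, so stop here
        result.add(EPSILON)  # every symbol can vanish (or the sequence is empty)
        return result

    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                new = first_of_sequence(rhs)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
for nt, symbols in compute_first(grammar).items():
    print(nt, sorted(symbols))
```

The fixed-point loop is needed because FIRST sets depend on each other: FIRST(E) cannot be completed until FIRST(T) and FIRST(F) are known.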

Follow Set

Likewise, we calculate what terminal symbol immediately follows a non-terminal α in production rules. We do not consider what the non-terminal can generate; instead, we see what would be the next terminal symbol that follows the productions of a non-terminal.

Algorithm for calculating Follow set:

  • if α is the start symbol, then $ is in FOLLOW(α).

  • if α is a non-terminal and has a production α → AB, then everything in FIRST(B) except ℇ is in FOLLOW(A).

  • if α is a non-terminal and has a production α → AB, where B ⇒* ℇ, then everything in FOLLOW(α) is in FOLLOW(A).

The FOLLOW set can be seen as: FOLLOW(α) = { t | S ⇒* β α t 𝜸 }
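The FOLLOW rules can be iterated to a fixed point in the same way as FIRST. This sketch is self-contained, so it includes a compact FIRST computation as well. The grammar representation (dict of non-terminal to lists of symbols, empty list for ℇ), the function names, and the example grammar are assumptions for illustration.

```python
EPSILON, END = "ε", "$"


def first_of_sequence(symbols, first, grammar):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)  # the whole sequence can derive ε
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, rules in grammar.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for non-terminals
                    rest = first_of_sequence(rhs[i + 1:], first, grammar)
                    new = rest - {EPSILON}       # rule 2
                    if EPSILON in rest:
                        new |= follow[head]      # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
for nt, symbols in compute_follow(grammar, "E").items():
    print(nt, sorted(symbols))
```

Rule 3 also covers a non-terminal at the very end of a right-hand side, because the rest of the sequence is then empty and trivially derives ℇ.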

Limitations of Syntax Analyzers

Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers. Lexical analyzers are responsible for the validity of the tokens supplied to the syntax analyzer. Syntax analyzers have the following drawbacks:

  • they cannot determine if a token is valid,
  • they cannot determine if a token is declared before it is used,
  • they cannot determine if a token is initialized before it is used,
  • they cannot determine if an operation performed on a token type is valid or not.

These tasks are accomplished by the semantic analyzer, which we shall study in Semantic Analysis.

Compiler Design - Syntax Analysis? ›

Syntax analysis, also known as parsing, is a process in compiler design where the compiler checks if the source code follows the grammatical rules of the programming language. This is typically the second stage of the compilation process, following lexical analysis.

How to do semantic analysis in compiler design? ›

For example: int a = “value”; should not issue an error in lexical and syntax analysis phase, as it is lexically and structurally correct, but it should generate a semantic error as the type of the assignment differs. These rules are set by the grammar of the language and evaluated in semantic analysis.

What is the syntax analysis phase of a compiler? ›

Syntax Analysis: The second phase of a compiler is syntax analysis, also known as parsing. This phase takes the stream of tokens generated by the lexical analysis phase and checks whether they conform to the grammar of the programming language. The output of this phase is usually an Abstract Syntax Tree (AST).

What is syntax rules in compiler design? ›

Syntax analysis is the second phase of the compiler design process, in which the given input string is checked for conformance to the rules and structure of the formal grammar. It analyses the syntactical structure and checks whether the given input is in the correct syntax of the programming language or not.

What is syntax analysis used for? ›

The syntax analysis is the essential step for the compilation of programs written in programming languages. In order to produce the object programs executable on the computer, the source program has to be analyzed with respect to its correctness, the correctness of the lexicon, syntax and semantics.

What is the function of semantic analyzer? ›

Semantic Analyzer:

It uses syntax tree and symbol table to check whether the given program is semantically consistent with language definition. It gathers type information and stores it in either syntax tree or symbol table. This type information is subsequently used by compiler during intermediate-code generation.

What are the tasks performed by semantic analysis? ›

Semantic analysis is the task of ensuring that the declarations and statements of a program are semantically correct, i.e, that their meaning is clear and consistent with the way in which control structures and data types are supposed to be used.

What is syntax analysis vs semantic analysis in compiler? ›

Syntactic and Semantic Analysis differ in the way text is analyzed. In the case of syntactic analysis, the syntax of a sentence is used to interpret a text. In the case of semantic analysis, the overall context of the text is considered during the analysis.

What is the difference between lexical analysis and syntax analysis? ›

Lexical analysis is the process of converting a sequence of characters in a source code file into a sequence of tokens. Syntax analysis is the process of checking the tokens for correct syntax according to the rules of the programming language. Lexical analysis is often the first phase of the compilation process.

What are the 3 rules of syntax? ›

The basic rules of syntax in English
  • All sentences require a subject and a verb. ...
  • A single sentence should include one main idea. ...
  • The subject comes first, and the verb comes second. ...
  • Subordinate clauses (dependent clauses) also require a subject and verb.

What are the three types of syntax coding? ›

To simplify understanding and analyzing a language's syntax, we separate syntax into three levels: lexical elements, context free syntax, and context sensitive syntax.

Which tool is used for syntax analysis? ›

Parser Generator – It produces syntax analyzers (parsers) from the input that is based on a grammatical description of programming language or on a context-free grammar. It is useful as the syntax analysis phase is highly complex and consumes more manual and compilation time.

How is syntax analysis done in natural language? ›

Syntactic analysis, also referred to as syntax analysis or parsing, is the process of analyzing natural language with the rules of a formal grammar. Grammatical rules are applied to categories and groups of words, not individual words. Syntactic analysis basically assigns a semantic structure to text.

What is the difference between semantic analysis and syntax? ›

Put simply, syntax refers to grammar, while semantics refers to meaning. Syntax is the set of rules needed to ensure a sentence is grammatically correct; semantics is how one's lexicon, grammatical structure, tone, and other elements of a sentence coalesce to communicate its meaning.

Why is semantic analysis difficult? ›

However, due to the vast complexity and subjectivity involved in human language, interpreting it is quite a complicated task for machines. Semantic Analysis of Natural Language captures the meaning of the given text while taking into account context, logical structuring of sentences and grammar roles.

What are the 7 semantic roles? ›

Semantic roles have eight types such as agent, patient, theme, location, experiencer, instruments, goal, and source.

Which algorithm is used for semantic analysis? ›

Machine learning algorithm-based automated semantic analysis

Such estimations are based on previous observations or data patterns. Machine learning-based semantic analysis involves sub-tasks such as relationship extraction and word sense disambiguation.

What is the difference between lexical and semantic analysis? ›

Lexical analysis detects lexical errors (ill-formed tokens), syntactic analysis detects syntax errors, and semantic analysis detects semantic errors, such as static type errors, undefined variables, and uninitialized variables.

What are the advantages of semantic analysis? ›

Semantic analysis offers considerable time saving for a company's teams. The analysis of the data is automated and the customer service teams can therefore concentrate on more complex customer inquiries, which require human intervention and understanding.

What are the real life applications of semantic processing? ›

Examples of Semantic Web Applications
  • Supply Chain Management – Biogen Idec.
  • Media Management – BBC.
  • Data Integration in Oil & Gas – Chevron.
  • Web Search and Ecommerce.

What is another name for syntax analysis? ›

Syntax analysis is also called parsing.

What is the difference between a parse tree and a syntax tree? ›

A syntax tree is a tree that displays the syntactic structure of a program while omitting details of the parse tree that later phases do not need. Thus, the syntax tree is nothing more than a condensed form of the parse tree.

What is the difference between syntax tree and parse tree? ›

A parse tree is created by a parser, which is a component of a compiler that processes the source code and checks it for syntactic correctness. A syntax tree is created by the compiler based on the parse tree after the parser has finished processing the source code.

How many types of semantic analysis are there? ›

There are two types of techniques in Semantic Analysis depending upon the type of information that you might want to extract from the given data. These are semantic classifiers and semantic extractors.

What are the four optimization techniques used in the compiler? ›

Machine-Independent Optimization Techniques:

Compile Time Evaluation. Common Subexpression Elimination. Variable Propagation. Dead Code Elimination.

What are the 4 stages of C compiler? ›

Compilation process in C involves four steps: pre-processing, compiling, assembling, and linking.

What is the basic rule of syntax? ›

Syntax rules are those rules that define or clarify the order in which words or elements are arranged to form larger elements, such as phrases, clauses, or statements. Syntax rules also impose restrictions on individual words or elements.

What type of syntax is Python? ›

The Python syntax defines a set of rules that are used to create Python statements while writing a Python Program. The Python Programming Language Syntax has many similarities to Perl, C, and Java Programming Languages. However, there are some definite differences between the languages.

What is the most common language syntax? ›

The SVO Pattern. The SVO pattern (Subject-Verb-Object) is the most common syntactic structure in written English.

Does syntax include punctuation? ›

Punctuation, Capitalization, and Spelling

Spelling rules, punctuation, and capitalization are writing conventions, and are not a part of grammar or syntax. Combining writing conventions with proper grammar makes your writing clear and easy to understand.

What is syntax in CPP? ›

This section lays out basic C++ syntax. The first step in learning any language is to study its rules and regulations, which together are called its syntax. Formally, the term “syntax” means an approved set of pre-defined protocols or rules that we need to follow while working in a programming language.

What is syntax in English grammar? ›

syntax, the arrangement of words in sentences, clauses, and phrases, and the study of the formation of sentences and the relationship of their component parts.

What are the three 3 levels of programming languages? ›

There are three types of programming languages: machine language, assembly language, and high-level language.

What are the 3 key coding concepts used in every coding language? ›

Basic Syntax. Data Type and Structures. Flow Control Structures (Conditionals and loops)

What are the two components of syntax? ›

  • The principal categories of words (Nouns and Verbs, with the dependent categories of Adjectives and Adverbs). ...
  • Ordering of words, including sub-ordering, that is, the clustering of words within a larger order.
  • Function words (including subwords eg.

What is syntax tree in compiler design? ›

A syntax tree is a tree in which each leaf node represents an operand, while each interior node represents an operator. The syntax tree is an abbreviated form of the parse tree. The syntax tree is usually used when representing a program in a tree structure.

Is compiler construction hard? ›

Compiler construction is a complex task. A good compiler combines ideas from formal language theory, from the study of algorithms, from artificial intelligence, from systems design, from computer architecture, and from the theory of programming languages and applies them to the problem of translating a program.

What is the role of syntax analysis in compiler? ›

Syntax analysis is the second phase of a compiler. The output of syntax analysis is used as input to the semantic analyzer. In syntax analysis, the compiler checks the syntactic structure of the input string, i.e., whether the given string follows the grammar or not.

Do you have to study syntax when learning a language? ›

Studying syntax is relevant to a lot of subject areas in linguistics. We must study syntax to understand how children acquire their language, how they start constructing sentences, and at what stage they learn the tacit syntactic rules of the language.

What is syntactic analysis in simple words? ›

Syntactic analysis is defined as analysis that tells us the logical meaning of certain given sentences or parts of those sentences. We also need to consider rules of grammar in order to define the logical meaning as well as correctness of the sentences.

How do you do semantic analysis? ›

The semantic analysis process begins by studying and analyzing the dictionary definitions and meanings of individual words also referred to as lexical semantics. Following this, the relationship between words in a sentence is examined to provide clear understanding of the context.

What is the method for semantic analysis? ›

Semantic analysis, a natural language processing method, entails examining the meaning of words and phrases to comprehend the intended purpose of a sentence or paragraph. This is often accomplished by locating and extracting the key ideas and connections found in the text utilizing algorithms and AI approaches.

What is the technique for semantic analysis? ›

Depending on the type of information you'd like to obtain from data, you can use one of two semantic analysis techniques: a text classification model (which assigns predefined categories to text) or a text extractor (which pulls out specific information from the text).

Where is semantic analysis performed in a compiler? ›

Semantic Analysis is the last step in the front-end compilation. It's called front-end because it basically is an interface between the source code written by a developer, and the transformation that this code will go through in order to become executable.

What is an example of semantic analysis? ›

The most important task of semantic analysis is to get the proper meaning of the sentence. For example, analyze the sentence “Ram is great.” In this sentence, the speaker is talking either about Lord Ram or about a person whose name is Ram.

What are the three general approaches to semantics? ›

The three major types of semantics are formal, lexical, and conceptual semantics.

What are the three levels of semantic analysis? ›

Semantic analysis is examined at three basic levels: Semantic features of words in a text, Semantic roles of words in a text and Lexical relationship between words in a text.

What are the limitations of semantic analysis? ›

There are a number of drawbacks to Latent Semantic Analysis, the major one being is its inability to capture polysemy (multiple meanings of a word). The vector representation, in this case, ends as an average of all the word's meanings in the corpus. That makes it challenging to compare documents.

Which component is important for semantic analysis? ›

Type checking is an important component of semantic analysis.

What is the difference between syntax and semantic analysis in compiler design? ›

Syntax defines the rules and regulations that help write any statement in a programming language, while semantics refers to the meaning of the associated line of code in the programming language.

What are the six phases of compiler? ›

There are 6 phases in the compiler, namely, lexical analysis, syntax analysis, semantics analysis, intermediate code generation, code optimization, and code generation.

What is another name for semantic analysis? ›

Semantic analysis or context sensitive analysis is a process in compiler construction, usually after parsing, to gather necessary semantic information from the source code.



Author: Moshe Kshlerin

Last Updated: 17/01/2024
