Recognition of keywords/constants in Compiler Construction
Lexical analysis is the first phase of a compiler. It takes the modified source code produced by language preprocessors, which is written in the form of sentences. The lexical analyzer breaks this text into a series of tokens and discards any whitespace and comments in the source code. If the lexical analyzer finds an invalid token, it generates an error. The lexical analyzer works closely with the syntax analyzer: it reads the character stream from the source code, checks for legal tokens, and passes them to the syntax analyzer on demand.
In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols are all considered tokens.
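The idea of turning a character stream into tokens while discarding whitespace and comments can be sketched in a few lines of C#. This is only an illustration, not a full lexer: the class name TokenSketch, the method Tokenize, and the token pattern itself are assumptions made for this example, and only `//` line comments are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Hypothetical token pattern: a number, an identifier/keyword,
    // or any single non-whitespace symbol, tried in that order.
    private static readonly Regex TokenPattern =
        new Regex(@"[0-9]+(\.[0-9]+)?|[A-Za-z_][A-Za-z0-9_]*|\S");

    public static List<string> Tokenize(string source)
    {
        // Drop "//" comments up to the end of each line.
        string withoutComments = Regex.Replace(source, @"//.*", "");

        // Whitespace never matches the pattern, so it is skipped automatically.
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(withoutComments))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // prints: int | x | = | 42 | ;
        Console.WriteLine(string.Join(" | ", Tokenize("int x = 42; // counter")));
    }
}
```

A real lexical analyzer would also attach a token type to each lexeme and report characters that fit no pattern as errors.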
Activity Outcomes:
This lecture teaches you
- How to recognize constants from a source program written in a high-level language
- How to recognize keywords from a source program written in a high level language
Instructor Note:
Students should know how to write regular expressions in C#
Introduction
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash. In POSIX basic regular expressions (the default syntax of grep), the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)" are used instead. .NET regular expressions, which the C# activities in this lecture use, work the other way around: the unescaped characters are operators, and a backslash turns them into literal characters.
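The effect of quoting a metacharacter in .NET can be checked with a short console sketch. The class name EscapeSketch and the helper IsFullMatch are hypothetical names used only here; Regex.IsMatch and Regex.Escape are the standard System.Text.RegularExpressions calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    // Hypothetical helper: true when the whole input matches the pattern.
    public static bool IsFullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(" + pattern + ")$");
    }

    public static void Main()
    {
        // An unescaped '.' matches any character except a newline.
        Console.WriteLine(IsFullMatch("3x14", @"3.14"));   // True
        // An escaped '\.' matches only a literal dot.
        Console.WriteLine(IsFullMatch("3x14", @"3\.14"));  // False
        Console.WriteLine(IsFullMatch("3.14", @"3\.14"));  // True
        // Regex.Escape adds the backslashes for you.
        Console.WriteLine(Regex.Escape("a+b"));            // a\+b
    }
}
```

Writing the dot inside a character class, as in [.], has the same effect as escaping it with a backslash.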
Activities:
Activity 1:
Design a regular expression for constants (integers and floating-point numbers):
Regular Expression for Constants: [0-9]+(([.][0-9]+)?([e][+-][0-9]+)?)?
Solution:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            String input = richTextBox1.Text; // take input from a RichTextBox/TextBox
            // split the input on whitespace and skip empty entries
            String[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // regular expression for constants: digits, optional fraction, optional signed exponent
            Regex regex1 = new Regex(@"^[0-9]+(([.][0-9]+)?([e][+-][0-9]+)?)?$");
            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " "; // show valid constants in the output box
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]); // report anything that is not a constant
                }
            }
        }
    }
}
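The constants pattern can also be tried without a Windows Forms project. The console sketch below is an assumption-laden illustration: the class ConstantCheck and method IsConstant are hypothetical names, and the sign class is written as [+-] so that only a plus or minus sign is accepted in the exponent.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConstantCheck
{
    // Digits, an optional fraction, and an optional exponent with a mandatory sign.
    private static readonly Regex ConstantPattern =
        new Regex(@"^[0-9]+(([.][0-9]+)?([e][+-][0-9]+)?)?$");

    public static bool IsConstant(string word)
    {
        return ConstantPattern.IsMatch(word);
    }

    public static void Main()
    {
        string[] samples = { "42", "3.14", "2.5e+10", "1e-3", "8e4", "3x14" };
        foreach (string s in samples)
        {
            // Prints each sample followed by True or False.
            Console.WriteLine(s + " -> " + IsConstant(s));
        }
    }
}
```

With this pattern, 42, 3.14, 2.5e+10 and 1e-3 are accepted, while 8e4 is rejected because the exponent requires a sign, and 3x14 is rejected because only a literal dot may separate the digits.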
Activity 2:
Design a regular expression for keywords.
Regular Expression for keywords: int|float|double|char
Solution:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            String input = richTextBox1.Text; // take input from a RichTextBox/TextBox
            // split the input on whitespace and skip empty entries
            String[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // regular expression for keywords: alternation of whole words, anchored at both ends
            Regex regex1 = new Regex(@"^(int|float|double|char)$");
            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " "; // show recognized keywords in the output box
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]); // report anything that is not a keyword
                }
            }
        }
    }
}
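Keywords are whole words, so they are matched with alternation inside a group rather than with square brackets, which describe a set of single characters. The console sketch below illustrates the difference; the class KeywordCheck and its two methods are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordCheck
{
    // Alternation of whole words, anchored so that partial matches are rejected.
    private static readonly Regex KeywordPattern =
        new Regex(@"^(int|float|double|char)$");

    // A character class: any string built only from the letters i, n and t.
    private static readonly Regex CharClassPattern = new Regex(@"^[int]+$");

    public static bool IsKeyword(string word)
    {
        return KeywordPattern.IsMatch(word);
    }

    public static bool MatchesCharClass(string word)
    {
        return CharClassPattern.IsMatch(word);
    }

    public static void Main()
    {
        Console.WriteLine(IsKeyword("int"));         // True
        Console.WriteLine(IsKeyword("integer"));     // False: anchors reject the longer word
        Console.WriteLine(IsKeyword("tin"));         // False
        Console.WriteLine(MatchesCharClass("tin"));  // True: the class only checks letters
    }
}
```

For a larger keyword list, a lookup in a HashSet of strings is a common alternative to a long alternation, at the cost of no longer expressing the rule as a regular expression.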
Home Activities:
Activity 1:
Design a regular expression for floating point numbers having length not greater than 60.
Activity 2:
Design a single regular expression for the following numbers: 8e4, 5e-2, 6e9
Activity 3:
Design a regular expression for finding all the words starting with 't' or 'm' in the following document