Lexical Analyzer: Recognition of Operators/Variables in Compiler Construction
Lexical analysis is the first phase of a compiler. It takes the modified source code produced by the language preprocessors, written in the form of sentences, and breaks it into a series of tokens, discarding any whitespace and comments along the way.
If the lexical analyzer finds an invalid token, it reports an error. The lexical analyzer works closely with the syntax analyzer: it reads the character stream of the source code, checks for legal tokens, and passes each token to the syntax analyzer when the syntax analyzer requests it.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered tokens.
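The following is a minimal sketch of the process described above, assuming a tiny, hypothetical token set (numbers, identifiers, the operators + - * / =, and the punctuation ; ( ) ,). It reads the input left to right and skips whitespace and // comments instead of emitting them. It reports an error for any character that starts no legal token. The class name TokenizerDemo and the "KIND:text" output format are illustrative only. A real lexer for a specific language would also need keywords, strings and many more operators.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenizerDemo
{
    // \G forces each match to start exactly where the previous token ended.
    // Alternatives are tried left to right, so "//" is read as a comment
    // before "/" could be read as a division operator.
    private static readonly Regex TokenPattern = new Regex(
        @"\G(?:(?<ws>\s+)|(?<comment>//[^\n]*)|(?<number>\d+)|(?<id>[A-Za-z][A-Za-z0-9]*)|(?<op>[+\-*/=])|(?<punct>[;(),]))");

    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < source.Length)
        {
            Match m = TokenPattern.Match(source, pos);
            if (!m.Success)
                throw new ArgumentException(
                    "Invalid character '" + source[pos] + "' at position " + pos);
            pos += m.Length;

            // Whitespace and comments are consumed but not passed on.
            if (m.Groups["ws"].Success || m.Groups["comment"].Success)
                continue;

            string kind = m.Groups["number"].Success ? "NUMBER"
                        : m.Groups["id"].Success ? "ID"
                        : m.Groups["op"].Success ? "OP"
                        : "PUNCT";
            tokens.Add(kind + ":" + m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Prints ID:sum, OP:=, ID:a1, OP:+, NUMBER:42, PUNCT:; (one per line)
        foreach (string token in Tokenize("sum = a1 + 42; // add"))
            Console.WriteLine(token);
    }
}
```

Because every alternative consumes at least one character, the loop always makes progress and cannot spin on an empty match.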
Activity Outcomes:
This lecture teaches you:
- How to use regular expressions for pattern matching
- How to recognize operators in a source program written in a high-level language
- How to recognize variables in a source program written in a high-level language
Instructor Note:
Students should know the basics of C# and be able to write programs in it.
Introduction
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.
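The three basic operators can be tried directly with the .NET Regex class. Concatenation places one expression after another. Union (written |) means "either". The Kleene star (*) means "zero or more repetitions". The sketch below uses a small helper, FullMatch, which is our own name and not part of .NET. It wraps the pattern in ^(?:...)$ so that the whole input must match. The group matters: without it, "^a|b$" would mean "starts with a, or ends with b".

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOperatorsDemo
{
    // True only when the ENTIRE input belongs to the set of strings
    // described by the pattern.
    public static bool FullMatch(string pattern, string input) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    public static void Main()
    {
        // Concatenation: "ab" describes exactly the string "ab".
        Console.WriteLine(FullMatch("ab", "ab"));           // True
        // Union: "a|b" describes "a" and "b".
        Console.WriteLine(FullMatch("a|b", "b"));           // True
        // Kleene star: "a*" describes "", "a", "aa", ...
        Console.WriteLine(FullMatch("a*", "aaaa"));         // True
        // Combined: (a|b)*abb = strings over {a, b} that end in "abb".
        Console.WriteLine(FullMatch("(a|b)*abb", "babb"));  // True
        Console.WriteLine(FullMatch("(a|b)*abb", "abba"));  // False
    }
}
```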
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash. In basic regular expressions (BRE, the default syntax of GNU grep), the metacharacters “?”, “+”, “{”, “|”, “(” and “)” lose their special meaning; instead use the backslashed versions “\?”, “\+”, “\{”, “\|”, “\(” and “\)”. The .NET Regex class used in this lecture follows the opposite convention: these characters are special by default, and a preceding backslash makes them match literally.
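In .NET syntax, "+" is a quantifier unless it is escaped. The sketch below shows the difference. It also uses Regex.Escape, which inserts the needed backslashes automatically. That is handy when an operator symbol has to be matched literally. MatchesLiterally is a hypothetical helper name used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // True when input is exactly the given literal text; Regex.Escape puts a
    // backslash in front of every metacharacter contained in the literal.
    public static bool MatchesLiterally(string input, string literal) =>
        Regex.IsMatch(input, "^" + Regex.Escape(literal) + "$");

    public static void Main()
    {
        // Unescaped, '+' means "one or more of the previous item".
        Console.WriteLine(Regex.IsMatch("aaa", "^a+$"));          // True
        Console.WriteLine(Regex.IsMatch("a+", "^a+$"));           // False
        // Escaped, "\+" matches a literal plus sign.
        Console.WriteLine(Regex.IsMatch("a+", @"^a\+$"));         // True
        Console.WriteLine(Regex.Escape("a+b(c)"));                // a\+b\(c\)
        Console.WriteLine(MatchesLiterally("a+b(c)", "a+b(c)"));  // True
    }
}
```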
Activities:
Activity 1:
Implement regular expressions using the Regex class
Solution:
This example collapses each run of extra whitespace into a single space:
using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Hello   World   ";
            string pattern = "\\s+";   // one or more whitespace characters
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);   // each run becomes a single space
            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}
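The same \s+ pattern is useful when the input is split into words before recognition. Splitting with Split(' ') produces empty strings wherever words are separated by more than one space. A recognizer would then report those empty strings as invalid tokens. The sketch below is a small console helper that assumes whitespace is the only separator between tokens; SplitWords is a name chosen for this example. It splits on runs of whitespace instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on runs of whitespace; leading/trailing blanks are trimmed first
    // so that no empty "words" are produced.
    public static string[] SplitWords(string input)
    {
        string trimmed = input.Trim();
        return trimmed.Length == 0 ? new string[0] : Regex.Split(trimmed, @"\s+");
    }

    public static void Main()
    {
        string input = "a  +  b";
        Console.WriteLine(input.Split(' ').Length);   // 5 (two of them are empty strings)
        Console.WriteLine(SplitWords(input).Length);  // 3: "a", "+", "b"
    }
}
```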
Activity 2:
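Design a regular expression for arithmetic operators. Regular expression for operators: [+\-*/] (inside square brackets “|” is an ordinary character, not “or”, so it must not be listed; “-” is escaped so that it is not read as a range).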
Solution:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            string input = richTextBox1.Text; // take input from a RichTextBox/TextBox
            // split the input on whitespace, dropping the empty entries left by repeated spaces
            string[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            Regex regex1 = new Regex(@"^[+\-*/]$"); // exactly one of + - * /
            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " ";
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]);
                }
            }
        }
    }
}
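The operator check can also be tried without a Windows Forms project. The following console sketch uses the regular expression ^[+\-*/]$ and a hypothetical helper named IsArithmeticOperator. It also shows why "|" must not appear inside the brackets: a class written as [+|*|/|-] accepts the character "|" itself as an operator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OperatorRecognizer
{
    // Exactly one arithmetic operator: '-' is escaped so it is not a range,
    // and '|' is deliberately absent because inside [...] it is literal.
    private static readonly Regex ArithmeticOp = new Regex(@"^[+\-*/]$");

    public static bool IsArithmeticOperator(string token) => ArithmeticOp.IsMatch(token);

    public static void Main()
    {
        foreach (string token in Regex.Split("a + b * c | d", @"\s+"))
        {
            string kind = IsArithmeticOperator(token) ? "operator" : "not an operator";
            Console.WriteLine(token + ": " + kind);
        }
        // A class written with '|' inside the brackets also accepts "|":
        Console.WriteLine(Regex.IsMatch("|", @"^[+|*|/|-]$")); // True
    }
}
```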
Activity 3:
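Design a regular expression for variables that start with a letter, are at most 25 characters long, and may contain any combination of letters and digits after the first character.
Regular expression for variables: [A-Za-z][A-Za-z0-9]* (no length limit); with the 25-character limit: [A-Za-z][A-Za-z0-9]{0,24} (one letter followed by at most 24 letters or digits).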
Solution:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            string input = richTextBox1.Text; // take input from a RichTextBox/TextBox
            // split the input on whitespace, dropping the empty entries left by repeated spaces
            string[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // variable: a letter, then at most 24 letters/digits (total length 1 to 25)
            Regex regex1 = new Regex(@"^[A-Za-z][A-Za-z0-9]{0,24}$");
            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " ";
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]);
                }
            }
        }
    }
}
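A console sketch of the variable check follows, using ^[A-Za-z][A-Za-z0-9]{0,24}$. The {0,24} quantifier applies only to the characters after the first letter, so the total length is 1 to 25. Both anchors cover the whole token. A pattern of the form ^[A-Za-z]|... would accept any token that merely starts with a letter, because the ^ applies only to the first alternative. IsVariable is a hypothetical helper name. Underscores are rejected because the task allows only letters and digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariableRecognizer
{
    // One letter, then 0 to 24 letters/digits: total length 1..25.
    private static readonly Regex Variable = new Regex(@"^[A-Za-z][A-Za-z0-9]{0,24}$");

    public static bool IsVariable(string token) => Variable.IsMatch(token);

    public static void Main()
    {
        string[] samples = { "x", "count1", "1count", "my_var", new string('a', 25), new string('a', 26) };
        foreach (string s in samples)
            Console.WriteLine(s + ": " + (IsVariable(s) ? "valid" : "invalid"));
    }
}
```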
Home Activities:
Activity 1:
Design a regular expression for logical operators.
Activity 2:
Design a regular expression for relational operators.