Recognition of keywords/constants

Recognition of keywords/constants in Compiler Construction

Lexical analysis is the first phase of a compiler. It takes the source code, already modified by the language preprocessors, which is written in the form of sentences. The lexical analyzer breaks these sentences into a series of tokens, discarding any whitespace and comments in the source code. If the lexical analyzer finds an invalid token, it generates an error. The lexical analyzer works closely with the syntax analyzer: it reads the character stream from the source code, checks for legal tokens, and passes them to the syntax analyzer on demand.

Figure: Lexical Analyzer

In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can all be considered tokens.
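The idea of turning source text into labeled tokens can be sketched in a few lines. This is a minimal console illustration, not a full lexer: the class name TokenSketch, the method Tokenize, the KEYWORD/CONSTANT/UNKNOWN labels, and the two patterns are assumptions made for the example, and only keywords and simple numeric constants are recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Hypothetical patterns for illustration; a real lexer has many more token classes.
    static readonly Regex Keyword = new Regex(@"^(int|float|double|char)$");
    static readonly Regex Constant = new Regex(@"^[0-9]+(\.[0-9]+)?$");

    // Splits the input on whitespace and labels each piece with a token kind.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        string[] pieces = source.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string piece in pieces)
        {
            string kind = Keyword.IsMatch(piece) ? "KEYWORD"
                        : Constant.IsMatch(piece) ? "CONSTANT"
                        : "UNKNOWN";
            tokens.Add(kind + ":" + piece);
        }
        return tokens;
    }

    public static void Main()
    {
        // Prints one labeled token per line.
        foreach (string token in Tokenize("int 42  3.14 foo"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Anything that is neither a keyword nor a constant is labeled UNKNOWN here; in a complete lexer those pieces would be tested against further patterns (identifiers, operators, and so on) before an error is reported.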

Activity Outcomes:

This lecture teaches you:

  • How to recognize constants in a source program written in a high-level language
  • How to recognize keywords in a source program written in a high-level language

Instructor Note:

Students should know how to write regular expressions in C#.

Introduction

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash. Note that in POSIX basic regular expressions (as used by grep) the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning, and the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)" are used instead. The .NET Regex class used in this lab works the other way round: these characters are special when written bare and match literally when preceded by a backslash.
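How quoting with a backslash behaves in .NET can be checked with a short console sketch. The class name EscapeSketch and the helper MatchesWhole are assumptions made for this example; Regex.IsMatch and Regex.Escape are the standard System.Text.RegularExpressions calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    // Hypothetical helper: true only if the whole input matches the pattern.
    public static bool MatchesWhole(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        // An unescaped '.' matches any single character except a newline.
        Console.WriteLine(MatchesWhole(@"3.5", "3x5"));   // True
        // An escaped '.' matches only a literal dot.
        Console.WriteLine(MatchesWhole(@"3\.5", "3x5"));  // False
        Console.WriteLine(MatchesWhole(@"3\.5", "3.5"));  // True
        // '+' is a quantifier unless it is escaped.
        Console.WriteLine(MatchesWhole(@"1\+2", "1+2"));  // True
        // Regex.Escape quotes the metacharacters in a literal string.
        Console.WriteLine(Regex.Escape("1+2"));           // 1\+2
    }
}
```

Writing patterns as verbatim strings (with the @ prefix) keeps the backslashes readable, since they do not have to be doubled for the C# compiler.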

Activities:

Activity 1:

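Design a regular expression for constants (integers and floating-point numbers):

Regular Expression for Constants:     [0-9]+(\.[0-9]+)?(e[+-][0-9]+)?

The dot is escaped so that it matches only a literal decimal point, and the sign class is written [+-] because a | inside square brackets is an ordinary character, not an alternation.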
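Two details are easy to get wrong in a pattern for constants: leaving the dot unescaped, and putting a | inside the sign character class. The console sketch below compares a loose and a strict variant on a few sample strings. The class name ConstantSketch, the field names Loose and Strict, and the sample inputs are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConstantSketch
{
    // Loose variant: '.' is unescaped and '|' appears inside the sign class.
    public static readonly Regex Loose =
        new Regex(@"^[0-9]+(.[0-9]+)?(e[+|-][0-9]+)?$");

    // Strict variant: literal decimal point, sign class holds only '+' and '-'.
    public static readonly Regex Strict =
        new Regex(@"^[0-9]+(\.[0-9]+)?(e[+-][0-9]+)?$");

    public static void Main()
    {
        string[] samples = { "42", "3.5", "1.5e+10", "2e-5", "3x5", "2e|5", "2e5" };
        foreach (string s in samples)
        {
            // One line per sample showing the verdict of each variant.
            Console.WriteLine("{0,-8} loose={1,-5} strict={2}",
                s, Loose.IsMatch(s), Strict.IsMatch(s));
        }
    }
}
```

With these samples the loose pattern accepts "3x5" and "2e|5", while the strict one rejects both. The strict pattern also rejects "2e5", because the sign after e is mandatory as written.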

Solution:

using System;
using System.Text.RegularExpressions;
using System.Windows.Forms;

namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string input = richTextBox1.Text;    // take input from a RichTextBox/TextBox
            // split the input on whitespace, skipping empty entries
            string[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // Regular expression for constants: digits, optional fraction, optional signed exponent
            Regex regex1 = new Regex(@"^[0-9]+([.][0-9]+)?(e[+-][0-9]+)?$");

            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " ";    // valid constant: append to the output box
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]);    // report anything that is not a constant
                }
            }
        }
    }
}

Activity 2:

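Design a regular expression for keywords.

Regular Expression for keywords:     int|float|double|char

Square brackets would define a character class (a set of single characters), so the alternatives are separated by | without brackets. In code they are grouped and anchored as ^(int|float|double|char)$ so that the whole word must be one of the keywords.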
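The difference between a character class, an ungrouped alternation, and a grouped alternation is easiest to see side by side. The following console sketch is only an illustration: the class name KeywordSketch, its field names, the helper IsKeyword, and the sample words are assumptions, and the keyword list is limited to the four keywords named in this activity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordSketch
{
    // Character class: matches any string built from these single characters.
    public static readonly Regex CharClass = new Regex(@"^[int|float|double|char]+$");

    // Ungrouped alternation: '^' binds only to "int" and '$' only to "char".
    public static readonly Regex Ungrouped = new Regex(@"^int|float|double|char$");

    // Grouped alternation: the whole word must be exactly one keyword.
    public static readonly Regex Grouped = new Regex(@"^(int|float|double|char)$");

    // Hypothetical helper used as the keyword test.
    public static bool IsKeyword(string word)
    {
        return Grouped.IsMatch(word);
    }

    public static void Main()
    {
        string[] samples = { "int", "double", "tin", "integer", "Int" };
        foreach (string s in samples)
        {
            // One line per sample showing the verdict of each pattern.
            Console.WriteLine("{0,-8} class={1,-5} ungrouped={2,-5} grouped={3}",
                s, CharClass.IsMatch(s), Ungrouped.IsMatch(s), Grouped.IsMatch(s));
        }
    }
}
```

With these samples the character class accepts "tin", the ungrouped alternation accepts "integer", and only the grouped, anchored pattern accepts exactly the keywords. Matching is case-sensitive by default, so "Int" is rejected by all three.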

Solution:

using System;
using System.Text.RegularExpressions;
using System.Windows.Forms;

namespace Sessional1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string input = richTextBox1.Text;    // take input from a RichTextBox/TextBox
            // split the input on whitespace, skipping empty entries
            string[] words = input.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // Regular expression for keywords: the whole word must be one of the alternatives
            Regex regex1 = new Regex(@"^(int|float|double|char)$");

            for (int i = 0; i < words.Length; i++)
            {
                Match match1 = regex1.Match(words[i]);
                if (match1.Success)
                {
                    richTextBox2.Text += words[i] + " ";    // valid keyword: append to the output box
                }
                else
                {
                    MessageBox.Show("invalid " + words[i]);    // report anything that is not a keyword
                }
            }
        }
    }
}

Home Activities:

Activity 1:

Design a regular expression for floating point numbers having length not greater than 60.

Activity 2:

Design a single regular expression for the following numbers:     8e4, 5e-2, 6e9

Activity 3:

Design a regular expression for finding all the words starting with 't' or 'm' in the following document
