Tuesday, 28 June 2016

Implementing a Programming Language in C - Part 1 - Lexer

In the previous blog post on implementing programming languages, I gave a sample of the syntax of Tut, the language we're building.

In this post, we'll go over how to split a "Tut" script into its constituent elements, or "tokens". This process is called "lexical analysis", and the component that performs it is called a "lexer".
Let's begin!

Defining The Tokens

Before we can build our lexer, we must first define every valid token in the language. A C enum is a natural fit for this, since it gives each kind of token its own named integer constant.
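A minimal sketch of such an enum, assuming a small hypothetical set of token kinds (identifiers, integer literals and a few operators) rather than Tut's actual list. The `TUT_TOKEN_COUNT` sentinel and the `tut_token_type_name` helper are illustrative additions, not part of the original code.

```c
#include <assert.h>

/* Hypothetical token kinds; Tut's actual list may differ. */
typedef enum {
    TUT_TOKEN_EOF,        /* end of the input string */
    TUT_TOKEN_ERROR,      /* a character the lexer does not recognize */
    TUT_TOKEN_IDENTIFIER, /* names, e.g. x or total */
    TUT_TOKEN_NUMBER,     /* integer literals, e.g. 42 */
    TUT_TOKEN_PLUS,       /* + */
    TUT_TOKEN_MINUS,      /* - */
    TUT_TOKEN_STAR,       /* * */
    TUT_TOKEN_SLASH,      /* / */
    TUT_TOKEN_EQUALS,     /* = */
    TUT_TOKEN_LPAREN,     /* ( */
    TUT_TOKEN_RPAREN,     /* ) */
    TUT_TOKEN_SEMICOLON,  /* ; */
    TUT_TOKEN_COUNT       /* number of token kinds; not a real token */
} TutTokenType;

/* Printable names indexed by TutTokenType (C99 designated initializers). */
static const char *const tut_token_names[TUT_TOKEN_COUNT] = {
    [TUT_TOKEN_EOF] = "EOF",
    [TUT_TOKEN_ERROR] = "ERROR",
    [TUT_TOKEN_IDENTIFIER] = "IDENTIFIER",
    [TUT_TOKEN_NUMBER] = "NUMBER",
    [TUT_TOKEN_PLUS] = "PLUS",
    [TUT_TOKEN_MINUS] = "MINUS",
    [TUT_TOKEN_STAR] = "STAR",
    [TUT_TOKEN_SLASH] = "SLASH",
    [TUT_TOKEN_EQUALS] = "EQUALS",
    [TUT_TOKEN_LPAREN] = "LPAREN",
    [TUT_TOKEN_RPAREN] = "RPAREN",
    [TUT_TOKEN_SEMICOLON] = "SEMICOLON",
};

/* Returns a printable name for a token type, handy when debugging the lexer. */
const char *tut_token_type_name(TutTokenType type) {
    assert((int)type >= 0 && type < TUT_TOKEN_COUNT);
    return tut_token_names[type];
}
```

The trailing `TUT_TOKEN_COUNT` member is a common C idiom: enum values start at 0 and increase by one, so the last member equals the number of real token kinds and can size lookup tables such as `tut_token_names`.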


Each kind of token the language accepts is represented by exactly one of the enum's values.

The Lexer

The lexer's job is to take a string containing Tut code and turn it into TutToken values on demand, producing the next token each time it is asked for one. This is what the lexer's header file looks like.
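As a rough guide to what such a header and its implementation might contain, here is a minimal single-file sketch. Everything in it is an assumption rather than the actual Tut code: the names `TutLexer`, `tut_lexer_init` and `tut_lexer_next` are hypothetical, as is the layout of `TutToken` (a token type plus a pointer and length into the source string). The token enum is repeated in compact form so the block compiles on its own; in a real project the types and the two public function declarations would live in the header, and the function bodies in a `.c` file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, compact copy so this block compiles on its own. */
typedef enum {
    TUT_TOKEN_EOF, TUT_TOKEN_ERROR, TUT_TOKEN_IDENTIFIER, TUT_TOKEN_NUMBER,
    TUT_TOKEN_PLUS, TUT_TOKEN_MINUS, TUT_TOKEN_STAR, TUT_TOKEN_SLASH,
    TUT_TOKEN_EQUALS, TUT_TOKEN_LPAREN, TUT_TOKEN_RPAREN, TUT_TOKEN_SEMICOLON
} TutTokenType;

/* A token points back into the source string instead of copying its text. */
typedef struct {
    TutTokenType type;
    const char *start; /* first character of the lexeme */
    size_t length;     /* number of characters in the lexeme */
    int line;          /* 1-based line number, for error messages */
} TutToken;

/* Lexer state: a cursor into the source text plus the current line. */
typedef struct {
    const char *current;
    int line;
} TutLexer;

void tut_lexer_init(TutLexer *lexer, const char *source) {
    assert(lexer != NULL && source != NULL);
    lexer->current = source;
    lexer->line = 1;
}

/* Builds a token spanning from start up to the lexer's current position. */
static TutToken make_token(const TutLexer *lexer, TutTokenType type,
                           const char *start) {
    TutToken token;
    token.type = type;
    token.start = start;
    token.length = (size_t)(lexer->current - start);
    token.line = lexer->line;
    return token;
}

/* Scans and returns the next token; repeated calls walk through the source.
 * At the end of the string every further call returns TUT_TOKEN_EOF. */
TutToken tut_lexer_next(TutLexer *lexer) {
    /* Skip whitespace, counting newlines so tokens know their line. */
    while (isspace((unsigned char)*lexer->current)) {
        if (*lexer->current == '\n') lexer->line++;
        lexer->current++;
    }

    const char *start = lexer->current;
    char c = *lexer->current;
    if (c == '\0') return make_token(lexer, TUT_TOKEN_EOF, start);
    lexer->current++;

    /* Identifiers: a letter or underscore, then letters, digits, underscores. */
    if (isalpha((unsigned char)c) || c == '_') {
        while (isalnum((unsigned char)*lexer->current) || *lexer->current == '_')
            lexer->current++;
        return make_token(lexer, TUT_TOKEN_IDENTIFIER, start);
    }
    /* Integer literals: a run of decimal digits. */
    if (isdigit((unsigned char)c)) {
        while (isdigit((unsigned char)*lexer->current)) lexer->current++;
        return make_token(lexer, TUT_TOKEN_NUMBER, start);
    }

    /* Single-character operators and punctuation. */
    switch (c) {
    case '+': return make_token(lexer, TUT_TOKEN_PLUS, start);
    case '-': return make_token(lexer, TUT_TOKEN_MINUS, start);
    case '*': return make_token(lexer, TUT_TOKEN_STAR, start);
    case '/': return make_token(lexer, TUT_TOKEN_SLASH, start);
    case '=': return make_token(lexer, TUT_TOKEN_EQUALS, start);
    case '(': return make_token(lexer, TUT_TOKEN_LPAREN, start);
    case ')': return make_token(lexer, TUT_TOKEN_RPAREN, start);
    case ';': return make_token(lexer, TUT_TOKEN_SEMICOLON, start);
    default:  return make_token(lexer, TUT_TOKEN_ERROR, start);
    }
}
```

Each call to `tut_lexer_next` scans exactly one token, which is what "on demand" means here: a parser can pull tokens one at a time instead of tokenizing the whole script up front. For example, the input `total = 10 + x1;` yields IDENTIFIER, EQUALS, NUMBER, PLUS, IDENTIFIER and SEMICOLON, followed by EOF. Because tokens point into the source string rather than copying it, the source must outlive the tokens.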

// TODO: Finish blog post
Here is the repository thus far.