Write a Programming Language in a Weekend (Seriously) With Python



This content originally appeared on DEV Community and was authored by Developer Service

Subtitle: Build a toy language from scratch and understand lexing, parsing, and interpreting — all in plain Python.

Introduction

Ever dreamt of creating your own programming language, but figured that was something only compiler geeks or professors could pull off?

Think again.

In this article, you’ll learn how to write your own toy programming language in a single weekend, using nothing but Python and a bit of brainpower. No compilers, no scary grammar tools, just regular Python code, a few re patterns, and a dose of curiosity.

You won’t be building the next JavaScript or Rust (yet), but you will build a working interpreter that can understand code like this:

let x = 10;
print(x * 2 + 1);

And the best part? You’ll understand how it works at every stage: converting text into tokens, building an Abstract Syntax Tree (AST), and walking that tree to evaluate results. It’s like writing a mini-brain for your language, and it’s deeply satisfying.

Let’s get started. Your language awaits.

The full source code is available at: https://github.com/nunombispo/ProgrammingLanguage-Article

Step 1: Design Your Language

Before we write a single line of Python code for our new language interpreter, we need to answer a simple question:

What kind of language are we building?

We’re not aiming to replace Python or create a full-fledged compiler. Our goal is to create a simple, interpreted, expression-based language that supports:

  • Variable declarations using let
  • Basic arithmetic (+, -, *, /)
  • Built-in print() function
  • Script-style execution: statements run top to bottom (no functions or conditionals, at least not yet)

Let’s review the steps necessary to create a language:

[Figure: Programming Language Steps]

Step 1 looks at the source code itself.

Syntax Design

Here’s the minimal syntax we’ll support:

let x = 5;
let y = x + 10;
print(y);

In English, this means:

  • Declare a variable x and set it to 5
  • Declare another variable y, set it to x + 10
  • Print the value of y

Each statement ends with a semicolon ;, similar to JavaScript or C.

Grammar Overview

To build a parser later, we’ll need a rough idea of the grammar. Here’s a simplified version:

program      ::= statement*
statement    ::= "let" IDENTIFIER "=" expression ";" 
               | "print" "(" expression ")" ";"
expression   ::= term (("+" | "-") term)*
term         ::= factor (("*" | "/") factor)*
factor       ::= NUMBER | IDENTIFIER | "(" expression ")"

This grammar:

  • Is written in EBNF-style notation (Extended Backus-Naur Form)
  • Defines how statements and expressions are structured
  • Handles operator precedence (i.e., * and / are evaluated before + and -)
  • Supports grouping with parentheses

Don’t worry if this looks unfamiliar. We’ll break this down step-by-step as we build the tokenizer, parser, and interpreter.

For now, the key point is that every valid program is a sequence of let and print statements, and whatever sits to the right of = or inside print(...) is an expression.
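If the precedence rules feel abstract, a quick sanity check can help. Python groups these four operators the same way our grammar is meant to (and the same way the parser in step 3 ends up building them), so plain Python arithmetic can act as a reference. This is only an illustration; the variable names are made up for the example.

```python
# Python uses the same precedence and left-to-right grouping for + - * /,
# so it can serve as a reference for what our grammar should produce.
precedence = 2 + 3 * 4        # grouped as 2 + (3 * 4)
associativity = 10 - 4 - 3    # grouped as (10 - 4) - 3
grouping = (2 + 3) * 4        # parentheses override precedence

print(precedence, associativity, grouping)  # 14 3 20
```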

Step 2: Tokenizer (Lexer)

Now that we’ve defined our language’s syntax, it’s time to build the first real component: a tokenizer, also known as a lexer.

This is step 2 of the pipeline: the tokenizer.

What Is a Tokenizer?

A tokenizer breaks your source code (plain text) into a sequence of meaningful tokens, small labelled pieces like keywords, identifiers, numbers, and symbols.

For example, given this line of code:

let x = 5 + 2;

The tokenizer should return something like:

[
  ('LET', 'let'),
  ('IDENT', 'x'),
  ('EQUALS', '='),
  ('NUMBER', '5'),
  ('PLUS', '+'),
  ('NUMBER', '2'),
  ('SEMICOLON', ';')
]

These tokens make it easier for the parser (in step 3) to understand what’s going on.

Building the Tokenizer in Python

We’ll use Python’s built-in re (regular expressions) module to match patterns for each token type.

Let’s define the token types and write a simple lexer:

import re

# Define token types and regex patterns
TOKEN_TYPES = [
    ('LET',      r'let\b'),    # \b stops "letter" from lexing as LET + IDENT
    ('PRINT',    r'print\b'),
    ('NUMBER',   r'\d+'),
    ('IDENT',    r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('EQUALS',   r'='),
    ('PLUS',     r'\+'),
    ('MINUS',    r'-'),
    ('TIMES',    r'\*'),
    ('DIVIDE',   r'/'),
    ('LPAREN',   r'\('),
    ('RPAREN',   r'\)'),
    ('SEMICOLON',r';'),
    ('SKIP',     r'[ \t]+'),   # ignore spaces and tabs
    ('NEWLINE',  r'\n'),
]
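A detail worth knowing about keyword patterns: re matches at the current position and does not check what comes next, so a bare let pattern also matches the first three characters of an identifier such as letter. A \b word boundary prevents that. The small sketch below compares the two; PLAIN and BOUNDED are names invented for this example.

```python
import re

PLAIN = re.compile(r'let')      # no boundary check
BOUNDED = re.compile(r'let\b')  # must end at a word boundary

plain_keyword = PLAIN.match("letter = 1;")
bounded_keyword = BOUNDED.match("letter = 1;")
bounded_real = BOUNDED.match("let x = 1;")

print(plain_keyword is not None)  # True: "let" matched inside "letter"
print(bounded_keyword is None)    # True: the boundary check rejects it
print(bounded_real.group(0))      # let
```

The order of the table matters for the same reason: keywords have to be tried before IDENT, otherwise let would always come back as an identifier.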

Now let’s write the function to match and extract these tokens:

def tokenize(code):
    tokens = []
    index = 0

    while index < len(code):
        match = None
        for token_type, pattern in TOKEN_TYPES:
            regex = re.compile(pattern)
            match = regex.match(code, index)
            if match:
                text = match.group(0)
                if token_type != 'SKIP' and token_type != 'NEWLINE':
                    tokens.append((token_type, text))
                index = match.end(0)
                break
        if not match:
            raise SyntaxError(f'Unexpected character: {code[index]}')
    return tokens

Example

Let’s test it:

code = "let x = 5 + 2;"
print(tokenize(code))

Output:

[('LET', 'let'), ('IDENT', 'x'), ('EQUALS', '='), ('NUMBER', '5'), ('PLUS', '+'), ('NUMBER', '2'), ('SEMICOLON', ';')]

You’ve got a working tokenizer!
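If you want to go one step further, here is an alternative sketch, not the version used in the rest of this article. It follows the approach shown in the Python re documentation: join all patterns into one regex with named groups, compile it once, and let finditer walk the input. The names TOKEN_SPEC, MASTER and tokenize_fast are made up for this example, and the MISMATCH catch-all is an addition that reports the line number of a bad character.

```python
import re

# Same token names as above; order still matters (keywords before IDENT).
TOKEN_SPEC = [
    ('LET',       r'let\b'),
    ('PRINT',     r'print\b'),
    ('NUMBER',    r'\d+'),
    ('IDENT',     r'[A-Za-z_][A-Za-z0-9_]*'),
    ('EQUALS',    r'='),
    ('PLUS',      r'\+'),
    ('MINUS',     r'-'),
    ('TIMES',     r'\*'),
    ('DIVIDE',    r'/'),
    ('LPAREN',    r'\('),
    ('RPAREN',    r'\)'),
    ('SEMICOLON', r';'),
    ('SKIP',      r'\s+'),   # spaces, tabs and newlines
    ('MISMATCH',  r'.'),     # anything else is an error
]

# One combined pattern, compiled once: (?P<LET>let\b)|(?P<PRINT>print\b)|...
MASTER = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))

def tokenize_fast(code):
    tokens = []
    for match in MASTER.finditer(code):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            line = code.count('\n', 0, match.start()) + 1
            raise SyntaxError(f'Unexpected character {match.group()!r} on line {line}')
        tokens.append((kind, match.group()))
    return tokens

print(tokenize_fast("let letter = 5;"))
# [('LET', 'let'), ('IDENT', 'letter'), ('EQUALS', '='), ('NUMBER', '5'), ('SEMICOLON', ';')]
```

It returns the same (type, text) tuples, so it could be swapped in for tokenize without touching the parser.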

Step 3: Building a Parser (AST Generator)

Now that we can tokenize our code, it’s time to make sense of those tokens. This is where the parser comes in.

This is step 3 of the pipeline: the parser and the AST it produces.

What Is a Parser?

A parser reads the list of tokens and builds an Abstract Syntax Tree (AST), which is a structured, hierarchical representation of the code.

Take this input:

let x = 5 + 2;

The tokenizer gives us:

[('LET', 'let'), ('IDENT', 'x'), ('EQUALS', '='), ('NUMBER', '5'), ('PLUS', '+'), ('NUMBER', '2'), ('SEMICOLON', ';')]

The parser turns this into an AST like:

[
    LetStatement(
        name="x",
        value=BinaryOp(
            left=Number(value=5),
            op="+",
            right=Number(value=2)
        )
    )
]

Let’s build that.

Define AST Nodes

We’ll define a few Python classes to represent different AST node types:

class Number:
    def __init__(self, value):
        self.value = int(value)

    def __repr__(self):
        return f"Number(value={self.value})"

class Identifier:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Identifier(name={self.name})"

class BinaryOp:
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def __repr__(self):
        return f"BinaryOp(left={self.left}, op={self.op}, right={self.right})"

class LetStatement:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f"LetStatement(name={self.name}, value={self.value})"

class PrintStatement:
    def __init__(self, expr):
        self.expr = expr

    def __repr__(self):
        return f"PrintStatement(expr={self.expr})"
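As a side note, the same kind of node can be written with less boilerplate using dataclasses from the standard library, which generate __init__, __repr__ and __eq__ for you. This is an optional sketch, not what the rest of the article uses. The class names NumberNode and BinaryOpNode are invented here to avoid clashing with the classes above, and unlike Number, this version does not convert the token text with int(), so the parser would have to do that itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NumberNode:
    value: int

@dataclass(frozen=True)
class BinaryOpNode:
    left: object
    op: str
    right: object

tree = BinaryOpNode(NumberNode(5), '+', NumberNode(2))
print(tree)
# BinaryOpNode(left=NumberNode(value=5), op='+', right=NumberNode(value=2))

# Generated __eq__ compares field by field, which is handy in tests.
print(tree == BinaryOpNode(NumberNode(5), '+', NumberNode(2)))  # True
```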

Create the Parser Class

We’ll make a simple recursive descent parser that consumes tokens one by one and builds AST nodes.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ('EOF', '')

    def eat(self, token_type):
        if self.current()[0] == token_type:
            self.pos += 1
        else:
            raise SyntaxError(f'Expected {token_type}, got {self.current()}')

    def parse(self):
        statements = []
        while self.current()[0] != 'EOF':
            if self.current()[0] == 'LET':
                statements.append(self.parse_let())
            elif self.current()[0] == 'PRINT':
                statements.append(self.parse_print())
            else:
                raise SyntaxError(f'Unexpected token: {self.current()}')
        return statements

Parse let and print Statements

The remaining methods in this step go inside the Parser class, which is why they are indented.

    def parse_let(self):
        self.eat('LET')
        name = self.current()[1]
        self.eat('IDENT')
        self.eat('EQUALS')
        expr = self.parse_expression()
        self.eat('SEMICOLON')
        return LetStatement(name, expr)

    def parse_print(self):
        self.eat('PRINT')
        self.eat('LPAREN')
        expr = self.parse_expression()
        self.eat('RPAREN')
        self.eat('SEMICOLON')
        return PrintStatement(expr)

Parse Expressions (with Operator Precedence)

    def parse_expression(self):
        node = self.parse_term()
        while self.current()[0] in ('PLUS', 'MINUS'):
            op = self.current()[1]
            self.eat(self.current()[0])
            right = self.parse_term()
            node = BinaryOp(node, op, right)
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.current()[0] in ('TIMES', 'DIVIDE'):
            op = self.current()[1]
            self.eat(self.current()[0])
            right = self.parse_factor()
            node = BinaryOp(node, op, right)
        return node

    def parse_factor(self):
        token_type, token_value = self.current()
        if token_type == 'NUMBER':
            self.eat('NUMBER')
            return Number(token_value)
        elif token_type == 'IDENT':
            self.eat('IDENT')
            return Identifier(token_value)
        elif token_type == 'LPAREN':
            self.eat('LPAREN')
            expr = self.parse_expression()
            self.eat('RPAREN')
            return expr
        else:
            raise SyntaxError(f'Unexpected factor: {self.current()}')
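To see why the two while loops give you both precedence and left-to-right grouping, here is a stripped-down sketch of the same idea. It is deliberately simplified: tokens are bare Python values instead of (type, text) tuples, a factor is always a number, and nested tuples stand in for BinaryOp nodes. The function names are made up for this example.

```python
def parse_expr(tokens, pos=0):
    """expression ::= term (("+" | "-") term)*"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ('+', '-'):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (node, op, right)  # the result so far becomes the left operand
    return node, pos

def parse_term(tokens, pos):
    """term ::= factor (("*" | "/") factor)*, where a factor is just a number here"""
    node, pos = tokens[pos], pos + 1
    while pos < len(tokens) and tokens[pos] in ('*', '/'):
        op = tokens[pos]
        right, pos = tokens[pos + 1], pos + 2
        node = (node, op, right)
    return node, pos

subtraction, _ = parse_expr([10, '-', 4, '-', 3])
mixed, _ = parse_expr([1, '+', 2, '*', 3])

print(subtraction)  # ((10, '-', 4), '-', 3)
print(mixed)        # (1, '+', (2, '*', 3))
```

Because parse_expr only ever asks parse_term for its operands, a multiplication is always finished before the surrounding addition sees it.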

Test It

from pprint import pprint
from tokenizer import tokenize  # assumes the Step 2 code is saved as tokenizer.py

code = """
let x = 5 + 2;
print(x);
"""

tokens = tokenize(code)
parser = Parser(tokens)
ast = parser.parse()

pprint(ast)

You should see a structured tree of LetStatement and PrintStatement nodes, like this:

[LetStatement(name=x, value=BinaryOp(left=Number(value=5), op=+, right=Number(value=2))),
 PrintStatement(expr=Identifier(name=x))]

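Here it is again, reformatted by hand for readability (the quotes are added for clarity; the __repr__ methods above do not print them):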

[
    LetStatement(
        name="x",
        value=BinaryOp(
            left=Number(value=5),
            op="+",
            right=Number(value=2)
        )
    ),
    PrintStatement(
        expr=Identifier(name="x")
    )
]

This is exactly what your interpreter will need next.

Step 4: Evaluating the AST (Running Your Language)

You’ve built a tokenizer and a parser that gives you an abstract syntax tree (AST). Now it’s time to execute that tree, just like a real programming language does.

This is step 4 of the pipeline: the interpreter and its output.

Interpreter Basics

An interpreter is a component that:

  1. Walks the AST.
  2. Evaluates each node.
  3. Keeps track of variables (in memory).
  4. Produces side effects (like printing output).

The Environment

We need a place to store variable values:

class Environment:
    def __init__(self):
        self.vars = {}

    def set_var(self, name, value):
        self.vars[name] = value

    def get_var(self, name):
        if name in self.vars:
            return self.vars[name]
        raise NameError(f"Variable '{name}' not defined")
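Here is a quick look at how the environment behaves on its own. The class is repeated in compact form so the snippet runs standalone; the variable names x and y are just examples.

```python
class Environment:
    def __init__(self):
        self.vars = {}

    def set_var(self, name, value):
        self.vars[name] = value

    def get_var(self, name):
        if name in self.vars:
            return self.vars[name]
        raise NameError(f"Variable '{name}' not defined")

env = Environment()
env.set_var('x', 5)
env.set_var('x', 7)      # a second let on the same name simply overwrites
print(env.get_var('x'))  # 7

try:
    env.get_var('y')     # never declared
except NameError as error:
    message = str(error)
    print(message)       # Variable 'y' not defined
```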

The Interpreter

We’ll walk through each statement and expression recursively.

class Interpreter:
    def __init__(self):
        self.env = Environment()

    def eval(self, node):
        if isinstance(node, Number):
            return node.value

        elif isinstance(node, Identifier):
            return self.env.get_var(node.name)

        elif isinstance(node, BinaryOp):
            left = self.eval(node.left)
            right = self.eval(node.right)
            if node.op == '+':
                return left + right
            elif node.op == '-':
                return left - right
            elif node.op == '*':
                return left * right
            elif node.op == '/':
                return left // right  # floor division: 7 / 2 gives 3, -7 / 2 gives -4
            else:
                raise RuntimeError(f"Unknown operator: {node.op}")

        elif isinstance(node, LetStatement):
            value = self.eval(node.value)
            self.env.set_var(node.name, value)

        elif isinstance(node, PrintStatement):
            value = self.eval(node.expr)
            print(value)

        else:
            raise RuntimeError(f"Unknown node: {node}")
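The if/elif chain for operators works fine at this size. If you add more operators later, one possible alternative is a lookup table built on the standard operator module. This is a sketch of that idea, not part of the interpreter above; BINARY_OPS and apply_binary are hypothetical names.

```python
import operator

# Maps each operator symbol to the function that implements it.
BINARY_OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.floordiv,  # same floor division as the interpreter above
}

def apply_binary(op, left, right):
    func = BINARY_OPS.get(op)
    if func is None:
        raise RuntimeError(f"Unknown operator: {op}")
    return func(left, right)

print(apply_binary('+', 10, 40))  # 50
print(apply_binary('/', 7, 2))    # 3
print(apply_binary('/', -7, 2))   # -4
```

Inside eval, the BinaryOp branch would then shrink to a single call with node.op and the two evaluated operands.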

Running It All Together

code = """
let a = 10;
let b = a + 20 * 2;
print(b);
"""

tokens = tokenize(code)
pprint(tokens)

parser = Parser(tokens)
ast = parser.parse()
pprint(ast)

interpreter = Interpreter()
for stmt in ast:
    interpreter.eval(stmt)

Output

First, it will output the tokens:

[('LET', 'let'),
 ('IDENT', 'a'),
 ('EQUALS', '='),
 ('NUMBER', '10'),
 ('SEMICOLON', ';'),
 ('LET', 'let'),
 ('IDENT', 'b'),
 ('EQUALS', '='),
 ('IDENT', 'a'),
 ('PLUS', '+'),
 ('NUMBER', '20'),
 ('TIMES', '*'),
 ('NUMBER', '2'),
 ('SEMICOLON', ';'),
 ('PRINT', 'print'),
 ('LPAREN', '('),
 ('IDENT', 'b'),
 ('RPAREN', ')'),
 ('SEMICOLON', ';')]

Then the AST:

[LetStatement(name=a, value=Number(value=10)),
 LetStatement(name=b, value=BinaryOp(left=Identifier(name=a), op=+, right=BinaryOp(left=Number(value=20), op=*, right=Number(value=2)))),
 PrintStatement(expr=Identifier(name=b))]

And finally the output:

50

And there it is.

Your language interpreted and executed code written in a custom syntax.

In a weekend. With Python.

What’s Next?

Here are a few ideas to expand your language:

  • Add if statements and comparison operators (==, <, >)
  • Add functions with arguments and return values
  • Create a REPL (Read-Eval-Print Loop) for interactive coding
  • Build a small standard library (e.g., input(), len(), etc.)
  • Export your language as a CLI tool or package
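To give one of these ideas a head start, here is a rough sketch of a REPL loop. It is only an outline under a few assumptions: execute is a hypothetical callable you would supply that tokenizes, parses and evaluates one line using a single shared Interpreter (so variables survive between lines), and typing exit as the quit command is an arbitrary choice. The read and write parameters exist so the loop can be tested without a real terminal.

```python
def repl(execute, read=input, write=print, prompt='> '):
    """Read lines until 'exit' or end of input, passing each one to execute()."""
    while True:
        try:
            line = read(prompt)
        except EOFError:  # Ctrl-D (or Ctrl-Z on Windows) ends the session
            break
        if line.strip() == 'exit':
            break
        if not line.strip():
            continue      # ignore empty lines
        try:
            execute(line)
        except (SyntaxError, NameError, RuntimeError, ZeroDivisionError) as error:
            write(f'Error: {error}')  # report the problem and keep going

# Hypothetical wiring with the pieces from this article:
#   interpreter = Interpreter()
#   repl(lambda line: [interpreter.eval(s) for s in Parser(tokenize(line)).parse()])
```

Catching the errors inside the loop is the important design choice: a typo should print a message, not end the session.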

Conclusion

Building your own programming language might sound intimidating, but now you’ve done it.

You’ve walked through every piece of the puzzle using pure Python.

This is just the beginning. Language design is a deep, fascinating field.

But you’ve proven you can go from zero to interpreter in a weekend.

Now go forth and build something weird, fun, and 100% yours.


