Learn a language by writing too many Forths

Every developer eventually wants to explore new programming languages. As motivating as it gets, there’s always a question of finding a suitable project for that noble goal.

You can rewrite classical games, you can make a text editor, a UNIX shell or clones of traditional UNIX core utils. If you can afford it - build a web server, write an emulator, a ray tracer or even an operating system. To each his own.

I prefer having as little programming overhead as possible to rapidly get the impression of a new language. I don’t like googling for “best socket library in language X” or “top UI frameworks in 2022” as the first step into the language. Ecosystem details are quite important for production use, but newcomers often find such details distracting.

Thus, my “try the new language” exercise is: write as many Forths as you can in it.

Concatenative languages are conceptually simple and flexible, however implementing them covers lots of fundamental concepts, such as data structures (stacks, linked lists or hashmaps), algorithms (search, parsing) and it can get as complicated as you want. Choose your own adventure.


The first and the most trivial “forth-like” program to write is an RPN calculator. Many of us did that a number of times in their academy years. This silly exercise would help you to find the answers to:

At this point I don’t even bother with user input. I define a set of operations, typically + - * / % and a separate operation that puts a number on stack. Then I write a single rpn function that evaluates the stack. Vector of opcodes is my input, stack is my output. All validation happens in the tests.

Normally, an exercise takes just a few lines of code and can be accomplished in several minutes. Here’s my approach in Rust:

enum Op { Add, Sub, Mul, Div, Mod, Num(i32) }

fn rpn(code: &[Op]) -> Option<i32> {
    let mut stack = Vec::new();
    for op in code {
        match op {
            Op::Num(n) => stack.push(*n),
            _ => {
                if stack.len() < 2 {
                    return None;
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                match op {
                    Op::Add => stack.push(a + b),
                    Op::Mul => stack.push(a * b),
                    Op::Sub => stack.push(a - b),
                    Op::Div => stack.push(a / b),
                    Op::Mod => stack.push(a % b),
                    _ => return None,
    return stack.pop();
fn test_rpn() {
		assert_eq!(rpn(&[Op::Num(2), Op::Num(3), Op::Add]), Some(5));
		assert_eq!(rpn(&[Op::Num(2), Op::Num(3), Op::Num(5), Op::Mul, Op::Add]), Some(17));
		assert_eq!(rpn(&[]), None);
		assert_eq!(rpn(&[Op::Add]), None);
		assert_eq!(rpn(&[Op::Num(3), Op::Add]), None);

Naturally, one can turn it into a CLI calculator similar to dc. But I usually switch to the proper Forth-like language with dictionaries and words instead.

Tiny Forth

Here the goal is more ambitious – to evaluate a string of words with the support of user-defined words. We don’t aim to implement a proper Forth with all the underlying dict-related primitives, we simply want : to define a new word and ; to terminate that definition.

This exercise involves some basic string tokenisation and lots of error handling. There’s more than one possible data type for a dictionary, hashmap is an obvious choice, but a vector or a list would also work.

My Rust solution is far from being perfect, but at least it’s small. It was also an eye-opening case on how convenient the ? operator is in Rust when it comes to error handling.

pub type Value = i32;
pub type Result<T> = std::result::Result<T, Error>;

#[derive(Debug, Copy, Clone)]
enum Token { Add, Sub, Mul, Div, Dup, Drop, Swap, Over, Num(i32), Word(usize) }

#[derive(Debug, Clone)]
struct Word { name: String, tokens: Vec<Token> }

pub struct Forth { stack: Vec<Value>, dict: Vec<Word> }

#[derive(Debug, PartialEq)]
pub enum Error { DivisionByZero, StackUnderflow, UnknownWord, InvalidWord }

impl Forth {
    pub fn new() -> Forth {
        Forth { stack: vec![], dict: vec![] }
    pub fn eval(&mut self, input: &str) -> Result<()> {
        let mut words = input.split(" ").map(|s| s.to_lowercase());
        while let Some(word) = words.next() {
            if word == ":" {
                self.new_word(&mut words)?
            } else {
    fn tokenise(&self, word: &str) -> Result<Token> {
        if let Some(idx) = self.find_word(&word) {
            return Ok(Token::Word(idx));
        let token = match word {
            "+" => Token::Add,
            "-" => Token::Sub,
            "*" => Token::Mul,
            "/" => Token::Div,
            "dup" => Token::Dup,
            "drop" => Token::Drop,
            "swap" => Token::Swap,
            "over" => Token::Over,
            w => match w.parse::<Value>() {
                Ok(n) => Token::Num(n),
                _ => return Err(Error::UnknownWord),
    fn apply(&mut self, token: Token) -> Result<()> {
        match token {
            Token::Add => { let a = self.pop()?; let b = self.pop()?; self.push(b + a); }
            Token::Sub => { let a = self.pop()?; let b = self.pop()?; self.push(b - a); }
            Token::Mul => { let a = self.pop()?; let b = self.pop()?; self.push(b * a); }
            Token::Div => {
                let a = self.pop()?;
                let b = self.pop()?;
                if a == 0 {
                    return Err(Error::DivisionByZero);
                self.push(b / a);
            Token::Dup => { let a = self.pick(0)?; self.push(a); }
            Token::Drop => { self.pop()?; }
            Token::Swap => { let a = self.pop()?; let b = self.pop()?; self.push(a); self.push(b); }
            Token::Over => { let b = self.pick(1)?; self.push(b); }
            Token::Num(n) => self.push(n),
            Token::Word(i) => {
                for t in self.dict[i].tokens.clone() {
    fn find_word(&self, name: &str) -> Option<usize> {
        self.dict.iter().enumerate().rfind(|(_, Word { name: n, .. })| *n == name).map(|(i, _)| i)
    fn new_word<T>(&mut self, words: &mut T) -> Result<()>
        T: Iterator<Item = String>,
        let name = words.next().ok_or(Error::InvalidWord)?;
        if name.parse::<Value>().is_ok() {
            return Err(Error::InvalidWord);
        let mut tokens = vec![];
        while let Some(word) = words.next() {
            if word == ";" {
                self.dict.push(Word { name, tokens });
                return Ok(());
    fn pop(&mut self) -> Result<Value> { self.stack.pop().ok_or(Error::StackUnderflow) }
    fn push(&mut self, value: Value) { self.stack.push(value) }
    fn pick(&mut self, i: usize) -> Result<Value> {
        let n = self.stack.len();
        if i < n {
            Ok(self.stack[n - i - 1])
        } else {
let mut f = Forth::new();
f.eval(": five 2 3 + ; five five *");
let result = f.pop().unwrap();

Actually, this toy project may become a starting point for something bigger – you’ve now built a core for a domain-specific language of your choice and can embed it into other applications as a scripting engine.


Now, let me open some esoteric and weird territories: meet FALSE. An esoteric language invented by Wouter van Ooermersen in 1993 to become as powerful as possible with a tiny implementation. The original compiler was only 1024 bytes and easily fit one screen.

However, the language is quite powerful - it supports lambda functions, variables, most of the arithmetic expressions, conditionals and loops, and some I/O support.

Implementing FALSE is very convenient for entry-level programmers because it doesn’t require a parser - each character is a separate command ready for execution. A complete language “specification” can be presented as follows:

      STACK     STACK           
{..}  -                                          { this is a comment }
[..]  -         function        [1+]             { (lambda (x) (+ x 1)) }
a..z  -         varadr          a                { use a: or a; }
A..Z  -         value of a..z   A                { [1+]a: 3 A! }
1..9  -         value           1                { 1 }
'c    -         value           'A               { 65 }
:     n,varadr  -               1a:              { a:=1 }
;     varadr    varvalue        a;               { a }
!     function  -               f;!              { f() }
+     n1,n1     n1+n2           1 2+             { 1+2 }
-     n1,n2     n1-n2           1 2-             { 1-2 }
*     n1,n2     n1*n2           1 2*             { 1*2 }
/     n1,n2     n1/n2           1 2/             { 1/2 }
_     n         -n              1_               { -1 }
<     n1,n2     n1<n2           3 2<             { 3<2 }
=     n1,n1     n1=n2           1 2=~            { 1<>2 }
>     n1,n2     n1>n2           1 2>             { 1>2 }
&     n1,n2     n1 and n2       1 2&             { 1&2 }
|     n1,n2     n1 or n2        1 2|             { 1|2 }
~     n         not n           0~               { 0,1 }
$     n         n,n             1$               { dupl. top stack }
%     n         -               1%               { del. top stack }
\     n1,n2     n2,n1           1 2\             { swap }
@     n,n1,n2   n1,n2,n         1 2 3@           { rot }
?     bool,fun  -               a;2=[1f;!]?      { if a=2 then f(1) }
#     bool,fun  -               1[$100<][1+]#    { while a<100 do a:=a+1 }
.     n         -               1.               { printnum(1) }
".."  -         string          "hi!"            { printstr("hi!") }
,     ch        -               10,              { putc(10) }
^     -         ch              ^                { getc() }

Values are integers, each command performs some action with the values on top of the stack. Not too different from our tiny Forth above, except for lambdas. Lambda is a piece of code within the square brackets. For example [1+] is a lambda that increments the top value on stack. To apply the lambda use !, i.e. 2[1+]! returns 3. It’s common to assign lambdas to some named variables and invoke them later: [1+]i: defines our increment lambda under name “i” and 2i;! applies it to the value of “2”.

For a compact language like this it’s very tempting to try out macros and some meta-programming. Some FALSE implementations are type-safe (so that an arbitrary integer on stack can’t be used as a variable name or a lambda), which would be also interesting to implement.

It’s not the first time I approach FALSE, so my implementation in Rust easily took ~250 LOC - here’s the gist.

The simplicity and the power of FALSE makes it a good candidate to explore low-level programming: try making a bootable OS image from a FALSE interpreter to get a “False OS”, akin to many BASIC systems from the past. Code is small, but possibilities are endless. Or maybe add some GUI and turn it into a False Haiku evaluator where one can define a pixel shader in FALSE. Or maybe spice it up with some DSP algorithms and turn it into a music programming system like StackBeat or Sporth.

Threading techniques

Finally, if you got to this point and still think you need to go deeper - you can check out how the assembly code is produced by the compiler (if you are studying a compiled language or the one with JIT).

Threaded code is a technique for implementing virtual machine interpreters. Say, your program consists of three instructions: pushA, pushB and add. We could write the following machine code to execute these instructions:

int *sp;

void pushA() { *sp++ = A; }
void pushB() { *sp++ = B; }
void add()   { int a = *sp--, *b = *sp--; *sp++ = a + b; }

pushA();  // asm: call pushA
pushB();  // asm: call pushB
add();    // asm: call add

This is called “subroutine-threaded code”, but it’s not a threaded code really. If we avoid the call instructions, however, we would get an array of code addresses. Now to execute such code we need to keep track of current instruction pointer and introduce a NEXT command that would jump from one instruction to the other:

typedef void (*op)();
op code[] = {pushA, pushB, add, NULL};
op *ip = code;
#define NEXT ((*ip++)())
while (ip) NEXT;

This technique is known as the “call threading”, it’s available in most of the programming languages but requires a call/ret instruction for each subroutine and is usually very slow.

Direct threaded code uses command addresses as opcodes, but those are not subroutines and instead of ret instruction at the end they call NEXT to jump to the next opcode. In C this might be implemented with a computed goto:

#define NEXT goto **ip++
void *code = { &&pushA, &&pushB, &&add, &&halt };
void **ip = code;
	*sp++ = A;
  *sp++ = B;
	int a = *sp--;
	int b = *sp--;
	*sp++ = a + b;

Direct threading is faster than subroutine calls, but Forth implementations usually prefer indirect threading, which is a similar technique but stores addresses of addresses of the words instead. The reason is that indirect threading is more compact, as it uses a single pointer to NEXT at the end of each operation instead of a number of assembly instructions performing the indirect jump with increment.

Another common, more portable dispatch technique is what we have been using in the exercises above - a switch/case statement. It requires branching and a jump instruction for the outer loop, which may impact the performance, also some languages perform bounds checking on switch slowing it down. But it remains the most popular VM technique nowadays:

int code[] = { PUSHA, PUSHB, ADD, HALT };
for (int i = 0; code[i] != HALT; i++) {
	switch (code[i]) {
		case PUSHA: ... break;
		case PUSHB: ... break;
		case ADD: ... break;

There is also a recursive continuation passing technique, where a subroutine never returns but invokes the next subroutine instead. It works well if the compiler uses tail-call optimisation and NEXT calls are replaced with jumps. This often results in the same machine code as computed goto but is supported by a broader circle of languages:

void NEXT() { code[++ip](); }
void pushA() { ...; NEXT(); }
void pushB() { ...; NEXT(); }
fn_t code[] = { pushA, pushB, add, halt };

Not every programming language supports all such techniques, and not every language results in the same performance for every of them. Rust doesn’t have computed goto but it supports calling functions by pointer, continuation passing optimises match statements pretty well.

My quick-n-dirty benchmarks in Rust have shown that subroutine calls are indeed 2x-3x slower, but a switch/match threading is as fast as continuation passing. At the same time match{} approach remains the shortest and the most readable code of them all.

I would probably go further and use inline assembly to implement threaded code, it would also be interesting to compare the performance on modern Intel CPUs and some 8-bit or 32-bit microcontrollers. Applying powerful Rust macros to unroll the switch statements could improve the performance even further, but course it would all imply reading lots of LLVM bytecode and exploring various compiler optimisations, which is also a way to learn new things.

Threaded code brings elegance, simplicity and improves code density. Once getting comfortable with threaded code one might try implementing a Forth in assembly from the ground up, but that’s a completely different story.

And what’s your preferred way of learning new languages?

I hope you’ve enjoyed this article. You can follow – and contribute to – on Github, Twitter or subscribe via rss.

Jul 05, 2022

See also: Making a tiny 2x3 bitmap font and more.