Tiny Great Languages: Assembly

This is part 1 from series “Tiny Great Languages”.

Decades ago, I wrote my first interpreter – a Turtle Graphics IDE designed to help kids at my school learn programming. I built it in Pascal, and to this day, I still wonder how I managed to make it work without any formal knowledge of writing interpreters.

Since then, I’ve created dozens of small domain specific languages (DSLs), humble compilers, and some absurdly tiny interpreters.

I once read a discussion on Hacker News where someone suggested that every programmer, over their lifetime, should implement six programming languages: an assembler, a version of BASIC, a flavour of Lisp, a Forth, and a subset of C. The specific languages aren’t as important as the families they represent. These are the Great Languages – each with a rich history that has fundamentally shaped modern programming in one way or another.

In this post I hope to re-implement these languages once more, keeping each implementation to around 50 lines of code. My goal is to highlight the unique characteristics of each language family while keeping the approach simple.

I’ll be using Python for these implementations, for a good balance between code density and readability. The only test program our interpreters would need to run is a factorial calculator, something we’ve all written many times (and something that can be implemented in different ways).

Assembly

One of the oldest programming languages in history is Assembly. Essentially, it’s a textual representation of CPU machine code instructions, with some helper directives to handle labels, constants, and sometimes even primitive macros.

We could target any architecture – whether a real CPU or a virtual machine – but since we’re using Python, why not build an assembler for the CPython VM itself?

Ironically, the CPython instruction set is somewhat unstable – Python 3.10 has difference in the bytecode instruction set from 3.9, and 3.11 has differences from 3.10 etc. Some instructions are being deprecated, removed, or replaced over time. Perhaps the easiest way to learn about current CPython bytecode is to disassemble a real function in a real VM:

import dis

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

dis.dis(factorial)

This will produce an output like:

  3           0 RESUME                   0
  4           2 LOAD_FAST                0 (n)
              4 LOAD_CONST               1 (0)
              6 COMPARE_OP              40 (==)
             10 POP_JUMP_IF_FALSE        1 (to 14)
  5          12 RETURN_CONST             2 (1)
  7     >>   14 LOAD_FAST                0 (n)
             16 LOAD_GLOBAL              1 (NULL + factorial)
             26 LOAD_FAST                0 (n)
             28 LOAD_CONST               2 (1)
             30 BINARY_OP               10 (-)
             34 CALL                     1
             42 BINARY_OP                5 (*)
             46 RETURN_VALUE

Each line here represents a single VM instruction, which typically takes up 2 bytes: one for the instruction opcode and one for the parameter. Parameters, like constants or variables, are referenced by their index and should be declared upfront, before assembling the bytecode. For instance, if we define the constants 1, 7, 42 for a function, we can use indices #1, #2, and #3 to refer to them in opcodes, such as LOAD_CONST. In this case LOAD_CONST 2 would essentially load number 7.

Next, we can print the actual bytecode of a function and match the numerical opcode values with the corresponding instructions:

print(" ".join(f"{byte:02x}" for byte in factorial.__code__.co_code))

97 00 7c 00 64 01 6b 28 00 00 72 01 79 02 7c 00 74 01 00 00 00 00 00 00 00 00 7c 00 64 02 7a 0a 00 00 ab 01 00 00 00 00 00 00 7a 05 00 00 53 00

Here, for example, we see that RESUME corresponds to 0x97, LOAD_FAST to 0x7c, LOAD_CONST to 0x64, COMPARE_OP to 0x6b, and RETURN_VALUE to 0x53.

There is also an unusual BINARY_OP which is a meta-instruction. Its parameter specifies the type of an arithmetic operation to perform. With some experimentation, we can discover their values: +:0, -:10, *:5, /:3. A keen eye might notice that BINARY_OP instructions are also 4 bytes long – the parameter is a 2-byte little-endian value followed by a zero.

Now, we’re ready to implement a toy assembler for the Python VM. We’ll keep the instruction names as Python defines them, but we’ll replace BINARY_OP with their actual operation names, like ADD, SUBTRACT, and so on. We would also need a JUMP instruction to implement a factorial. We’ll define any line containing a : as a label, and later, instructions like POP_JUMP_IF_FALSE mylabel will resolve to the correct label address. We’ll also allow comments by ignoring everything after the instruction name and its parameter. Lines starting with CONST will define constants, and lines starting with VAR will define local variable names.

The implementation would be a typical multi-pass assembler. We can’t build all the bytecode in one go, because we need to build a table of labels first and then assemble the instructions using the correct label addresses, otherwise forward declarations of labels wouldn’t work.

import types,dis,sys

# Some CPython VM opcodes, feel free to extend these
OPS = {'LOAD_CONST':100,'LOAD_FAST':124,'STORE_FAST':125,'RETURN_VALUE':83,'JUMP_FORWARD':110,'JUMP_BACKWARD':140,'POP_JUMP_IF_FALSE':114,'POP_JUMP_IF_TRUE':115,'BINARY_OP':122}

# BINARY_OP parameters for arithmetic operations
BINOPS = {'ADD':0,'SUBTRACT':10,'MULTIPLY':5,'DIVIDE':3}

# Two-pass assembler
def assemble(code):
     bytecode, consts, vars, labels = [], [], [], {}
     pos = 0
     # First pass: handle constants, variables and label addresses
     for line in code.split('\n'):
          parts = line.strip().split()
          if not parts: continue
          instr = parts[0]
          if instr.endswith(':'): labels[instr[:-1]] = pos
          elif instr == 'CONST': consts.append(int(parts[1]))
          elif instr == 'VAR': vars.append(parts[1])
          else: pos += 2 if instr not in BINOPS else 4
     # Second pass: translate instructions to bytecode 1:1 and fill in labels
     for line in code.split('\n'):
          parts = line.strip().split()
          if not parts: continue
          instr = parts[0]
          if instr.endswith(':') or instr == 'CONST' or instr == 'VAR': continue
          arg = parts[1] if len(parts) > 1 else 0
          # Replace labels with addresses
          if isinstance(arg, str) and arg in labels:
               if labels[arg] < len(bytecode): arg = (len(bytecode)-labels[arg])/2+1
               else: arg = (labels[arg]-len(bytecode))/2-1
          if instr in BINOPS:
              # BINARY_OP is a 4-byte command with a 2-byte parameter
              bytecode.append(OPS['BINARY_OP'])
              bytecode.append(BINOPS[instr]&255)
              bytecode.append(BINOPS[instr]>>8&255)
              bytecode.append(0)
          else:
              bytecode.append(OPS[instr])
              bytecode.append(int(arg))
     return tuple(consts), tuple(vars), bytes(bytecode)

code = ''.join([line for line in sys.stdin])
consts, v, bytecode = assemble(code)
code_obj = types.CodeType(0,0,0,len(v),128,64,bytecode,consts,(),v,'asm','mod','',1,b'',b'')
ff = types.FunctionType(code_obj, {})
# Optionally, disassemble the bytecode for debugging
# dis.dis(ff)
print(ff())

Here’s a simple program for our assembler that would load a constant 42 into a variable result and return it:

# Run it: python3 asm.py < 42.txt
# 42.txt:
CONST 42
VAR result
LOAD_CONST 0
RETURN_VALUE 0

A factorial program would be a little bit more complicated, but still fairly straightforward:

CONST 1
CONST 10
VAR n
VAR result

LOAD_CONST 0         # result = 1
STORE_FAST 0         # store in result
LOAD_CONST 1         # n = CONST[1] (change this to calculate factorial of different numbers)
STORE_FAST 1         # store in n
loop:
LOAD_FAST 1          # load n
POP_JUMP_IF_FALSE end  # if n == 0, jump to end
LOAD_FAST 0          # load result
LOAD_FAST 1          # load n
MULTIPLY 0           # result = result * n
STORE_FAST 0         # store result
LOAD_FAST 1          # load n
LOAD_CONST 0         # load 1
SUBTRACT 0           # n = n - 11
STORE_FAST 1         # store n
JUMP_BACKWARD loop   # jump to start of loop
end:
LOAD_FAST 0          # load result
RETURN_VALUE 0       # return result

Running it should calculate the factorial of 10:

$ python3 asm.py < factorial/fac.asm
3628800

That’s the correct answer!

Writing an assembler is a really straightforward task (just like the language itself), and if you are looking for an inspiratoin - you might try to build one for simple architectures, such as CHIP-8, 6502, Z80 etc.

In the next part we’ll move on to some higher-level languages, stay tuned!

I hope you’ve enjoyed this article. You can follow – and contribute to – on Github, Mastodon, Twitter or subscribe via rss.

Sep 08, 2024

See also: Post-apocalyptic programming and more.