Emulating 6502

Last year, I noticed that routine programming no longer brought me the same joy it did decades ago. So, my New Year’s resolution was to engage in more “useless” programming – coding small, fun projects without any specific end goal in mind. Of all the possible topics, the one that captivated me throughout the year was retrocomputing.

I didn’t have a computer until I was 14, but I remember seeing grey 8-bit machines at school or friends' houses. I saw BASIC and played blocky games on monochrome TVs. So, I felt the need to revisit those times and learn more about how things worked back then.

Over the year I visited some retrocomputing festivals and museums, built a plywood arcade cabinet with a kid, and worked with a friend to design a small handheld monochrome gaming console, complete with schematics, PCB, and firmware!

But for me, the retrocomputing journey began with the MOS 6502.

6502

MOS6502 is a brilliant little CPU that every geek should try emulating or programming. It’s at the core of nearly every 8-bit retro computer: Atari 2600, Apple I, Apple II, NES, Commodore 64, VIC-20, Atari Lynx, BBC Micro – the list is endless.

6502 features a simple yet powerful instruction set. With only 56 opcodes, it wouldn’t take more than a day to implement all of them.

In this article, we’ll emulate MOS 6502 in Python to run Apple I ROMs. But first, let’s explore a few core concepts.

MOS6502 has just three general-purpose registers: A (accumulator), X, and Y (index registers). Additionally, there’s a flag register (F), a stack register (S), and a program counter (PC). The CPU supports 16-bit memory addresses, meaning it can handle up to 64KB of RAM. To exceed this limit, you’d need to use bank switching, dynamically swapping memory regions to map the same 16-bit address to different memory regions.

The 6502’s memory layout is somewhat unusual by modern standards:

┌───────────┐         
│ IRQ addr  │ 0xffff  
│ RST addr  │ 0xfffd  
│ NMI addr  │ 0xfffb  
├───────────┤         
│ ROM code  │         
├───────────┤         
│ I/O regs  │ 0xd000  
├───────────┤         
│ Free RAM  │         
├───────────┤         
│ Stack     │ 0x0100  
├───────────┤         
│ Zero Page │ 0x0000  
└───────────┘         

Interrupt vectors are located at the end of the address space. These vectors point to the reset (RST) handler, IRQ handler, and non-maskable interrupt (NMI) handler. Each handler address is two bytes long. The RST handler is where the “OS” code begins execution. IRQ and NMI are used to handle hardware interactions; the difference is that IRQ can be programmatically disabled (masked), while NMI cannot.

The rest of the address space varies between computers, but generally includes ROM (or EEPROM), which stores the “monitor” program or OS BASIC interpreter code. There are also memory-mapped I/O areas for peripherals like timers, keyboards, displays, cassette readers etc. Each byte in these areas has a specific purpose, with reads and writes triggering hardware actions.

Finally, there’s RAM. Most 8-bit computers of the time had 1KB to 16KB of RAM. Of this, the first 256 bytes (known as the “zero page”) and the next 256 bytes (reserved for the CPU stack) had special purposes.

With this understanding, we’re ready to start writing our emulator. First, we’ll need helpers to handle 16-bit unsigned integers and convert between types. The 6502 CPU operates on a memory bus, which we’ll implement as a list-like object supporting __setitem__ and __getitem__ for accessing individual bytes at specific addresses. This bus will handle memory-mapped I/O, interrupt vectors, ROM, and RAM, appearing as “memory” from the CPU’s perspective.

Next, we’ll write helpers for memory operations: reading single or double bytes, pushing and popping values to/from the stack. MOS6502 is little-endian, and since the stack grows by decrementing the pointer, higher bytes must be pushed first.

# General helpers for u8 and u16 data types
def u16(n): return n & 0xFFFF
def u8(n): return n & 0xFF
def to16(lo, hi): return lo + hi * 0x100

class MOS6502:
    def __init__(self, bus):
        self.bus = bus
        self.a, self.x, self.y = 0, 0, 0
        self.s, self.f, self.pc = 0xFD, 0x20, to16(bus[0xFFFC], bus[0xFFFD])
    def fetch(self, offset): return self.bus[u16(self.pc + offset)]
    def fetch16(self, offset): return to16(self.fetch(offset), self.fetch(u16(offset + 1)))
    def push(self, value): self.bus[self.s | 0x100] = value; self.s = u8(self.s - 1)
    def pop(self): self.s = u8(self.s + 1); return self.bus[self.s | 0x100]
    def push16(self, value): self.push(u8(value >> 8)); self.push(u8(value))
    def pop16(self): return to16(self.pop(), self.pop())

Addressing Modes

One of the most unusual aspects of the 6502 is its addressing modes, which dictate how the CPU fetches operands for each instruction.

It may sound confusing, but it’s not, and here’s some short code to prove it. It implements all of the 6502 addressing modes:

def zp16(self, addr): return to16(self.bus[addr], self.bus[u16(addr + 1)])

def addr(self, mode):
    if mode == ABS: return to16(self.fetch(1), self.fetch(2))
    elif mode == ABX: return u16(self.addr(ABS) + self.x) 
    elif mode == ABY: return u16(self.addr(ABS) + self.y) 
    elif mode == IMM: return u16(self.pc + 1)
    elif mode == IZX: return self.zp16(u8(self.fetch(1) + self.x))
    elif mode == IZY: return u16(self.zp16(self.fetch(1)) + self.y)
    elif mode == REL: return (self.fetch(1) ^ 0x80) - 0x80
    elif mode == ZPG: return self.fetch(1)
    elif mode == ZPX: return u8(self.fetch(1) + self.x)
    elif mode == ZPY: return u8(self.fetch(1) + self.y)
    elif mode == IND:
        addr = self.fetch16(1);
        return to16(self.bus[addr], self.bus[(addr & 0xff00) | u8(addr + 1)])

There is an additional “accumulator” mode not listed here. It means the operand is not in memory, but in the A register (which has no “address”, so we treat it differently when reading/writing bytes):

def read(self, mode): return self.a if mode == ACC else self.bus[self.addr(mode)]
def write(self, mode, value):
    if mode == ACC: self.a = value
    else: self.bus[self.addr(mode)] = value

Instructions

6502 has the following instruction groups:

Not too many, and each of the is encoded as an individual byte value followed by operands. The mode is determined by the instruction byte, i.e. LDA can be encoded as 8 different byte values, depending on the operand mode.

Each instruction takes 0-3 bytes for its operands. To handle this, we’ll use a lookup table to determine the instruction length and increment the PC accordingly:

OPSZ = [1, 2, 0, 0, 0, 2, 2, 0, 1, 2, 1, 0, 0, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0, 3, 2, 0, 0, 2, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0, 1, 2, 0, 0, 0, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0, 1, 2, 0, 0, 0, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0, 0, 2, 0, 0, 2, 2, 2, 0, 1, 0, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 2, 2, 2, 0, 1, 3, 1, 0, 0, 3, 0, 0, 2, 2, 2, 0, 2, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 2, 2, 2, 0, 1, 3, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 2, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0, 2, 2, 0, 0, 2, 2, 2, 0, 1, 2, 1, 0, 3, 3, 3, 0, 2, 2, 0, 0, 0, 2, 2, 0, 1, 3, 0, 0, 0, 3, 3, 0]

We’ll now write the core of our emulator, which fetches and executes instructions step by step:

def step(self):
    op = self.fetch(0)
    next_pc = self.pc + OPSZ[op]
    if op == 0x00: pass   # BRK,IMP,1,7,cziDBvn
    elif op == 0x20: pass # JSR,ABS,3,6,czidbvn
    elif op == 0x69: pass # ADC,IMM,2,2,CZidbVN
    # TODO: implement every instruction here
    self.pc = u16(next_pc)

We fetch instruction’s first byte and prepare to advance PC. Then we’ll have a giant if/else block for all the supported opcodes (151 of them). In the comments for each opcode I describe instruction name, addressing mode, number of bytes, number of CPU cycles and the CPU flags that it may affect.

But before we start implementing all the instructions, let’s implement a proper memory bus from a real computer.

Apple I

To test our emulator, we’ll use the Apple I computer, Apple’s first product from 1976. It featured 4KB or 8KB of RAM and a monitor program (WozMon) packed into just 256 bytes of machine code. Apple I supported Integer BASIC, a 40×24 text terminal, and cassette data storage. Not much, and it only sold 200 units, but it’s still a remarkable piece of engineering.

Memory-mapped I/O for Apple I includes:

We’ll load the WozMon binary into ROM at 0xFF00–0xFFFF:

import base64
WOZMON = base64.b64decode('2Figf4wS0KmnjRHQjRPQyd/wE8mb8APIEA+p3CDv/6mNIO//oAGIMPatEdAQ+60Q0JkAAiDv/8mN0NSg/6kAqgqFK8i5AALJjfDUya6Q9PDwybrw68nS8DuGKIYphCq5AAJJsMkKkAZpiMn6kBEKCgoKogQKJigmKcrQ+MjQ4MQq8JckK1AQpSiBJuYm0LXmJ0xE/2wkADArogK1J5UllSPK0PfQFKmNIO//pSUg3P+lJCDc/6m6IO//qaAg7/+hJCDc/4YrpSTFKKUl5SmwweYk0ALmJaUkKQcQyEhKSkpKIOX/aCkPCbDJupACaQYsEtAw+40S0GAAAAAPAP8AAA==')

class AppleI:
  def __init__(self):
    self.keys = []
    self.mem = bytearray(0x2000)
    self.wozmon = WOZMON
  def __getitem__(self, i):
    if i >= 0 and i <= 0x2000: return self.mem[i] # 8K RAM
    elif i >= 0xff00 and i <= 0xffff: return self.wozmon[i - 0xff00]
    elif i == 0xd010: return self.keys.pop(0)|0x80 if len(self.keys) else 0x80
    elif i == 0xd011: return 0x80 if len(self.keys) else 0
    else: return 0
  def __setitem__(self, i, n):
    if i >= 0 and i < 0x0fff: self.mem[i] = n
    elif i == 0xd012:
      if n & 0x7f == 0x0d: print('')
      elif n & 0x7f == 0x5f: print('\b', end = '', flush=True)
      else: print(chr(n&0x7f), end='', flush=True)
  def send_key(self, c):
    self.keys.append(c)

a1 = AppleI()
a1[0xd012] = 65; # 'A'
a1[0xd012] = 49; # '1'
a1[0xd012] = 13; # '\r'
for i in range(256): print(f"{a1[0xff00+i]:x}", end=' ')

Sending 3 bytes to the bus should print “A1” (note that Apple used CR=13 instead of NL=10).

Then we display machine codes of the WozMon:

d8 58 a0 7f 8c 12 d0 a9 a7 8d 11 d0 8d 13 d0 c9 df f0 13 c9 9b f0 03 c8 10 0f a9 dc 20 ef ff a9 8d 20 ef ff a0 01 88 30 f6 ad 11 d0 10 fb ad 10 d0 99 00 02 20 ef ff c9 8d d0 d4 a0 ff a9 00 aa 0a 85 2b c8 b9 00 02 c9 8d f0 d4 c9 ae 90 f4 f0 f0 c9 ba f0 eb c9 d2 f0 3b 86 28 86 29 84 2a b9 00 02 49 b0 c9 0a 90 06 69 88 c9 fa 90 11 0a 0a 0a 0a a2 04 0a 26 28 26 29 ca d0 f8 c8 d0 e0 c4 2a f0 97 24 2b 50 10 a5 28 81 26 e6 26 d0 b5 e6 27 4c 44 ff 6c 24 00 30 2b a2 02 b5 27 95 25 95 23 ca d0 f7 d0 14 a9 8d 20 ef ff a5 25 20 dc ff a5 24 20 dc ff a9 ba 20 ef ff a9 a0 20 ef ff a1 24 20 dc ff 86 2b a5 24 c5 28 a5 25 e5 29 b0 c1 e6 24 d0 02 e6 25 a5 24 29 07 10 c8 48 4a 4a 4a 4a 20 e5 ff 68 29 0f 09 b0 c9 ba 90 02 69 06 2c 12 d0 30 fb 8d 12 d0 60 00 00 00 0f 00 ff 00 00

The emulator begins execution at the reset vector (0xFFFC), starting with the CLD (0xd8) and CLI (0x58) instructions. Both operate on flags.

Flags

The flag register is an 8-bit register where only seven bits are utilized (NV1BDIZC).

Instructions to set/clear individual flags use implicit addressing and require no arguments, simplifying implementation:

elif op == 0x18: self.f = self.f & 0xFE  # CLC,IMP,1,2,Czidbvn
elif op == 0xD8: self.f = self.f & 0xF7  # CLD,IMP,1,2,cziDbvn
elif op == 0x58: self.f = self.f & 0xFB  # CLI,IMP,1,2,czIdbvn
elif op == 0xB8: self.f = self.f & 0xBF  # CLV,IMP,1,2,czidbVn
elif op == 0x38: self.f = self.f | 0x01  # SEC,IMP,1,2,Czidbvn
elif op == 0xF8: self.f = self.f | 0x08  # SED,IMP,1,2,cziDbvn
elif op == 0x78: self.f = self.f | 0x04  # SEI,IMP,1,2,czIdbvn

We can also add a helper to set N/Z flags, depending on the result:

def nz(self, x):
    self.f = (self.f & 0x7D) | ((int(x == 0)) << 1) | (x & 0x80)
    return x

Here we set N if the 8th bit of the result is set. We set Z bit if the result is zero.

There is one instruction left in this family, BIT. It fetches a memory byte and updates the flags:

def flags(self, mode):
    v = self.read(mode)
    self.f = (self.f & 0x3D) | (0xC0 & v) | ((int((v & self.a) == 0)) << 1)

...
elif op == 0x24: self.flags(ZPG),  # BIT,ZP,2,3,cZidbVN
elif op == 0x2C: self.flags(ABS),  # BIT,ABS,3,4,cZidbVN

Despite all the oddities, BIT is a useful instruction to check several bits at once without touching any of the A/X/Y registers.

Load and store

Executing two instructions of WozMon we’ve disabled BCD mode and masked the interrupts.

Our third instruction is A0 7F, or LDY #$7F. It tells the CPU to load value 0x7f into Y. We can implement all LD/ST instructions at once, they are all the same except for the addressing mode:

elif op == 0xA9: self.a = self.nz(self.read(IMM))  # LDA,IMM,2,2,cZidbvN
elif op == 0xA5: self.a = self.nz(self.read(ZPG))  # LDA,ZP,2,3,cZidbvN
elif op == 0xB5: self.a = self.nz(self.read(ZPX))  # LDA,ZPX,2,4,cZidbvN
elif op == 0xAD: self.a = self.nz(self.read(ABS))  # LDA,ABS,3,4,cZidbvN
elif op == 0xBD: self.a = self.nz(self.read(ABX))  # LDA,AX,3,4,cZidbvN
elif op == 0xB9: self.a = self.nz(self.read(ABY))  # LDA,AY,3,4,cZidbvN
elif op == 0xA1: self.a = self.nz(self.read(IZX))  # LDA,ZIX,2,6,cZidbvN
elif op == 0xB1: self.a = self.nz(self.read(IZY))  # LDA,ZIY,2,5,cZidbvN
elif op == 0xA2: self.x = self.nz(self.read(IMM))  # LDX,IMM,2,2,cZidbvN
elif op == 0xA6: self.x = self.nz(self.read(ZPG))  # LDX,ZP,2,3,cZidbvN
elif op == 0xB6: self.x = self.nz(self.read(ZPY))  # LDX,ZPY,2,4,cZidbvN
elif op == 0xAE: self.x = self.nz(self.read(ABS))  # LDX,ABS,3,4,cZidbvN
elif op == 0xBE: self.x = self.nz(self.read(ABY))  # LDX,AY,3,4,cZidbvN
elif op == 0xA0: self.y = self.nz(self.read(IMM))  # LDY,IMM,2,2,cZidbvN
elif op == 0xA4: self.y = self.nz(self.read(ZPG))  # LDY,ZP,2,3,cZidbvN
elif op == 0xB4: self.y = self.nz(self.read(ZPX))  # LDY,ZPX,2,4,cZidbvN
elif op == 0xAC: self.y = self.nz(self.read(ABS))  # LDY,ABS,3,4,cZidbvN
elif op == 0xBC: self.y = self.nz(self.read(ABX))  # LDY,AX,3,4,cZidbvN
elif op == 0x85: self.write(ZPG, self.a)  # STA,ZP,2,3,czidbvn
elif op == 0x95: self.write(ZPX, self.a)  # STA,ZPX,2,4,czidbvn
elif op == 0x8D: self.write(ABS, self.a)  # STA,ABS,3,4,czidbvn
elif op == 0x9D: self.write(ABX, self.a)  # STA,AX,3,5,czidbvn
elif op == 0x99: self.write(ABY, self.a)  # STA,AY,3,5,czidbvn
elif op == 0x81: self.write(IZX, self.a)  # STA,ZIX,2,6,czidbvn
elif op == 0x91: self.write(IZY, self.a)  # STA,ZIY,2,6,czidbvn
elif op == 0x86: self.write(ZPG, self.x)  # STX,ZP,2,3,czidbvn
elif op == 0x96: self.write(ZPY, self.x)  # STX,ZPY,2,4,czidbvn
elif op == 0x8E: self.write(ABS, self.x)  # STX,ABS,3,4,czidbvn
elif op == 0x84: self.write(ZPG, self.y)  # STY,ZP,2,3,czidbvn
elif op == 0x94: self.write(ZPX, self.y)  # STY,ZPX,2,4,czidbvn
elif op == 0x8C: self.write(ABS, self.y)  # STY,ABS,3,4,czidbvn

ST operations copy contents from registers A/X/Y into memory, while LD operations copy from memory into A/X/Y and set N/Z flags.

Next WozMon instructions to run are 0x8C (STY), 0xA9 (LDA), 0x8D (STA) and another 0x8D (STA). All of them are implemented above.

So far we’ve done 39 opcodes out of 151. If you now run the emulator for 7 steps and then print the contents of all registers – you will get A7 00 7F FD, which corresponds to the expected results from the first instructions of WozMon:

RESET: CLD             ; Clear decimal arithmetic mode.
       CLI
       LDY #$7F        ; Mask for DSP data direction register.
       STY DSP         ; Set it up.
       LDA #$A7        ; KBD and DSP control register mask.
       STA KBDCR       ; Enable interrupts, set CA1, CB1, for
       STA DSPCR       ;  positive edge sense/output mode.

The next instruction to implement would be 0xC9 or CMP.

Branches

CMP instruction compares a register A/X/Y to memory and updates:

def cmp(self, mode, n):
    m = self.read(mode)
    self.f = (self.f & 0xFE) | (int(n >= m))
    self.nz(u8(n - m))

Conditional branching in 6502 depends on the specific bit values from the flags. If the flag matches the branching condition - PC is increased by the signed byte value stored after the PC. Otherwise PC remains unchanged, pointing to the next instruction in code:

branch = lambda cond: self.addr(REL) if cond else 0
...
elif op == 0xC9: self.cmp(IMM, self.a)  # CMP,IMM,2,2,CZidbvN
elif op == 0xC5: self.cmp(ZPG, self.a)  # CMP,ZP,2,3,CZidbvN
elif op == 0xD5: self.cmp(ZPX, self.a)  # CMP,ZPX,2,4,CZidbvN
elif op == 0xCD: self.cmp(ABS, self.a)  # CMP,ABS,3,4,CZidbvN
elif op == 0xDD: self.cmp(ABX, self.a)  # CMP,AX,3,4,CZidbvN
elif op == 0xD9: self.cmp(ABY, self.a)  # CMP,AY,3,4,CZidbvN
elif op == 0xC1: self.cmp(IZX, self.a)  # CMP,ZIX,2,6,CZidbvN
elif op == 0xD1: self.cmp(IZY, self.a)  # CMP,ZIY,2,5,CZidbvN
elif op == 0xE0: self.cmp(IMM, self.x)  # CPX,IMM,2,2,CZidbvN
elif op == 0xE4: self.cmp(ZPG, self.x)  # CPX,ZP,2,3,CZidbvN
elif op == 0xEC: self.cmp(ABS, self.x)  # CPX,ABS,3,4,CZidbvN
elif op == 0xC0: self.cmp(IMM, self.y)  # CPY,IMM,2,2,CZidbvN
elif op == 0xC4: self.cmp(ZPG, self.y)  # CPY,ZP,2,3,CZidbvN
elif op == 0xCC: self.cmp(ABS, self.y)  # CPY,ABS,3,4,CZidbvN
elif op == 0x90: next_pc += branch((self.f & 0x01) == 0) # BCC,REL,2,2/3,czidbvn
elif op == 0xB0: next_pc += branch((self.f & 0x01) != 0) # BCS,REL,2,2/3,czidbvn
elif op == 0xF0: next_pc += branch((self.f & 0x02) != 0) # BEQ,REL,2,2/3,czidbvn
elif op == 0x30: next_pc += branch((self.f & 0x80) != 0) # BMI,REL,2,2/3,czidbvn
elif op == 0xD0: next_pc += branch((self.f & 0x02) == 0) # BNE,REL,2,2/3,czidbvn
elif op == 0x10: next_pc += branch((self.f & 0x80) == 0) # BPL,REL,2,2/3,czidbvn
elif op == 0x50: next_pc += branch((self.f & 0x40) == 0) # BVC,REL,2,2/3,czidbvn
elif op == 0x70: next_pc += branch((self.f & 0x40) != 0) # BVS,REL,2,2/3,czidbvn

Once the branches are implemented, we can add unconditional jumps:

elif op == 0x4C: next_pc = self.addr(ABS)  # JMP,ABS,3,3,czidbvn
elif op == 0x6C: next_pc = self.addr(IND)  # JMP,IND,3,5,czidbvn

Now we can run WozMon up until the opcode 0xC8 (INY).

Increments

Some increment instructions are trivial and only increment registers (updating the N/Z flags):

elif op == 0xCA: self.x = self.nz(u8(self.x - 1))  # DEX,IMP,1,2,cZidbvN
elif op == 0x88: self.y = self.nz(u8(self.y - 1))  # DEY,IMP,1,2,cZidbvN
elif op == 0xE8: self.x = self.nz(u8(self.x + 1))  # INX,IMP,1,2,cZidbvN
elif op == 0xC8: self.y = self.nz(u8(self.y + 1))  # INY,IMP,1,2,cZidbvN

But there is a few more (INC and DEC). They operate on memory bytes instead. For them we write another helper function that reads a byte from memory, increments it, updates the flags, and writes it back. We also combine increment and decrement into one function by providing a “delta” argument. If delta == 1 it’s an increment, if delta == 0xff (-1) it’s a decrement:

def inc(self, mode, n):
    value = self.nz(u8(self.read(mode) + n))
    self.write(mode, value)

...
elif op == 0xE6: self.inc(ZPG, 1)  # INC,ZP,2,5,cZidbvN
elif op == 0xF6: self.inc(ZPX, 1)  # INC,ZPX,2,6,cZidbvN
elif op == 0xEE: self.inc(ABS, 1)  # INC,ABS,3,6,cZidbvN
elif op == 0xFE: self.inc(ABX, 1)  # INC,AX,3,7,cZidbvN
elif op == 0xC6: self.inc(ZPG, 0xFF)  # DEC,ZP,2,5,cZidbvN
elif op == 0xD6: self.inc(ZPX, 0xFF)  # DEC,ZPX,2,6,cZidbvN
elif op == 0xCE: self.inc(ABS, 0xFF)  # DEC,ABS,3,6,cZidbvN
elif op == 0xDE: self.inc(ABX, 0xFF)  # DEC,AX,3,7,cZidbvN

We’ve already covered 75 opcodes out of 151, more than a half. This lets us execute 15 first instructions from WozMon, until we stop at opcode 0x20, or JSR.

Stack

JSR instruction is rather simple: it pushes the program counter and jumps to the subroutine:

elif op == 0x20: self.push16(u16(next_pc - 1)); next_pc = self.addr(ABS) # JSR,ABS,3,6,czidbvn

We can also implement RTS (return from subroutine) and RTI (return from interrupt):

elif op == 0x60: next_pc = self.pop16() + 1  # RTS,IMP,1,6,czidbvn
elif op == 0x40: self.f = self.pop() | 0x20; next_pc = self.pop16() # RTI,IMP,1,6,czidbvn

Since we are dealing with stack - why don’t we add remaining push/pop instructions?

elif op == 0x48: self.push(self.a)                   # PHA,IMP,1,3,czidbvn
elif op == 0x68: n = self.pop(); self.a = self.nz(n) # PLA,IMP,1,4,cZidbvN
elif op == 0x08: self.push(self.f | 0x30)            # PHP,IMP,1,3,czidbvn
elif op == 0x28: self.f = self.pop() | 0x20          # PLP,IMP,1,4,CZIDBVN

Now if we execute first 18 instructions – we get the long-awaited Apple I prompt, a mighty backslash:

\

In fact, we can now step through instructions endlessly without a failure: WozMon is waiting for some user input in an infinite polling loop and never receives one.

Terminal input

We can now update the main loop to use termios and non-blocking keyboard polling:

import sys, termios, tty, select
a1 = AppleI()
cpu = MOS6502(a1)
attr = termios.tcgetattr(sys.stdin)
try:
    tc = termios.tcgetattr(sys.stdin)
    tc[3] = (tc[3] & ~termios.ICANON & ~termios.ECHO)
    termios.tcsetattr(sys.stdin, termios.TCSAFLUSH, tc)
    while True:
        cpu.step()
        if select.select([sys.stdin], [], [], 0.000001) == ([sys.stdin], [], []):
            c = sys.stdin.read(1)
            a1.send_key(13 if c == '\n' else ord(c))
finally:
    termios.tcsetattr(sys.stdin, termios.TCSADRAIN, attr)

If we try typing something when the monitor is running – it will be echoed on our terminal. Not unusual at a first glance, however this time echoing is done by our Apple I emulator and WozMon ROM!

But, as soon as we hit “Enter” – emulator crashes at instruction 0xAA (TAX).

Transfer

Nobody likes TAX-es, but here it’s an easy fix: those are just transfer instructions.

elif op == 0xAA: self.x = self.nz(self.a)  # TAX,IMP,1,2,cZidbvN
elif op == 0x8A: self.a = self.nz(self.x)  # TXA,IMP,1,2,cZidbvN
elif op == 0xA8: self.y = self.nz(self.a)  # TAY,IMP,1,2,cZidbvN
elif op == 0x98: self.a = self.nz(self.y)  # TYA,IMP,1,2,cZidbvN
elif op == 0xBA: self.x = self.nz(self.s)  # TSX,IMP,1,2,cZidbvN
elif op == 0x9A: self.s = self.x  # TXS,IMP,1,2,czidbvn

They swap the contents of registers A/X/Y with other registers.

The next one failing is 0x0A, or ASL. Time to implement bit shifting.

Bits and bytes

Shifting in 6502 happens by 1 bit either way, left or right. Some instructions also wrap bits around, rotating the leftmost bit to the right and vice versa. We can move this logic into a helper function:

def shift(self, mode, left, rot):
    b = self.read(mode)
    if left:
        r = u8((b << 1) | ((self.f & 1) * rot))
        f = (self.f & 0xFE) | (b >> 7)
    else:
        r = u8((b >> 1) | (((self.f & 1) << 7) * rot))
        f = (self.f & 0xFE) | (b & 1)
    self.f = f
    self.nz(r)
    self.write(mode, r)

We read a memory byte, we shift it. Carry bit may get rotated if rot == 1. Then new carry bit is set accordingly and N/Z bits are adjusted before the result is written back.

We can finish implementing all shifting/rotating opcodes:

elif op == 0x0A: self.shift(ACC, True, 0)  # ASL,ACC,1,2,CZidbvN
elif op == 0x06: self.shift(ZPG, True, 0)  # ASL,ZP,2,5,CZidbvN
elif op == 0x16: self.shift(ZPX, True, 0)  # ASL,ZPX,2,6,CZidbvN
elif op == 0x0E: self.shift(ABS, True, 0)  # ASL,ABS,3,6,CZidbvN
elif op == 0x1E: self.shift(ABX, True, 0)  # ASL,AX,3,6/7,CZidbvN
elif op == 0x4A: self.shift(ACC, False, 0)  # LSR,ACC,1,2,cZidbvN
elif op == 0x46: self.shift(ZPG, False, 0)  # LSR,ZP,2,5,cZidbvN
elif op == 0x56: self.shift(ZPX, False, 0)  # LSR,ZPX,2,6,cZidbvN
elif op == 0x4E: self.shift(ABS, False, 0)  # LSR,ABS,3,6,cZidbvN
elif op == 0x5E: self.shift(ABX, False, 0)  # LSR,AX,3,6/7,cZidbvN
elif op == 0x2A: self.shift(ACC, True, 1)  # ROL,ACC,1,2,CZidbvN
elif op == 0x26: self.shift(ZPG, True, 1)  # ROL,ZP,2,5,CZidbvN
elif op == 0x36: self.shift(ZPX, True, 1)  # ROL,ZPX,2,6,CZidbvN
elif op == 0x2E: self.shift(ABS, True, 1)  # ROL,ABS,3,6,CZidbvN
elif op == 0x3E: self.shift(ABX, True, 1)  # ROL,AX,3,6/7,CZidbvN
elif op == 0x6A: self.shift(ACC, False, 1)  # ROR,ACC,1,2,CZidbvN
elif op == 0x66: self.shift(ZPG, False, 1)  # ROR,ZP,2,5,CZidbvN
elif op == 0x76: self.shift(ZPX, False, 1)  # ROR,ZPX,2,6,CZidbvN
elif op == 0x7E: self.shift(ABX, False, 1)  # ROR,ABS,3,6,CZidbvN
elif op == 0x6E: self.shift(ABS, False, 1)  # ROR,AX,3,6/7,CZidbvN

We are now at 110/151 opcodes! The next failing instruction is 0x49, EOR.

Bitwise instructions can be implemented individually, or we may do a little trick and combine them into a single pipeline. Essentially, for any bitwise instruction we will be implementing all three of them: ((X & T) | U) ^ V). Here T, U and V can be set to certain values to only have one meaningful bitwise operation from the pipeline:

Here’s a helper function that calculates it all and updates N/Z flags:

def bit(self, and_val, or_val, xor_val):
    self.a = self.nz(u8(((self.a & and_val) | or_val) ^ xor_val))

And here are all bitwise opcodes:

elif op == 0x29: self.bit(self.read(IMM), 0, 0)  # AND,IMM,2,2,cZidbvN
elif op == 0x25: self.bit(self.read(ZPG), 0, 0)  # AND,ZP,2,3,cZidbvN
elif op == 0x35: self.bit(self.read(ZPX), 0, 0)  # AND,ZPX,2,4,cZidbvN
elif op == 0x2D: self.bit(self.read(ABS), 0, 0)  # AND,ABS,3,4,cZidbvN
elif op == 0x3D: self.bit(self.read(ABX), 0, 0)  # AND,AX,3,4,cZidbvN
elif op == 0x39: self.bit(self.read(ABY), 0, 0)  # AND,AY,3,4,cZidbvN
elif op == 0x21: self.bit(self.read(IZX), 0, 0)  # AND,ZIX,2,6,cZidbvN
elif op == 0x31: self.bit(self.read(IZY), 0, 0)  # AND,ZIY,2,5,cZidbvN
elif op == 0x09: self.bit(0xFF, self.read(IMM), 0)  # ORA,IMM,2,2,cZidbvN
elif op == 0x05: self.bit(0xFF, self.read(ZPG), 0)  # ORA,ZP,2,3,cZidbvN
elif op == 0x15: self.bit(0xFF, self.read(ZPX), 0)  # ORA,ZPX,2,4,cZidbvN
elif op == 0x0D: self.bit(0xFF, self.read(ABS), 0)  # ORA,ABS,3,4,cZidbvN
elif op == 0x1D: self.bit(0xFF, self.read(ABX), 0)  # ORA,AX,3,4,cZidbvN
elif op == 0x19: self.bit(0xFF, self.read(ABY), 0)  # ORA,AY,3,4,cZidbvN
elif op == 0x01: self.bit(0xFF, self.read(IZX), 0)  # ORA,ZIX,2,6,cZidbvN
elif op == 0x11: self.bit(0xFF, self.read(IZY), 0)  # ORA,ZIY,2,5,cZidbvN
elif op == 0x49: self.bit(0xFF, 0, self.read(IMM))  # EOR,IMM,2,2,cZidbvN
elif op == 0x45: self.bit(0xFF, 0, self.read(ZPG))  # EOR,ZP,2,3,cZidbvN
elif op == 0x55: self.bit(0xFF, 0, self.read(ZPX))  # EOR,ZPX,2,4,cZidbvN
elif op == 0x4D: self.bit(0xFF, 0, self.read(ABS))  # EOR,ABS,3,4,cZidbvN
elif op == 0x5D: self.bit(0xFF, 0, self.read(ABX))  # EOR,AX,3,4,cZidbvN
elif op == 0x59: self.bit(0xFF, 0, self.read(ABY))  # EOR,AY,3,4,cZidbvN
elif op == 0x41: self.bit(0xFF, 0, self.read(IZX))  # EOR,ZIX,2,6,cZidbvN
elif op == 0x51: self.bit(0xFF, 0, self.read(IZY))  # EOR,ZIY,2,5,cZidbvN

We are almost done. We only got 16 instructions left (addition/subtraction) and NOP. Well, NOP is opcode 0xEA and it does nothing. Arithmetics, on the other hand, are interesting.

Plus, minus

This is the last helper function we need to make our MOS6502 emulator work. However, it’s the one that operates in BCD and non-BCD modes differently.

In normal non-BCD mode, it simply adds the operand to accumulator (or subtracts, which is also addition, but with a negated value).

However, in BCD mode it treats both operands as 2-digit decimal numbers. So 0x43+0x28 becomes 0x71, while in non-BCD it would be 0x6B.

It’s a pretty long helper function but it does the trick:

def adc(self, mode, neg):
    n = self.read(mode)
    if self.f & 8 == 0:
        r = self.a + u16(n ^ (0xFF * neg)) + (self.f & 1)
        v = u16((0xFF * (1 - neg)) ^ (self.a ^ n)) & (self.a ^ r) & 0x80
        self.f = self.f & 0xBE | ((r & 0x100) >> 8) | u8(v >> 1)
        self.a = self.nz(u8(r))
    else:
        if neg:
            nd = ((n >> 4) * 10) + (n & 0x0F) + (1 - (self.f & 1))
            val = ((self.a >> 4) * 10) + (self.a & 0x0F) - nd
            self.f = self.f | 1
            if val < 0:
                self.f = self.f & 0xFE
                val = val + 100
            self.a = u8(self.nz((int(val / 10) << 4) + (val % 10)))
        else:
            p1 = (self.a & 0xF) + (n & 0xF) + (self.f & 1)
            p2 = int(p1 >= 10) + (self.a >> 4) + (n >> 4)
            self.f = self.f & 0xFE
            if p1 >= 10:
                p1 = p1 - 10
            if p2 >= 10:
                self.f = self.f | 1
                p2 = p2 - 10
            self.a = u8(self.nz((p2 << 4) + p1))

Now we can use it in our step() method:

elif op == 0x69: self.adc(IMM, 0)  # ADC,IMM,2,2,CZidbVN
elif op == 0x65: self.adc(ZPG, 0)  # ADC,ZP,2,3,CZidbVN
elif op == 0x75: self.adc(ZPX, 0)  # ADC,ZPX,2,4,CZidbVN
elif op == 0x6D: self.adc(ABS, 0)  # ADC,ABS,3,4,CZidbVN
elif op == 0x7D: self.adc(ABX, 0)  # ADC,AX,3,4,CZidbVN
elif op == 0x79: self.adc(ABY, 0)  # ADC,AY,3,4,CZidbVN
elif op == 0x61: self.adc(IZX, 0)  # ADC,ZIX,2,6,CZidbVN
elif op == 0x71: self.adc(IZY, 0)  # ADC,ZIY,2,5,CZidbVN
elif op == 0xE9: self.adc(IMM, 1)  # SBC,IMM,2,2,CZidbVN
elif op == 0xE5: self.adc(ZPG, 1)  # SBC,ZP,2,3,CZidbVN
elif op == 0xF5: self.adc(ZPX, 1)  # SBC,ZPX,2,4,CZidbVN
elif op == 0xED: self.adc(ABS, 1)  # SBC,ABS,3,4,CZidbVN
elif op == 0xFD: self.adc(ABX, 1)  # SBC,AX,3,4,CZidbVN
elif op == 0xF9: self.adc(ABY, 1)  # SBC,AY,3,4,CZidbVN
elif op == 0xE1: self.adc(IZX, 1)  # SBC,ZIX,2,6,CZidbVN
elif op == 0xF1: self.adc(IZY, 1)  # SBC,ZIY,2,5,CZidbVN

And, we are done! In fact, BCD mode is not required for WozMon to work correctly, but you will definitely need it to run Apple BASIC.

You can now run WozMon, enter some address – and it will print the memory byte there, proving that it works.

We can even type-in our first “Hello World” program for Apple I:

\
# Type this to show contents of last 256 bytes, i.e WozMon code:
FF00.FFFF 

FF00: D8 58 A0 7F 8C 12 D0 A9
FF08: A7 8D 11 D0 8D 13 D0 C9
FF10: DF F0 13 C9 9B F0 03 C8
...
FFE8: B0 C9 BA 90 02 69 06 2C
FFF0: 12 D0 30 FB 8D 12 D0 60
FFF8: 00 00 00 0F 00 FF 00 00

# Type this for a HELLO WORLD program:
280:A2 C BD 8B 2 20 EF FF CA D0 F7 60 8D C4 CC D2 CF D7 A0 CF CC CC C5 C8
280.297
280
R

HELLO WORLD

Curious what have just happened? WozMon accepts single addresses in hexadecimal form and prints the contents. Two addresses separated by a dot specify a memory range to display. Address followed by a colon store following bytes from that address onwards. Finally, “R” runs code from the last entered address.

The series of bytes now becomes clear if you look up all the opcodes in our step() method, or use a disassembler:

                            * = $0280
0280   A2 0C                LDX #$0C
0282   BD 8B 02   L0282     LDA $028B,X
0285   20 EF FF             JSR $FFEF
0288   CA                   DEX
0289   D0 F7                BNE L0282
028B   60                   RTS
028C   8D C4 CC D2 CF D7 A0 CF CC CC C5 C8
      "\r D  L  R  O  W     O  L  L  E  H"
                           .END

In other words:

Unfortunately, original WozMon calls user code using JMP and not JSR, so our program crashes at the end. You might want to fix it in either WozMon code, or in the app code (using JMP $FF1F instead of RTS at 0x28B).

BASIC

Can we emulate a full Apple I computer now? Yes! Our memory bus implementation lacks a few ROMs, but it’s not hard to add: we should load original “Integer BASIC” ROM and cassette interface (ACI) ROM into their addresses and we are good to go:

WOZACI = base64.b64decode('qaog7/+pjSDv/6D/yK0R0BD7rRDQmQACIO//yZvw4cmN0Omi/6kAhSSFJYUmhSfovQACydLwVsnX8DXJrvAnyY3wIMmg8OhJsMkKkAZpiMn6kK0KCgoKoAQKJiQmJYjQ+PDMTBr/pSSFJqUlhSewv6lAIMzBiKIAoSaiEAog28HQ+iDxwaAekOymKLCYILzBqRYgzMEgvMGgHyC/wbD5IL/BoDqiCEggvMFoKqA5ytD1gSYg8cGgNZDqsM0gv8GIrYHAxSnw+IUpwIBghiigQiDgwdD5af6w9aAeIODBoCyI0P2QBaAviND9vADAoCnKYKUmxSSlJ+Ul5ibQAuYnYA==')
INTBASIC = base64.b64decode('TLDirRHQEPutEN....') # 4KB of data here

class AppleI:
  def __init__(self):
    self.keys = []
    self.mem = bytearray(0x2000)
  def __getitem__(self, i):
    if i >= 0 and i <= 0x2000: return self.mem[i] # 8K RAM
    elif i >= 0xc100 and i <= 0xc1ff: return WOZACI[i - 0xc100]
    elif i >= 0xff00 and i <= 0xffff: return WOZMON[i - 0xff00]
    elif i >= 0xe000 and i <= 0xefff: return INTBASIC[i - 0xe000]
    elif i == 0xd010: return self.keys.pop(0)|0x80 if len(self.keys) else 0x80
    elif i == 0xd011: return 0x80 if len(self.keys) else 0
    else: return 0
  def __setitem__(self, i, n):
    if i >= 0 and i < 0x0fff: self.mem[i] = n
    elif i == 0xd012:
      if n & 0x7f == 0x0d: print('')
      elif n & 0x7f == 0x5f: print('\b', end = '', flush=True)
      else: print(chr(n&0x7f), end='', flush=True)
  def send_key(self, c):
    self.keys.append(c)

Starting it up and typing the famous “E000R” into monitor loads Integer BASIC! Here “E000” address is the beginning of BASIC ROM and “R” is a WozMon command to run code:

\
E000R
> PRINT "HELLO"
HELLO
> 10 PRINT "6502 IS COOL"
> 20 GOTO 10
> RUN

What’s next

You’ve got a working 6502 emulator. Full code is here: https://gist.github.com/zserge/e5a18c954ab18b5b07be7ff516b5387a and it’s under 300 LOC.

I’ve also implemented it in a similar manner in C and it can run plenty of Apple I programs, as well as emulate KIM-1 computer: https://github.com/zserge/bsoz

Feel free to report bugs, or simply use in your projects. Both emulators are not cycle-accurate, but it’s not so hard to add. Most instructions run at a fixed number of cycles per instruction, but some need to check for page boundaries to add an extra cycle.

Python code from this article works nicely with MicroPython/CircuitPython, so you may use it in embedded projects to emulate 6502 on some low-end hardware. You may even build an Apple I replica using Arduino, RP2040 or ESP32. I was able to run it on a pocket Cardputer earlier this year.

Well, I hope it was a useful journey into emulation and retrocomputers. Have fun!

I hope you’ve enjoyed this article. You can follow – and contribute to – on Github, Mastodon, Twitter or subscribe via rss.

Dec 28, 2024

See also: Tiny Great Languages: APL and more.