How to write a (toy) JVM

Whether we like it or not, but Java is one of the most widely used programming languages. However, since most of the applications in Java are either too boring or too complex - not every Java developer has enough curiosity to look under the hood and see how JVM works.

In this post I will try to write a toy (and incomplete) JVM to show the core principles behind it and hopefully sparkle some interest in you to learn it further.

Our humble goal

Let’s start with something really simple:

public class Add {
  public static int add(int a, int b) {
    return a + b;
  }
}

We compile our class with javac Add.java and it results in Add.class. This class file is the actual binary file that JVM can execute. All that is left to do is to implement such a JVM that would execute it correctly.

If we look inside the Add.class with a hexdump - we probably won’t get too impressed:

00000000  ca fe ba be 00 00 00 34  00 0f 0a 00 03 00 0c 07  |.......4........|
00000010  00 0d 07 00 0e 01 00 06  3c 69 6e 69 74 3e 01 00  |........<init>..|
00000020  03 28 29 56 01 00 04 43  6f 64 65 01 00 0f 4c 69  |.()V...Code...Li|
00000030  6e 65 4e 75 6d 62 65 72  54 61 62 6c 65 01 00 03  |neNumberTable...|
00000040  61 64 64 01 00 05 28 49  49 29 49 01 00 0a 53 6f  |add...(II)I...So|
00000050  75 72 63 65 46 69 6c 65  01 00 08 41 64 64 2e 6a  |urceFile...Add.j|
00000060  61 76 61 0c 00 04 00 05  01 00 03 41 64 64 01 00  |ava........Add..|
00000070  10 6a 61 76 61 2f 6c 61  6e 67 2f 4f 62 6a 65 63  |.java/lang/Objec|
00000080  74 00 21 00 02 00 03 00  00 00 00 00 02 00 01 00  |t.!.............|
00000090  04 00 05 00 01 00 06 00  00 00 1d 00 01 00 01 00  |................|
000000a0  00 00 05 2a b7 00 01 b1  00 00 00 01 00 07 00 00  |...*............|
000000b0  00 06 00 01 00 00 00 01  00 09 00 08 00 09 00 01  |................|
000000c0  00 06 00 00 00 1c 00 02  00 02 00 00 00 04 1a 1b  |................|
000000d0  60 ac 00 00 00 01 00 07  00 00 00 06 00 01 00 00  |`...............|
000000e0  00 03 00 01 00 0a 00 00  00 02 00 0b              |............|

Although we don’t see a clear structure here yet - we need to find a way how to parse it: what are these ()V and (II)I, what is <init>, and why does it start with “cafe babe”?

You probably have seen another way to dump class files, which is often more useful:

$ javap -c Add
Compiled from "Add.java"
public class Add {
  public Add();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static int add(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: iadd
       3: ireturn
}

Now we see our class, its constructor, and a method. Both constructor and method contain a few instructions and it now becomes more or less clear what our add() method does: it loads two arguments (iload_0 and iload_1), adds them, and returns the result. JVM is a stack machine, so there are no registers, all arguments to the instructions are stored on the internal stack and the results are pushed on the stack as well.

Class loader

Now, how can we achieve what javap did here, how do we parse the class file?

If we look into the JVM specification, we learn about classfile structure. It always starts with 4 bytes signature (CAFEBABE), then 2+2 bytes for version, sounds simple.

Since we would have to read bytes, shorts, ints, and byte sequences from the binary file, we can start implementing our loader like this:

type loader struct {
	r   io.Reader
	err error
}

func (l *loader) bytes(n int) []byte {
	b := make([]byte, n, n)
	// we don't want to handle errors in each step,
	// so simply store the first found error till the end
	// and do nothing if we entered an erroneous state
	if l.err == nil {
		_, l.err = io.ReadFull(l.r, b)
	}
	return b
}
func (l *loader) u1() uint8  { return l.bytes(1)[0] }
func (l *loader) u2() uint16 { return binary.BigEndian.Uint16(l.bytes(2)) }
func (l *loader) u4() uint32 { return binary.BigEndian.Uint32(l.bytes(4)) }
func (l *loader) u8() uint64 { return binary.BigEndian.Uint64(l.bytes(8)) }

// Usage:
f, _ := os.Open("Add.class")
loader := &loader{r: f}
cafebabe := loader.u4()
major := loader.u2()
minor := loader.u2()

And then the spec tells us that we need to parse the constant pool. What is it? It is a special part of the class file, that contains constants needed to run the class. All the strings, numerical constants, and references are stored there and each has a unique uint16 index (thus, a class may have up to 64K constants).

There are several types of constants in the pool, each containing a different set of values. We would be interested in:

As you see, constants in the pool are referring to each other a lot. Since we are implementing a JVM in Go and there are no union types, let’s make a Const type that would contain various possible constant fields in it:

type Const struct {
	Tag              byte
	NameIndex        uint16
	ClassIndex       uint16
	NameAndTypeIndex uint16
	StringIndex      uint16
	DescIndex        uint16
	String           string
}
type ConstPool []Const

Then, following the JVM spec we could parse the constant pool data like this:

func (l *loader) cpinfo() (constPool ConstPool) {
	constPoolCount := l.u2()
	// Valid constant pool indices start from 1
	for i := uint16(1); i < constPoolCount; i++ {
		c := Const{Tag: l.u1()}
		switch c.Tag {
		case 0x01: // UTF8 string literal, 2 bytes length + data
			c.String = string(l.bytes(int(l.u2())))
		case 0x07: // Class index
			c.NameIndex = l.u2()
		case 0x08: // String reference index
			c.StringIndex = l.u2()
		case 0x09, 0x0a: // Field and method: class index + NaT index
			c.ClassIndex = l.u2()
			c.NameAndTypeIndex = l.u2()
		case 0x0c: // Name-and-type
			c.NameIndex, c.DescIndex = l.u2(), l.u2()
		default:
			l.err = fmt.Errorf("unsupported tag: %d", c.Tag)
		}
		constPool = append(constPool, c)
	}
	return constPool
}

We keep things simple here, but in real JVM we would have to treat long and double constant types uniquely, by inserting an additional unused const item, as JVM spec tells us (since const items are considered to be 32-bit).

To easier get string literals by indices, we would implement a Resolve(index uint16) string method:

func (cp ConstPool) Resolve(index uint16) string {
	if cp[index-1].Tag == 0x01 {
		return cp[index-1].String
	}
	return ""
}

Now we have to add similar helpers to parse a list of class interfaces, fields and methods, and their attributes:

func (l *loader) interfaces(cp ConstPool) (interfaces []string) {
	interfaceCount := l.u2()
	for i := uint16(0); i < interfaceCount; i++ {
		interfaces = append(interfaces, cp.Resolve(l.u2()))
	}
	return interfaces
}

// Field type is used for both, fields and methods
type Field struct {
	Flags      uint16
	Name       string
	Descriptor string 
	Attributes []Attribute 
}

// Attributes contain addition information about fields and classes
// The most useful is "Code" attribute, which contains actual byte code
type Attribute struct {
	Name string
	Data []byte
}

func (l *loader) fields(cp ConstPool) (fields []Field) {
	fieldsCount := l.u2()
	for i := uint16(0); i < fieldsCount; i++ {
		fields = append(fields, Field{
			Flags:      l.u2(),
			Name:       cp.Resolve(l.u2()),
			Descriptor: cp.Resolve(l.u2()),
			Attributes: l.attrs(cp),
		})
	}
	return fields
}

func (l *loader) attrs(cp ConstPool) (attrs []Attribute) {
	attributesCount := l.u2()
	for i := uint16(0); i < attributesCount; i++ {
		attrs = append(attrs, Attribute{
			Name: cp.Resolve(l.u2()),
			Data: l.bytes(int(l.u4())),
		})
	}
	return attrs
}

Both, fields and methods are represented as Fields, which is very fortunate and saves us some time. Finally, we can assemble it all together and parse our complete class:

type Class struct {
	ConstPool  ConstPool
	Name       string
	Super      string
	Flags      uint16
	Interfaces []string
	Fields     []Field
	Methods    []Field
	Attributes []Attribute
}

func Load(r io.Reader) (Class, error) {
	loader := &loader{r: r}
	c := Class{}
	loader.u8()           // magic (u32), minor (u16), major (u16)
	cp := loader.cpinfo() // const pool info
	c.ConstPool = cp
	c.Flags = loader.u2()             // access flags
	c.Name = cp.Resolve(loader.u2())  // this class
	c.Super = cp.Resolve(loader.u2()) // super class
	c.Interfaces = loader.interfaces(cp)
	c.Fields = loader.fields(cp)    // fields
	c.Methods = loader.fields(cp)   // methods
	c.Attributes = loader.attrs(cp) // methods
	return c, loader.err
}

Now if we look into the resulting class info we will see that it has zero fields and two methods - <init>:()V and add:(II)I. What are these things that look like roman numbers with parens? Those are descriptors, they define what types of arguments a method takes and what it returns. In this case <init> (a synthetic method, used to initialize objects when they are constructed) takes no arguments and returns nothing (V=void), while the “add” method takes two ints (I=int32) and returns an integer.

Bytecode

If we look closer, we’ll see that each method in our parsed class has one attribute, named “Code”. This attribute has a slice of bytes as a payload. The bytes are the following:

<init>:
[0 1 0 1 0 0 0 5 42 183 0 1 177 0 0 0 1 0 7 0 0 0 6 0 1 0 0 0 1]
add:
[0 2 0 2 0 0 0 4 26 27 96 172 0 0 0 1 0 7 0 0 0 6 0 1 0 0 0 3]

If we look at the spec, this time in the bytecode section, we will see that “Code” attribute starts with maxstack value (2 bytes), then maxlocals (2 bytes), then code length (4 bytes), and then actual code. So our attributes can be read like this:

<init>: maxstack: 1, maxlocals: 1, code: [42 183 0 1 177]
add: maxstack: 2, maxlocals: 2, code: [26 27 96 172]

Yes, we only have 4 and 5 bytes of code in each method. What do those bytes mean?

Like I said, JVM is a stack machine. Each instruction is encoded as a single byte, which might be followed by some additional arguments. If we look at the spec, we will see that the “add” method has the following instructions:

 26 = iload_0
 27 = iload_1
 96 = iadd
172 = ireturn

Exactly like we saw in javap output at the beginning! But how shall we execute it?

JVM frames

When a method is executed inside the JVM, it has its own stack for temporary operands, its own local variables, and its own chunk of code to execute. All these parameters are stored in a single execution frame. Additionally, frames contain the current instruction pointer (how far we have advanced while executing the bytecode) and a pointer to the class, which contained the method. The latter is needed to get access to the const pool of the class, as well as other details.

Let’s make a method that constructs a frame for the given method to be called with the given arguments. I will be using interface{} type here as the Value type, although proper union types would of course be a safer choice.

type Frame struct {
	Class  Class
	IP     uint32
	Code   []byte
	Locals []interface{}
	Stack  []interface{}
}

func (c Class) Frame(method string, args ...interface{}) Frame {
	for _, m := range c.Methods {
		if m.Name == method {
			for _, a := range m.Attributes {
				if a.Name == "Code" && len(a.Data) > 8 {
					maxLocals := binary.BigEndian.Uint16(a.Data[2:4])
					frame := Frame{
						Class:  c,
						Code:   a.Data[8:],
						Locals: make([]interface{}, maxLocals, maxLocals),
					}
					for i := 0; i < len(args); i++ {
						frame.Locals[i] = args[i]
					}
					return frame
				}
			}
		}
	}
	panic("method not found")
}

So, we got the Frame, with initialized locals, empty stack, and preloaded bytecode. Now it’s time to execute the bytecode:

func Exec(f Frame) interface{} {
	for {
		op := f.Code[f.IP]
		log.Printf("OP:%02x STACK:%v", op, f.Stack)
		n := len(f.Stack)
		switch op {
		case 26: // iload_0
			f.Stack = append(f.Stack, f.Locals[0])
		case 27: // iload_1
			f.Stack = append(f.Stack, f.Locals[1])
		case 96:
			a := f.Stack[n-1].(int32)
			b := f.Stack[n-2].(int32)
			f.Stack[n-2] = a + b
			f.Stack = f.Stack[:n-1]
		case 172: // ireturn
			v := f.Stack[n-1]
			f.Stack = f.Stack[:n-1]
			return v
		}
		f.IP++
	}
}

Finally, we can put it all together and run by calling our add() method:

f, _ := os.Open("Add.class")
class, _ := Load(f)
frame := class.Frame("add", int32(2), int32(3))
result := Exec(frame)
log.Println(result)

// OUTPUT:
OP:1a STACK:[]
OP:1b STACK:[2]
OP:60 STACK:[2 3]
OP:ac STACK:[5]
5

So, it works. Yes, it’s a very lousy and pitiful JVM, but still, it does what JVM does - loads bytecode and interprets it (but of course, the real JVM does way more than that).

what’s missing

The other two hundred instructions, the runtime, OOP type system, and a few other things.

There are 11 groups of instructions and most of them are trivial:

Most instructions are trivial to implement - they take one or two arguments from the stack, perform some operation on them, and push the result. The only thing to keep in mind here is that long and double instructions expect that each value takes two slots on the stack, so you may require additional push() and pop(), which makes it harder to group the instructions.

Implementing References requires to think about the object model - how you would like to store Objects and their Classes, how to represent inheritance, where to store instance fields and class fields. Also, this is where you would have to be careful about method dispatching - there are multiple “invoke” instructions and they behave in a slighly different manner:

If you implement a JVM in a language without garbage collection - this is also where you should think about how to perform the garbage collection: reference counting, mark-and-sweep, etc. Handling exceptions by implementing athrow, propagating them through the frames and handling them with exception tables is another interesting topic.

Finally, your JVM remains useless if there no runtime classes. Without java/lang/Object you are unlikely to even see how new instruction works by constructing new objects. Your runtime may provide some common JRE classes from java.lang, java.io, and java.util packages, or it may be something more domain-specific. Most likely some methods in the classes would have to be implemented natively and not in Java. This will raise the question of how to find and execute such methods and it becomes another edge case for your JVM.

In other words, implementing a proper JVM is not so trivial, however, understanding how it is implemented is not so complex either.

I only had one summer weekend to spare, and my JVM still has a long way to go, but the structure looks more or less clear: https://github.com/zserge/tojvm (PRs are always welcome!)

The actual code snippets from this blog post are even smaller and available as a gist.

If you would like to explore the topic deeper - you may consider looking at small JVMs:

I hope this article did not turn you away from Java. Virtual machines are fun, and JVM truly deserves its place.

I hope you’ve enjoyed this article. You can follow – and contribute to – on Github, Mastodon, Twitter or subscribe via rss.

Jun 01, 2020

See also: kotlin - a new hope and more.