Java agent - Bond or Smith?

The previous post about the JVM turned out to be well received, so I would like to cover another fun JVM topic: Java agents. Although not everyone gets a chance to write a custom Java agent in their career, agents are a wonderful field to explore and open up endless ways to customize JVM behavior.

Java agents are special because the JVM allows them to do much more than a regular Java app can: they can instrument or profile code, hot-swap classes, patch methods, and so on. This article focuses on pure Java agents, but it’s worth mentioning that there are also native agents, written in C or C++ against the JVM Tool Interface (JVMTI), which are completely different and can do even more.

Our first agent

Each agent has two possible entry points: the premain and agentmain methods. Both have the same signature, taking an argument string and an Instrumentation object. The only difference is when they are called: premain runs when the agent is loaded at JVM startup (via the -javaagent option), before the application’s main method, while agentmain runs when the agent is attached to an already running JVM. We will be using premain:

package com.github.zserge.toyjavaagent;

import java.lang.instrument.Instrumentation;

public class ToyAgent {
  public static void premain(String args, Instrumentation inst) {
    System.out.println("Agent says hello.");
  }
}
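To see both entry points side by side, here is a minimal sketch of an agent class that implements premain and agentmain. EntryPointsAgent and its lastCall field are hypothetical names made up for this illustration (lastCall exists only so the sketch can be checked). The manifest attributes named in the comments, Premain-Class and Agent-Class, are the ones the java.lang.instrument package documents for the two loading modes.

```java
import java.lang.instrument.Instrumentation;

public class EntryPointsAgent {
  // Remembers which entry point ran last; only here so the sketch can be checked.
  static volatile String lastCall;

  // Runs at JVM startup, before the application's main method, when the JVM is
  // started with -javaagent:agent.jar[=args] and the JAR manifest contains
  // "Premain-Class: EntryPointsAgent".
  public static void premain(String args, Instrumentation inst) {
    lastCall = "premain:" + args;
    System.out.println("premain called with args=" + args);
  }

  // Runs when the agent is attached to an already running JVM (for example via
  // the Attach API) and the JAR manifest contains "Agent-Class: EntryPointsAgent".
  public static void agentmain(String args, Instrumentation inst) {
    lastCall = "agentmain:" + args;
    System.out.println("agentmain called with args=" + args);
  }
}
```

A single JAR may declare both attributes, so the same agent can be loaded at startup or attached later.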

Agents are packaged and loaded as JAR files with a special MANIFEST.MF that points to the entry-point class. We will use Gradle to build our agent, and the build.gradle looks like this:

plugins {
  id 'java'
}

repositories {
  mavenCentral()
}

dependencies {
  // ASM is needed by the agent at runtime, so it is bundled into the agent JAR below
  implementation group: 'org.ow2.asm', name: 'asm', version: '8.0.1'
  testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
  testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'
}

jar {
  // Copy runtime dependencies (ASM) into the agent JAR so the agent is self-contained
  from { configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) } }
  manifest {
    attributes 'Implementation-Title': 'toy-agent',
               'Implementation-Version': '1.0',
               'Premain-Class': 'com.github.zserge.toyjavaagent.ToyAgent',
               'Can-Retransform-Classes': 'true'
  }
}

test {
  useJUnitPlatform()
  // Build the agent JAR before the tests that load it
  dependsOn jar
  jvmArgs '-javaagent:' + jar.archiveFile.get().asFile
}

As you can see, Java agents are not so different from regular Java libraries. I have added the ASM dependency because we will be messing with Java bytecode later. JUnit 5 is used for testing, and the goal of this article is to see what our agent can do to our test classes. That is why the test section adds the -javaagent command-line option, so that the JVM running our tests loads our agent.

The jar section creates the manifest, which tells the JVM where our agent’s entry class is located (ToyAgent, via Premain-Class) and what our agent is allowed to do (for example, retransform or redefine classes). It also copies the ASM classes into the agent JAR.
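For illustration, the following sketch builds the same main attributes with java.util.jar.Manifest and renders them the way they end up in META-INF/MANIFEST.MF. AgentManifest is a hypothetical helper, not part of the toy agent; in the real build, Gradle writes this file for us.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifest {
  // Builds the attributes declared in the jar block and renders them as
  // MANIFEST.MF text ("Name: value" lines ending in CRLF).
  public static String render() {
    Manifest manifest = new Manifest();
    Attributes main = manifest.getMainAttributes();
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // standard header line
    main.put(Attributes.Name.IMPLEMENTATION_TITLE, "toy-agent");
    main.put(Attributes.Name.IMPLEMENTATION_VERSION, "1.0");
    // Agent-specific attributes have no predefined constants, so name them explicitly.
    main.put(new Attributes.Name("Premain-Class"), "com.github.zserge.toyjavaagent.ToyAgent");
    main.put(new Attributes.Name("Can-Retransform-Classes"), "true");

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      manifest.write(out);
      return out.toString(StandardCharsets.UTF_8.name());
    } catch (IOException e) {
      // Not expected for an in-memory stream, but write() declares it.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(render());
  }
}
```

Running it prints one Name: value line per attribute, starting with Manifest-Version.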

In src/test/java, let’s create a simple Example.java class that will be our guinea pig:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

public class Example {
  public String getAgentName() {
    return "Bond";
  }
  public int meaningOfLife() {
    return 42;
  }
  @Test
  public void testAgent() {
    assertEquals(42, meaningOfLife());
    assertEquals("Smith", getAgentName());
  }
}

If all goes well, you should see “Agent says hello.” printed before the test fails, because, obviously, Smith is not Bond.

Messing with the JVM

Not impressed? Let’s fix the test by making getAgentName() return “Smith” without modifying the test sources. This is possible with the Instrumentation API alone, since a transformer receives and returns raw class bytes, but parsing and patching those bytes by hand would be tedious and obscure, so let’s use the ASM library instead. ASM is a powerful tool for analyzing and modifying Java classes at runtime; it is used by Gradle, Groovy, Kotlin, and even OpenJDK itself.
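To get a feel for what “by hand” means here: a ClassFileTransformer receives the class as a plain byte[] in the format defined in chapter 4 of the JVM specification. The sketch below (ClassHeader is a hypothetical name) decodes only the fixed-size header of a class file, using java/lang/Object from the running JDK as sample input. Everything after the constant pool count consists of variable-length structures, which is exactly the parsing work ASM takes off our hands.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeader {
  // Reads the first fields of a class file (JVMS chapter 4), all big-endian:
  // u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count.
  public static int[] read(InputStream in) {
    try (DataInputStream data = new DataInputStream(in)) {
      int magic = data.readInt();                       // always 0xCAFEBABE
      int minor = data.readUnsignedShort();
      int major = data.readUnsignedShort();             // e.g. 52 for Java 8
      int constantPoolCount = data.readUnsignedShort();
      return new int[] {magic, minor, major, constantPoolCount};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] h = read(Object.class.getResourceAsStream("Object.class"));
    System.out.printf("magic=%08X major=%d constant_pool_count=%d%n", h[0], h[2], h[3]);
  }
}
```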

Let’s add a class transformer that prints all loaded classes:

public static void premain(String args, Instrumentation inst) throws Exception {
  inst.addTransformer(new ClassFileTransformer() {
      @Override
      public byte[] transform(ClassLoader loader,
          String name, Class<?> c, ProtectionDomain pd,
          byte[] bytecode) throws IllegalClassFormatException {
        System.out.println("class " + name);
        return bytecode;
      }
    });
}

If we run the test again, we will see a long list of class names (in the JVM’s internal form, such as java/lang/String), with “Example” among them. This is our test class, so let’s dive deeper and visit and print all of its methods:

public static void premain(String args, Instrumentation inst) throws Exception {
  inst.addTransformer(new ClassFileTransformer() {
    @Override
    public byte[] transform(ClassLoader loader,
        String name, Class<?> c, ProtectionDomain pd,
        byte[] bytecode) throws IllegalClassFormatException {
      // name may be null for some classes, so compare against the literal
      if (!"Example".equals(name)) {
        // Return original bytecode for other classes
        return bytecode;
      }
      // Read Example class bytecode
      ClassReader cr = new ClassReader(bytecode);
      ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
      cr.accept(new ClassVisitor(Opcodes.ASM8, cw) {
        @Override
        public MethodVisitor visitMethod(int access,
            String methodName, String desc,
            String signature, String[] exceptions) {
          System.out.println("visit: " + name + " " + methodName);
          return super.visitMethod(access, methodName, desc,
              signature, exceptions);
        }
      }, 0);
      // Hand the rewritten class bytes back to the JVM
      return cw.toByteArray();
    }
  });
}

Running this would print out four methods of the Example class: <init>, getAgentName, meaningOfLife, and testAgent.
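Both transformers above are anonymous classes inside premain, and the snippets omit their imports. As a standalone sketch with all imports spelled out, a named transformer that only records class names could look like this (LoggingTransformer and its seen list are hypothetical names). Returning null tells the JVM the class was not transformed, and the null check on className guards against a missing name, which the JVM may pass for some generated classes.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LoggingTransformer implements ClassFileTransformer {
  // Classes can be loaded from several threads at once, so use a thread-safe list.
  final List<String> seen = new CopyOnWriteArrayList<>();

  @Override
  public byte[] transform(ClassLoader loader, String className,
      Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
      byte[] classfileBuffer) {
    // className uses the internal form ("java/lang/String"); convert it for display.
    if (className != null) {
      seen.add(className.replace('/', '.'));
    }
    return null; // null means "no transformation": the JVM keeps the original bytes
  }
}
```

In premain it would be registered with inst.addTransformer(new LoggingTransformer());.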

Patching bytecode

To modify a method, we need a custom method visitor. For getAgentName, we write a new code section consisting of just two instructions: LDC and ARETURN. LDC pushes a constant (here, a string literal) onto the operand stack, and ARETURN returns the object reference on top of the stack. Together they make getAgentName return “Smith” instead of “Bond”:

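MethodVisitor v = cv.visitMethod(access, methodName, desc, signature, exceptions);
if (!methodName.equals("getAgentName")) {
  return v;                      // leave other methods untouched
}
v.visitCode();
v.visitLdcInsn("Smith");         // push the string constant
v.visitInsn(Opcodes.ARETURN);    // return it
v.visitMaxs(0, 0);               // ignored: COMPUTE_FRAMES recomputes the maximums
v.visitEnd();
return null;                     // null: ClassReader skips the original body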

Now, if we run the tests, they pass, even though the method in the source code still returns “Bond”. So yes, agents can be evil and should be used with care.
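The same LDC/ARETURN trick can be tried outside of an agent. The sketch below assumes ASM (org.ow2.asm:asm 8.0.1, as in build.gradle) is on the classpath. It generates a tiny class named Agent whose static getAgentName() returns “Bond”, patches it with a ClassVisitor, and loads both versions through a throwaway class loader. PatchDemo, ByteLoader, and the generated Agent class are made-up names, and getAgentName is static here only so that no constructor is needed. Returning null from visitMethod after writing the new body makes ClassReader skip the original instructions.

```java
import java.lang.reflect.Method;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class PatchDemo {
  // Generates a class "Agent" with: public static String getAgentName() { return "Bond"; }
  static byte[] makeAgentClass() {
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
    cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, "Agent", null, "java/lang/Object", null);
    MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC,
        "getAgentName", "()Ljava/lang/String;", null, null);
    mv.visitCode();
    mv.visitLdcInsn("Bond");
    mv.visitInsn(Opcodes.ARETURN);
    mv.visitMaxs(0, 0); // recomputed because of COMPUTE_FRAMES
    mv.visitEnd();
    cw.visitEnd();
    return cw.toByteArray();
  }

  // Replaces the body of getAgentName with: LDC "Smith"; ARETURN
  static byte[] patch(byte[] original) {
    ClassReader cr = new ClassReader(original);
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
    cr.accept(new ClassVisitor(Opcodes.ASM8, cw) {
      @Override
      public MethodVisitor visitMethod(int access, String name, String desc,
          String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
        if (!name.equals("getAgentName")) {
          return mv; // copy other methods unchanged
        }
        mv.visitCode();
        mv.visitLdcInsn("Smith");
        mv.visitInsn(Opcodes.ARETURN);
        mv.visitMaxs(0, 0);
        mv.visitEnd();
        return null; // ClassReader skips the original "Bond" body
      }
    }, 0);
    return cw.toByteArray();
  }

  // Defines raw class bytes in a fresh class loader.
  static final class ByteLoader extends ClassLoader {
    ByteLoader(ClassLoader parent) { super(parent); }
    Class<?> define(String name, byte[] bytes) {
      return defineClass(name, bytes, 0, bytes.length);
    }
  }

  static String callGetAgentName(byte[] classBytes) {
    try {
      Class<?> c = new ByteLoader(PatchDemo.class.getClassLoader()).define("Agent", classBytes);
      Method m = c.getMethod("getAgentName");
      return (String) m.invoke(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = makeAgentClass();
    System.out.println(callGetAgentName(original));        // Bond
    System.out.println(callGetAgentName(patch(original))); // Smith
  }
}
```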

Of course, we can go even further and do something useful, like tracing every single method as it is called. To achieve this, we need to insert something like this at the start of each method:

System.out.println("Agent intercepted: " + methodName);

Or, in terms of JVM instructions: read the static field System.out, push a constant string onto the stack, and call the println method:

GETSTATIC     java/lang/System.out : Ljava/io/PrintStream;
LDC           "Agent intercepted: ..."
INVOKEVIRTUAL java/io/PrintStream.println (Ljava/lang/String;)V

With the ASM library, our method visitor emits these instructions at the start of the method body:

v.visitFieldInsn(
    Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
v.visitLdcInsn("Agent intercepted: " + methodName);
v.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println",
    "(Ljava/lang/String;)V", false);

Running the test now shows that execution starts with <init>, the instance initializer (constructor); then testAgent() is called, which in turn calls meaningOfLife() and getAgentName().
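If these calls are made directly on v, they run before ClassReader calls visitCode() and would also apply to abstract or native methods, which have no code. A more robust sketch overrides visitCode in a MethodVisitor adapter, so the trace call is emitted first and only for methods that actually have a body. As before, it assumes ASM 8 is on the classpath, and TraceDemo, ByteLoader, and the generated Guinea class are hypothetical names used only for this demo.

```java
import java.lang.reflect.Method;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class TraceDemo {
  // Generates a class "Guinea" with: public static int meaningOfLife() { return 42; }
  static byte[] makeGuinea() {
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
    cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, "Guinea", null, "java/lang/Object", null);
    MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC,
        "meaningOfLife", "()I", null, null);
    mv.visitCode();
    mv.visitIntInsn(Opcodes.BIPUSH, 42);
    mv.visitInsn(Opcodes.IRETURN);
    mv.visitMaxs(0, 0);
    mv.visitEnd();
    cw.visitEnd();
    return cw.toByteArray();
  }

  // Prepends System.out.println("Agent intercepted: <method>") to every method body.
  static byte[] addTracing(byte[] original) {
    ClassReader cr = new ClassReader(original);
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
    cr.accept(new ClassVisitor(Opcodes.ASM8, cw) {
      @Override
      public MethodVisitor visitMethod(int access, String name, String desc,
          String signature, String[] exceptions) {
        MethodVisitor next = super.visitMethod(access, name, desc, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM8, next) {
          @Override
          public void visitCode() {
            // ClassReader calls visitCode only for methods that have a Code attribute
            super.visitCode();
            super.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out",
                "Ljava/io/PrintStream;");
            super.visitLdcInsn("Agent intercepted: " + name);
            super.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println",
                "(Ljava/lang/String;)V", false);
          }
        };
      }
    }, 0);
    return cw.toByteArray();
  }

  // Defines raw class bytes in a fresh class loader.
  static final class ByteLoader extends ClassLoader {
    ByteLoader(ClassLoader parent) { super(parent); }
    Class<?> define(String name, byte[] bytes) {
      return defineClass(name, bytes, 0, bytes.length);
    }
  }

  static int callMeaningOfLife(byte[] classBytes) {
    try {
      Class<?> c = new ByteLoader(TraceDemo.class.getClassLoader()).define("Guinea", classBytes);
      Method m = c.getMethod("meaningOfLife");
      return (int) m.invoke(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(callMeaningOfLife(addTracing(makeGuinea())));
  }
}
```

Running main prints “Agent intercepted: meaningOfLife” followed by 42.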

The full code of our toy agent can be found on GitHub.

The agent can be extended further to measure the time spent in each method, which gives a basic profiler. Or it can modify methods to restrict certain behavior, e.g. to forbid running certain commands via Runtime.exec. Another popular use case for Java agents is proprietary monitoring tooling, where the agent also keeps the implementation out of sight: Instana and New Relic use Java agents to monitor the code inside the JVM. The Instana SDK ships only with stub implementations of all public classes, and the actual bytecode is loaded only at runtime. This makes it harder to debug, but on the other hand, it makes reverse engineering significantly harder.
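As a rough illustration of the profiler idea, the sketch below shows only the runtime side: a hypothetical MethodTimer class whose enter() and exit() calls an agent could inject at the start of each method and on every exit path. The instrumented method is written out by hand in Java, so no bytecode is involved; all names are made up for this example, and a real profiler would also have to deal with nested calls, overhead, and reporting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public final class MethodTimer {
  // Accumulated wall-clock time and call count per method name.
  static final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
  static final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

  // Injected at method entry; the result would be kept in a new local variable.
  public static long enter() {
    return System.nanoTime();
  }

  // Injected on every exit path of the method.
  public static void exit(String method, long start) {
    long elapsed = System.nanoTime() - start;
    totalNanos.computeIfAbsent(method, k -> new LongAdder()).add(elapsed);
    calls.computeIfAbsent(method, k -> new LongAdder()).increment();
  }

  // Hand-written equivalent of an instrumented Example.meaningOfLife():
  // try/finally makes exit() run on normal returns and on exceptions alike.
  public static int meaningOfLife() {
    long start = enter();
    try {
      return 42;
    } finally {
      exit("Example.meaningOfLife", start);
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 3; i++) {
      meaningOfLife();
    }
    calls.forEach((method, count) -> System.out.println(
        method + ": " + count.sum() + " calls, " + totalNanos.get(method).sum() + " ns"));
  }
}
```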

So, agents can be good and agents can be evil. But knowing and understanding how they work is never a bad thing.

I hope you’ve enjoyed this article.

Jun 06, 2020
