Allocation Instrumenter for Java

In brief: We've open sourced a tool that allows you to provide a callback every time your program performs an allocation. The Java Allocation Instrumenter can be found here. Give it a whirl, if you are interested.

One thing that crops up a lot at my employer is the need to take an action on every allocation. This can happen in a lot of different contexts:
  1. The programmer has a task, and wants to know how much memory the task allocates, so wants to increment a counter on every allocation.
  2. The programmer wants to keep a histogram of the call sites that allocate most frequently.
  3. The programmer wants to prevent a task from allocating too much memory, so it keeps a counter on every allocation and throws an exception when the counter reaches a certain value.

Because of the demand for this, a few of us put together a tool that instruments your code and invokes a callback on every allocation. The Allocation Instrumenter is a Java agent written using the java.lang.instrument API and ASM. Each allocation in your Java program is instrumented; a user-defined callback is invoked on each allocation.

The easiest way to explain this is with an example. Assume you have a program that creates 10 strings, and you want to instrument it:

public class Test {
  public static void main(String[] args) throws Exception {
    for (int i = 0; i < 10; i++) {
      new String("foo");
    }
  }
}

To do this, you create an implementation of the Sampler interface and register it with AllocationRecorder.addSampler():

import com.google.monitoring.runtime.instrumentation.AllocationRecorder;
import com.google.monitoring.runtime.instrumentation.Sampler;

public class Test {
  public static void main(String[] args) throws Exception {
    AllocationRecorder.addSampler(new Sampler() {
      public void sampleAllocation(int count, String desc,
          Object newObj, long size) {
        System.out.println("I just allocated the object " + newObj +
            " of type " + desc + " whose size is " + size);
        if (count != -1) {
          System.out.println("It's an array of size " + count);
        }
      }
    });
    for (int i = 0; i < 10; i++) {
      new String("foo");
    }
  }
}

You can then compile and run the program:

% javac -classpath path/to/allocation.jar Test.java
% java -javaagent:path/to/allocation.jar Test

The output will look something like this:

I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
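
The same callback shape covers the first and third use cases from the top of the post: counting the bytes a task allocates and stopping it when it goes over budget. Here is a minimal sketch of that idea. The names AllocationBudget, Callback and BudgetSampler are hypothetical, and Callback is only a local stand-in with the same method shape as Sampler so that the sketch compiles without allocation.jar. Whether an exception thrown from a real Sampler propagates cleanly out of instrumented code is an assumption you would want to check.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not part of the Allocation Instrumenter.
class AllocationBudget {

    // Local stand-in with the same method shape as the post's Sampler interface,
    // declared here so the sketch compiles without allocation.jar.
    interface Callback {
        void sampleAllocation(int count, String desc, Object newObj, long size);
    }

    // Adds up reported sizes and throws once the running total reaches the limit.
    static final class BudgetSampler implements Callback {
        private final long limitBytes;
        private final AtomicLong allocatedBytes = new AtomicLong();

        BudgetSampler(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        @Override
        public void sampleAllocation(int count, String desc, Object newObj, long size) {
            long total = allocatedBytes.addAndGet(size);
            if (total >= limitBytes) {
                throw new IllegalStateException(
                        "allocation budget reached: " + total + " of " + limitBytes + " bytes");
            }
        }

        long allocatedBytes() {
            return allocatedBytes.get();
        }
    }

    public static void main(String[] args) {
        BudgetSampler sampler = new BudgetSampler(100);
        try {
            // Simulate the agent reporting a series of 24-byte String allocations.
            for (int i = 0; i < 10; i++) {
                sampler.sampleAllocation(-1, "java/lang/String", "foo", 24);
            }
        } catch (IllegalStateException e) {
            System.out.println("Stopped after " + sampler.allocatedBytes() + " bytes");
        }
    }
}
```

With the real agent you would implement Sampler instead of the stand-in and register the object with AllocationRecorder.addSampler().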

So, by my standards, it is really pretty easy to use. If you find it useful, please let me know!

Edited to add: I noticed this on Twitter: "Cool, even if it uses Ant (so probably I will never try it)." This is funny, because I only added an Ant buildfile so more people would try it. You can download the source and compile it with javac in about one line.

Comments

Carsten said…
Can you explain the differences to a heap dump analyzer like Eclipse MAT? Where do you use this instrumenter in real life development?
Anonymous said…
Looks straightforward; thanks for open-sourcing the lib. Seems like this would be a welcome addition to BTrace; from what I know, BTrace doesn't let you do this. http://kenai.com/projects/btrace


Patrick
Jeremy Manson said…
@Carsten - Eclipse MAT is designed to analyze a heap dump. This is designed to track individual allocation sites.

The three examples at the top of the post are the three main use cases I have found for this code. It can be very useful, for example, to have a histogram of call sites where the bulk of your allocation takes place. (We tend to sample the allocation call sites rather than track every one, because getting a stack trace at every allocation is very heavyweight).
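
A rough sketch of the call-site histogram idea, for the curious. Everything named here (CallSiteHistogram, Callback, HistogramSampler, framesToSkip) is hypothetical; Callback is a local stand-in shaped like Sampler so the sketch runs on its own. It takes a trace on every call for simplicity, which is exactly the heavyweight approach described above, so a real version would only do this for a sample of allocations. With the real agent there would also be instrumenter frames between the callback and the allocating code, so the right number of frames to skip is an assumption you would need to determine.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper for illustration; not part of the Allocation Instrumenter.
class CallSiteHistogram {

    // Local stand-in with the same method shape as the post's Sampler interface.
    interface Callback {
        void sampleAllocation(int count, String desc, Object newObj, long size);
    }

    static final class HistogramSampler implements Callback {
        private final Map<String, LongAdder> allocationsBySite = new ConcurrentHashMap<>();
        private final int framesToSkip;

        // framesToSkip: how many frames sit between the allocating code and the
        // top of the stack (at least 1, for sampleAllocation itself).
        HistogramSampler(int framesToSkip) {
            this.framesToSkip = framesToSkip;
        }

        @Override
        public void sampleAllocation(int count, String desc, Object newObj, long size) {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            if (framesToSkip >= stack.length) {
                return; // stack shallower than expected; nothing to attribute
            }
            StackTraceElement frame = stack[framesToSkip];
            String site = frame.getClassName() + "." + frame.getMethodName()
                    + ":" + frame.getLineNumber();
            allocationsBySite.computeIfAbsent(site, key -> new LongAdder()).increment();
        }

        Map<String, LongAdder> allocationsBySite() {
            return allocationsBySite;
        }
    }

    public static void main(String[] args) {
        // Skip one frame because this demo calls the callback directly.
        HistogramSampler sampler = new HistogramSampler(1);
        for (int i = 0; i < 3; i++) {
            sampler.sampleAllocation(-1, "java/lang/String", "foo", 24);
        }
        sampler.allocationsBySite().forEach(
                (site, hits) -> System.out.println(site + " -> " + hits.sum()));
    }
}
```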
Jeremy Manson said…
@Patrick - It is pretty straightforward, which is why I OSS'd it. Lots of the other stuff we do internally involves writing loads of JVMTI and directly patching the JVM. I'd like to OSS that, too, but that would be a more ambitious effort.

DTrace actually has hooks that allow you to instrument allocations in a similar way, but the hooks have to be written in a DTrace-friendly way (i.e., not in Java). BTrace could hook into the DTrace hooks without too many problems. I recall a VM patch that was floating around a few months ago to enable it, but I don't know what became of it.
Enthusiast said…
AFAIK, you can do it in BTrace as well. Just check the NewComponent.java in BTrace samples.
Bruce Ritchie said…
I'm wondering about the accuracy of the AllocationRecorder.getObjectSize(Object obj, boolean isArray) method. From what I see the cached value is always the size of the object that was first cached, not the size of the parameter object.

For example if I test with your test class with different sizes of strings the reported object size is always 40 for all the strings.

Was the performance of getting the object size from the instrumentation instance so poor that it necessitated this cache?
Jeremy Manson said…
@Bruce - Yes, it was for performance. getObjectSize() was a dog. But all of your Strings probably are size 40 (that's kind of big, of course - 64-bit?). It's the backing char arrays that should be of variable size.
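
To make the trade-off concrete, here is a small sketch of a per-class size cache of the kind being discussed. It is not the instrumenter's actual code: ClassSizeCache and the slowSizer parameter are hypothetical, and the sizes in the demo are made-up numbers. In a real agent the slow sizer could be Instrumentation.getObjectSize(Object). The sketch assumes non-array instances of one class all have the same shallow size, so the first measurement is reused, while arrays are measured every time because their size depends on their length.

```java
import java.lang.reflect.Array;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical helper for illustration; not the instrumenter's own cache.
class ClassSizeCache {
    private final ConcurrentHashMap<Class<?>, Long> sizeByClass = new ConcurrentHashMap<>();
    private final ToLongFunction<Object> slowSizer;

    // slowSizer is the expensive measurement, e.g. instrumentation::getObjectSize in an agent.
    ClassSizeCache(ToLongFunction<Object> slowSizer) {
        this.slowSizer = slowSizer;
    }

    long sizeOf(Object obj) {
        Class<?> type = obj.getClass();
        if (type.isArray()) {
            // Array sizes vary with length, so they are never cached.
            return slowSizer.applyAsLong(obj);
        }
        // The first instance seen decides the cached size for the whole class.
        return sizeByClass.computeIfAbsent(type, key -> slowSizer.applyAsLong(obj));
    }

    public static void main(String[] args) {
        // Stand-in sizer with invented numbers, used only to show the caching behavior.
        ClassSizeCache cache = new ClassSizeCache(o ->
                o.getClass().isArray() ? 16 + 2L * Array.getLength(o) : 24L);
        System.out.println(cache.sizeOf("a"));
        System.out.println(cache.sizeOf("a much longer string"));
        System.out.println(cache.sizeOf(new char[1]));
        System.out.println(cache.sizeOf(new char[100]));
    }
}
```

This is why every String reports the same number: the String object itself is fixed-size, and only its backing array grows with the text.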

@Jaroslav - You're completely right! I had no idea. Since it looks as if you are a contributor - does it plug into the DTrace VM hooks, or does it do a similar bytecode rewrite?
Enthusiast said…
@Jeremy
By default BTrace uses bytecode instrumentation. But you can use it to hook into the DTrace machinery as well (see DTraceInline.java and DTraceDemoRef.java)
Suraj said…
This is pretty useful! If I take a stack trace during each allocation, how expensive can it get? Any pointers to how Thread.getStackTrace() works would be great.

Thanks
Suraj
Jeremy Manson said…
@suraj - It depends on how you gather the traces. A couple of tips:

1) Constructing a new Throwable lets you defer the expensive part of building a stack trace until you actually need it. (Try creating a bunch of Throwables and comparing that cost to the creation of a bunch of stack traces.)

2) You might secretly not need all of the stack traces. One trick is to gather a statistically valid sample. We've found that a decent sample is to grab one every time a thread allocates ~512K of objects, where ~ means a statistically valid sampling distributed around the number 512K.

3) It depends on how much allocation you do, of course. Simply instrumenting the code can cost anywhere from 5-10% to 50%, depending on the application.
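
Tips 1 and 2 can be combined in a sketch like the following. It keeps a per-thread countdown of bytes, and when the countdown runs out it stores a bare Throwable and draws a new interval. The names (SampledTraces, Callback, ByteIntervalSampler) are hypothetical, Callback is a local stand-in shaped like Sampler, and the exponential distribution with a mean of 512K is my assumption for illustration; the only thing stated above is that the interval is distributed around 512K.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration; not part of the Allocation Instrumenter.
class SampledTraces {

    // Local stand-in with the same method shape as the post's Sampler interface.
    interface Callback {
        void sampleAllocation(int count, String desc, Object newObj, long size);
    }

    static final class ByteIntervalSampler implements Callback {
        private final double meanIntervalBytes;
        // Bytes the current thread may still allocate before its next sample.
        private final ThreadLocal<long[]> bytesUntilSample;
        private final Queue<Throwable> samples = new ConcurrentLinkedQueue<>();

        ByteIntervalSampler(long meanIntervalBytes) {
            this.meanIntervalBytes = meanIntervalBytes;
            this.bytesUntilSample = ThreadLocal.withInitial(() -> new long[] {nextInterval()});
        }

        // Assumption: intervals are exponentially distributed around the mean.
        private long nextInterval() {
            double u = ThreadLocalRandom.current().nextDouble(); // uniform in [0, 1)
            return 1 + (long) (-meanIntervalBytes * Math.log(1.0 - u));
        }

        @Override
        public void sampleAllocation(int count, String desc, Object newObj, long size) {
            long[] remaining = bytesUntilSample.get();
            remaining[0] -= size;
            if (remaining[0] > 0) {
                return; // not time for a sample yet
            }
            remaining[0] = nextInterval();
            // Keep only the Throwable; getStackTrace() is called later, if ever.
            samples.add(new Throwable());
        }

        Queue<Throwable> samples() {
            return samples;
        }
    }

    public static void main(String[] args) {
        ByteIntervalSampler sampler = new ByteIntervalSampler(512 * 1024);
        // Simulate roughly 100 MB of allocation in 1 KB objects.
        for (int i = 0; i < 100_000; i++) {
            sampler.sampleAllocation(-1, "java/lang/Object", null, 1024);
        }
        System.out.println("samples taken: " + sampler.samples().size());
        Throwable first = sampler.samples().peek();
        if (first != null) {
            // The stack trace elements are only materialized here.
            System.out.println("first sample at: " + first.getStackTrace()[0]);
        }
    }
}
```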
Robert said…
I was very excited to find this. I've been scouring the web looking for such a tool. I tried your sample app and everything builds properly in Eclipse but I'm not getting any callbacks on the allocations. What should I check to determine the problem?
Jeremy Manson said…
My suspicion is that there are weird settings in Eclipse you need to tweak. Sadly, I can't help you, as I've only used it from the command line. :/
Robert said…
I got your sample app working. I have a couple more questions. I have a multi-threaded app and I'm wondering what the thread implications are. Is it thread safe? Do I set up a Sampler per thread and do I get callbacks that are thread specific? Or am I limited to a global Sampler which handles the callbacks of allocations for all threads?
Jeremy Manson said…
Global sampler. Of course, you can have your sampler delegate to a Sampler stored in a ThreadLocal if you like. The performance would probably be even more questionable than the sampler typically is, though.
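
A sketch of the ThreadLocal delegation idea, under the assumption that each thread installs its own callback before doing the work it wants measured. PerThreadSampling, Callback and ThreadLocalDelegatingSampler are hypothetical names, and Callback is a local stand-in shaped like Sampler so the sketch runs without the agent.

```java
// Hypothetical helper for illustration; not part of the Allocation Instrumenter.
class PerThreadSampling {

    // Local stand-in with the same method shape as the post's Sampler interface.
    interface Callback {
        void sampleAllocation(int count, String desc, Object newObj, long size);
    }

    // The single global sampler; it forwards to whatever the current thread installed.
    static final class ThreadLocalDelegatingSampler implements Callback {
        private final ThreadLocal<Callback> delegate = new ThreadLocal<>();

        void setForCurrentThread(Callback callback) {
            delegate.set(callback);
        }

        void clearForCurrentThread() {
            delegate.remove();
        }

        @Override
        public void sampleAllocation(int count, String desc, Object newObj, long size) {
            Callback callback = delegate.get();
            if (callback != null) {
                callback.sampleAllocation(count, desc, newObj, size);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalDelegatingSampler global = new ThreadLocalDelegatingSampler();
        long[] workerBytes = new long[1];

        Thread worker = new Thread(() -> {
            global.setForCurrentThread((count, desc, newObj, size) -> workerBytes[0] += size);
            // Simulate the agent reporting an allocation made on this thread.
            global.sampleAllocation(-1, "java/lang/Object", new Object(), 16);
            global.clearForCurrentThread();
        });
        worker.start();
        worker.join();

        // No delegate is installed on the main thread, so this call is ignored.
        global.sampleAllocation(-1, "java/lang/Object", new Object(), 16);
        System.out.println("worker thread bytes: " + workerBytes[0]);
    }
}
```

The extra ThreadLocal lookup on every allocation is the performance cost mentioned above.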
Anonymous said…
I was looking for something like this for a while, I think it is very useful! Is there any way to exclude classes from the instrumentation process? (e.g. The allocations from the asm classes). I'm trying to get used to the code but I've never worked with ASM. Thank you.

Roberto
Jeremy Manson said…
@Anonymous - There isn't a way to exclude classes, as yet. Patches welcome. :)
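
For anyone tempted to write that patch, one possible shape is a wrapper around the agent's class file transformer that skips classes by name prefix. This is only a sketch and not how the instrumenter is structured as far as I can confirm: ExcludingTransformer is a hypothetical class, and it assumes the agent registers a java.lang.instrument.ClassFileTransformer that could be wrapped. It relies on two documented behaviors of that API: class names arrive in internal form (slashes, not dots), and returning null means "leave this class unchanged".

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper for illustration; not part of the Allocation Instrumenter.
class ExcludingTransformer implements ClassFileTransformer {
    private final ClassFileTransformer delegate;
    private final List<String> excludedPrefixes;

    // Prefixes use the internal name form, e.g. "org/objectweb/asm/".
    ExcludingTransformer(ClassFileTransformer delegate, List<String> excludedPrefixes) {
        this.delegate = delegate;
        this.excludedPrefixes = excludedPrefixes;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        if (className != null) {
            for (String prefix : excludedPrefixes) {
                if (className.startsWith(prefix)) {
                    return null; // null tells the JVM to keep the original bytes
                }
            }
        }
        return delegate.transform(loader, className, classBeingRedefined,
                protectionDomain, classfileBuffer);
    }

    public static void main(String[] args) throws IllegalClassFormatException {
        // Stand-in for the real instrumenting transformer: it just echoes the bytes.
        ClassFileTransformer echo = new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain, byte[] classfileBuffer) {
                return classfileBuffer;
            }
        };
        ExcludingTransformer transformer =
                new ExcludingTransformer(echo, Arrays.asList("org/objectweb/asm/"));
        byte[] bytes = new byte[] {1, 2, 3};
        System.out.println(
                transformer.transform(null, "org/objectweb/asm/ClassWriter", null, null, bytes) == null);
        System.out.println(
                transformer.transform(null, "com/example/Foo", null, null, bytes) == bytes);
    }
}
```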
Nosheen said…
Hello, thank you for this useful tool, your code was useful for my work.

I am not sure if this is a bug or I am misinterpreting the output of the sampler, I have this code:

public class HelloWorld {
  public HelloWorld() {
    String s11 = new String("constructor");
  }
  public static void main(String args[]) {
    Object o1 = new Object();
    String s1 = new String("hello");
    System.out.println("Example");
    Object o2 = new Object();
  }
}

and it seems to me that the object allocated in the constructor is not being detected. I expect to see something like "I just allocated the object constructor of type java/lang/String whose size is ..." in the output, but this is not the case.

What is going on?
Jeremy Manson said…
@Nosheen - did you add the sampler? It has to be done by hand:


import com.google.monitoring.runtime.instrumentation.AllocationRecorder;
import com.google.monitoring.runtime.instrumentation.Sampler;

public class HelloWorld {
  public HelloWorld() {
    String s11 = new String("constructor");
  }
  public static void main(String args[]) {
    AllocationRecorder.addSampler(new Sampler() {
      public void sampleAllocation(int count, String desc,
          Object newObj, long size) {
        System.out.println("I just allocated the object " + newObj +
            " of type " + desc + " whose size is " + size);
        if (count != -1) {
          System.out.println("It's an array of size " + count);
        }
      }
    });

    Object o1 = new Object();
    String s1 = new String("hello");
    System.out.println("Example");
    Object o2 = new Object();
  }
}
Nosheen said…
Oh I think I need to add more details. I added the sampler in the premain method in the AllocationInstrumenter. The code does detect a lot of object allocations, but not for the object created in the constructor.
Nosheen said…
Oh I see my stupid mistake now, I did not create an instance of HelloWorld to call the constructor in the first place :P
Anonymous said…
Thanks for sharing! But how to obtain the memory addresses of allocated variables?
Jeremy Manson said…
@Anonymous - you can't! This is Java, you aren't supposed to have memory addresses. Also, memory addresses can change because of garbage collection behavior. If you need field offset, you can use the sun.misc.Unsafe class. If you need direct access to a memory region, you can use the JNI critical functions, but be careful, because a) they are unsafe, and b) they acquire very heavyweight locks.
Anonymous said…
This is a very useful tool. I have a question: I'd like to use this tool for a Java benchmark, but I don't want to modify the benchmark source code (like adding the sampler in the benchmark source code directly). Where should I add the sampler? Any help is much appreciated :)
Jeremy Manson said…
It would be pretty easy to write a little wrapper program that invokes main() in the original program:

public class Wrapper {
  public static void main(String[] args) {
    AllocationRecorder.addSampler(new Sampler() {
      public void sampleAllocation(int count, String desc, Object newObj, long size) {
        // whatever
      }
    });
    OriginalProgram.main(args);
  }
}

We could add a command line loading facility, but this is easier for me :)
Anonymous said…
Thank you for your prompt reply! Can I ask how to add a command line loading facility? Like, putting addSampler in jar file and load it using javaagent when I want to profile a Java benchmark? Thanks!
Jeremy Manson said…
You would add it to the command line parsing functionality in the premain() function. Maybe pass the name of a Sampler class, and then have it search the classpath for that class and add it.
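
A sketch of what that lookup could look like. The option format "sampler=<class name>" is invented for this example, as are the names SamplerLoader, Callback and CountingSampler; Callback is a local stand-in shaped like Sampler so the sketch runs on its own. In the agent, the string being parsed would be the agentArgs parameter of premain(), and the loaded object would be handed to AllocationRecorder.addSampler().

```java
// Hypothetical helper for illustration; not part of the Allocation Instrumenter.
class SamplerLoader {

    // Local stand-in with the same method shape as the post's Sampler interface.
    interface Callback {
        void sampleAllocation(int count, String desc, Object newObj, long size);
    }

    // Assumed argument format: comma-separated options, one of which may be
    // "sampler=<fully qualified class name>". Returns null if none is given.
    static Callback loadFromAgentArgs(String agentArgs) throws ReflectiveOperationException {
        if (agentArgs == null) {
            return null;
        }
        String key = "sampler=";
        for (String option : agentArgs.split(",")) {
            if (option.startsWith(key)) {
                String className = option.substring(key.length());
                // asSubclass fails fast if the named class is not a Callback.
                return Class.forName(className)
                        .asSubclass(Callback.class)
                        .getDeclaredConstructor()
                        .newInstance();
            }
        }
        return null;
    }

    // Example sampler with a no-argument constructor, as the loader requires.
    static final class CountingSampler implements Callback {
        long allocations;

        @Override
        public void sampleAllocation(int count, String desc, Object newObj, long size) {
            allocations++;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String agentArgs = "sampler=" + CountingSampler.class.getName();
        Callback sampler = loadFromAgentArgs(agentArgs);
        sampler.sampleAllocation(-1, "java/lang/String", "foo", 24);
        System.out.println("loaded " + sampler.getClass().getName());
    }
}
```

The command line would then look something like java -javaagent:path/to/allocation.jar=sampler=MySampler Benchmark, since everything after the "=" following the jar path is passed to premain() as its argument string.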
Anonymous said…
Thanks a lot!
