Monday, May 14, 2007

Profiling with JVMTI/JVMPI, SIGPROF and AsyncGetCallTrace

This post is not about the memory model, but, instead, is about being able to get, asynchronously, the stack trace of a currently executing thread out of Sun's Java VM. Getting one synchronously is easy, of course.

I was trying to profile some Java applications under Linux, and I wanted to use SIGPROF and JVMTI to find out what code was running at any given second. This is just a posting to document how I did it, as I ended up using some undocumented JVM features, and I can't find a proper reference for them anywhere else.

More information as to why you might want to do this is here.

JVMTI is the Java Virtual Machine Toolkit Interface. It's a C/C++ interface to the virtual machine that allows you to hook in debuggers and profilers. More information can be found here.

SIGPROF is a POSIX signal. The way it is set up under Linux is that you can set a timer that goes off at intervals, and, whenever the timer expires, the SIGPROF signal is sent to the application. The idea is that you can then collect profiling information at fixed intervals.

The signal is caught and handled by a random running thread. That thread will be doing some task related to running the Java application -- whether that is executing user code, running garbage collection, or doing internal VM maintenance.

The problem is that there is no officially documented JVMTI hook that allows the user to find out exactly what the currently running thread is doing that is safe to run in a signal handler. The official way of getting a stack trace for the currently executing thread, the GetStackTrace function, isn't safe to be called in an asynchronous way -- if you try to read the stack when a timer expires, the Java stack could be in a corrupt state, or GC could be running, or any number of other things.

It turns out that the kind folks who wrote the Java virtual machine were fully aware of this, and provided an undocumented interface for this type of profiling, used by their Forte Analyzer (which now operates under the Sun Studio umbrella, I believe). Now that they've open-sourced the JVM, this is public knowledge. For those of you who like to see the source code for such things, it can be found in hotspot/src/share/prims/forte.cpp.

In principle, AsyncGetCallTrace is fairly easy to use. This is less true in practice. Since JVMPI has been removed in JDK6, you start by having to include JVMPI structures in your headers. In JDK5 and earlier, this step in unnecessary (all covered under the GPL):

typedef struct {
jint lineno;
jmethodID method_id;
} JVMPI_CallFrame;

typedef struct {
JNIEnv *env_id;
jint num_frames;
JVMPI_CallFrame *frames;
} JVMPI_CallTrace;

Now that you have the JVMPI structures defined, you need a prototype for the undocumented call:

extern "C"
void AsyncGetCallTrace(JVMPI_CallTrace *trace, jint depth,
void* ucontext)
__attribute__ ((weak));

The __attribute__ ((weak)) is only for g++ users -- it tells the compiler not to look for AsyncGetCallTrace at compile or link time. People using other compilers can create a library stub that contains this method -- this is left as an exercise for the reader.

Since this post is too long already, I will assume that you know how to write both JVMTI and signal handlers. If you want me to write about that, leave a response, and I'll get to it at some point.

So, you write a bit of JVMTI that executes this method, and find out that although the method is invoked, it always returns -1 for the stack depth. It turns out that jmethodIDs (which are the internal numerical representation for a method in JVMTI / JVMPI) are allocated lazily. Most JVMTI methods call a method to GetAndAllocate these method IDs. AsyncGetCallTrace calls a getter, and relies on the jmethodIDs being pre-allocated. The expectation in the code is that if you have a ClassLoad hook in JVMTI, then the method ids will get allocated correctly.

So, you insert a ClassLoad Hook (all error correction code removed -- include your error correction code!):

void JNICALL OnClassLoad(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {


jvmtiEnv *jvmti;
vm->GetEnv((void **)&jvmti, JVMTI_VERSION);
jvmtiEventCallbacks *callbacks =
new jvmtiEventCallbacks();
callbacks->ClassLoad = &OnClassLoad;

I ran it after I did this, and found out that the stack depth was no longer -1, but the method ids were invalid. This seems to violate the contract of AsyncGetCallTrace; maybe one of my audience can tell me that this is wrong. Anyway, in order to "prime the pump" of method ids, I inserted another hook -- this time, to load up the method ids:

void JNICALL OnClassPrepare(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {
// We need to do this to "prime the pump",
// as it were -- make sure that all of the
// methodIDs have been initialized internally,
// for AsyncGetCallTrace. I imagine it slows
// down class loading a mite, but honestly,
// how fast does class loading have to be?
jint method_count;
jmethodID *methods;
jvmti_env->GetClassMethods(klass, &method_count,
delete [] methods;

This is pretty dumb, and I'd love to get rid of it. Anyone more familiar with this interface should slap me upside the head and tell me how to do it properly.

Anyway, after you do this, AsyncGetCallTrace returns valid results. If the resulting stack depth is negative, it wasn't able to grab a valid Java stack. The most important negative result is -2, which indicates that you sampled in the middle of garbage collection.


Ranjit Iyer said...

Thanks for a very informative post. Am trying to express my vague understanding of the series of events that happen when the timer goes off. I'd appreciate if you could shed some light where necessary.

The timer goes off and a signal is raised by the OS. From here on am confused between two things.

Does this signal gets translated into a Java exception accusing the currently executing thread's method for having raised it? and which in turn causes a JVMTI ExceptionEvent to occur, giving your agent the chance to obtain the stack trace of the faulting thread (which was the active thread)?


If the above conjecture is false, would you elaborate on the role played by the signal handler and the events thereafter?

Specifically, if you installed a signal handler to handle the SIGPROF, on which thread is the handler executed (current thread?)? If it is on the current/active thread, how do you force an Java event so the JVMTI agent gets notified and gets a chance to do something.

Thanks and Regards,

Jeremy Manson said...

Hey Ranjit --

I answered this in waaay too much detail in a post about using SIGPROF. The short version is that you load a signal handler in C++, and it gets executed when the timer goes off. Nothing happens in Java code at all.

Ranjit Iyer said...

Thanks Jeremy, That was very helpful.


Brent said...

Jeremy: I just stumbled across this 2-year old post of yours.

What is the current state of the art? In particular, what is the best (accurate, yet low impact) Java profiler, in your experience?

I have used JIP
in the past, but it has terrible performance impact.

Jeremy Manson said...

Heh. We've been working on ours. It now does native and mixed mode (i.e., Java + native) stack frames, which means it can identify hotspots in the VM and in JNI code (including GC and compiler time), with little / no cost. It really is pretty sweet. So I use that one a lot. I would love to make it open source, but I haven't even had the time to make the profiler work on non-Google programs, much less open-source it.

When I don't use that, I tend to use JProfiler, largely because it is available and handy at Google. It has a fairly enormous performance impact, though.

I don't know enough about it to recommend it, but I understand that the Sun Studio profiler uses AsyncGetCallTrace. As a result, it should have low overhead. I also believe that the binaries are free-as-in-beer, so it might be worth a try.

I don't use it, but I also understand that YourKit has a lower-overhead mode. I know several people who swear by it.

Brent said...

Jeremy: please blog extensively as soon as you open source that profiler--sounds exciting.

In the meanwhile, I will look into those other recommendations.

William Louth said...

I think call stack sampling can easily be replaced with a more efficient probe based (chained metering strategy) approach with much accurate hotspot detection along with the possibility of using addition meters/metrics which is not possible with call sampling in whatever form.

Jeremy Manson said...

@William - From that blog post, your methodology is highly unclear. You have to describe how you gather your data before I will believe your claims.

AsyncGetCallTrace avoids a lot of the overhead of traditional Java stack trace sampling mechanisms, and can profile with a greater degree of accuracy. You don't have to take my word for it - you can try it yourself.

Brent said...

Jeremy, do you happen to know if the Netbeans profiler uses AsyncGetCallTrace or not?

Apparently, the visualvm profiler in jdk 6u14 uses code from the latest NB profiler, so I would be curious to know if they use this technique or not. (And why not, if not.)

I did a google search just now, but could not find any good info.

Jeremy Manson said...

@Brent : I have no idea. It seems reasonably likely. I know that Sun Studio uses it. You could grep / objdump your way through the .so files looking for it.

Brent said...

Jeremy: I just skimmed thru my ...\jdk1.6.0_14\lib\visualvm directory.

There are no .so files. Also, everything seems to be pure java code inside jar files, or xml config files.

The one exception that looks like it might have something interesting is ...\jdk1.6.0_14\lib\visualvm\profiler3\lib\deployed\jdk16\windows\profilerinterface.dll

Opening that up in TextPad, I see that it does do some JVMTI stuff (JVMTI_ERROR_NONE seems to come up a lot). But did not see AsyncGetCallTrace. Maybe that is because I am not looking at the file right.

I think that the definitive answer is gonna have to come from someone else.

Jeremy Manson said...

That particular library seems to call out to Java_org_netbeans_lib_profiler_server_system_Stacks_getCurrentStackFrameIds to get its stack traces. The code:

Seems to call out to JVMTI's GetStackTrace function (i.e., not AGCT).

Brent said...

Having used visualvm's profiler several times now in the last couple of days, I too am pretty sure that it must not be using AGCT because it seems to have a massive impact on the performance of my program. Darn. Is there a reason why it is not using AGCT? Ignorance on the part of the developers? Or does AGCT have some other negative aspect (e.g. it is less accurate or something)?

Jeremy Manson said...

@Brent - The sense I have gotten from talking with these guys is twofold:

1) AGCT is just more broken. We've had to patch the VM a few times so that it didn't fall apart.

2) It is also more non-standard - my feeling is that the management team at Sun likes to have their code use the Java standards, and AGCT is not part of the JVMTI spec.

Brent said...

Jeremy: your profiler probably should NOT use regular time interval samples, but use random samples instead.

See this paper:

Nitsan Wakart said...

Thanks for a great post, I've read it a while back and stumbled upon it again today.
For present time readers (5 years later) it is worth mentioning that current Oracle Java versions come bundled with Java Flight Recorder and Java Mission Control[1] which offer low overhead AsyncGetCallTrace based profiling. OpenJDK does not include JFR.
Also worth mentioning is the active effort by Richard W. on Honest Profiler[2] which is an independent open source effort in creating a low overhead profiler based on AsyncGetCallTrace.

Jeremy Manson said...
This comment has been removed by the author.
Jeremy Manson said...

Thanks, Nitsan. I also have a proof of concept profiler here:

Anonymous said...

Hi Jeremy, thanks for your wonderful discussions on the subject along with the proof-of-concept profiler, which is easy to build and use. Do you have any suggestions on how to use this AGCT API without involving a signal handler (whether SIGPROF or others)? Mainly, is there a way to call this API from a thread to collect the stack trace on another interested thread?

Jeremy Manson said...

Hi! Thanks for the kind words.

I can think of several ways to call this API without involving a signal handler, but very few reasons to do so. If I wanted to call this function asynchronously on another thread, I'd probably send the other thread a signal with pthread_kill.

Different platforms may have different ways of invoking code asynchronously, of course. On OS X, for example, the profiler as written won't work, because SIGPROF always goes to the main thread. Various web searches can solve this problem.