This post is not about the memory model, but, instead, is about being able to get, asynchronously, the stack trace of a currently executing thread out of Sun's Java VM. Getting one synchronously is easy, of course.
I was trying to profile some Java applications under Linux, and I wanted to use SIGPROF and JVMTI to find out what code was running at any given second. This is just a posting to document how I did it, as I ended up using some undocumented JVM features, and I can't find a proper reference for them anywhere else.
More information as to why you might want to do this is here.
JVMTI is the Java Virtual Machine Toolkit Interface. It's a C/C++ interface to the virtual machine that allows you to hook in debuggers and profilers. More information can be found here.
SIGPROF is a POSIX signal. The way it is set up under Linux is that you can set a timer that goes off at intervals, and, whenever the timer expires, the SIGPROF signal is sent to the application. The idea is that you can then collect profiling information at fixed intervals.
The signal is caught and handled by a random running thread. That thread will be doing some task related to running the Java application -- whether that is executing user code, running garbage collection, or doing internal VM maintenance.
The problem is that there is no officially documented JVMTI hook that allows the user to find out exactly what the currently running thread is doing that is safe to run in a signal handler. The official way of getting a stack trace for the currently executing thread, the GetStackTrace function, isn't safe to be called in an asynchronous way -- if you try to read the stack when a timer expires, the Java stack could be in a corrupt state, or GC could be running, or any number of other things.
It turns out that the kind folks who wrote the Java virtual machine were fully aware of this, and provided an undocumented interface for this type of profiling, used by their Forte Analyzer (which now operates under the Sun Studio umbrella, I believe). Now that they've open-sourced the JVM, this is public knowledge. For those of you who like to see the source code for such things, it can be found in hotspot/src/share/prims/forte.cpp.
In principle, AsyncGetCallTrace is fairly easy to use. This is less true in practice. Since JVMPI has been removed in JDK6, you start by having to include JVMPI structures in your headers. In JDK5 and earlier, this step in unnecessary (all covered under the GPL):
Now that you have the JVMPI structures defined, you need a prototype for the undocumented call:
The
Since this post is too long already, I will assume that you know how to write both JVMTI and signal handlers. If you want me to write about that, leave a response, and I'll get to it at some point.
So, you write a bit of JVMTI that executes this method, and find out that although the method is invoked, it always returns -1 for the stack depth. It turns out that jmethodIDs (which are the internal numerical representation for a method in JVMTI / JVMPI) are allocated lazily. Most JVMTI methods call a method to GetAndAllocate these method IDs. AsyncGetCallTrace calls a getter, and relies on the jmethodIDs being pre-allocated. The expectation in the code is that if you have a ClassLoad hook in JVMTI, then the method ids will get allocated correctly.
So, you insert a ClassLoad Hook (all error correction code removed -- include your error correction code!):
I ran it after I did this, and found out that the stack depth was no longer -1, but the method ids were invalid. This seems to violate the contract of AsyncGetCallTrace; maybe one of my audience can tell me that this is wrong. Anyway, in order to "prime the pump" of method ids, I inserted another hook -- this time, to load up the method ids:
This is pretty dumb, and I'd love to get rid of it. Anyone more familiar with this interface should slap me upside the head and tell me how to do it properly.
Anyway, after you do this, AsyncGetCallTrace returns valid results. If the resulting stack depth is negative, it wasn't able to grab a valid Java stack. The most important negative result is -2, which indicates that you sampled in the middle of garbage collection.
I was trying to profile some Java applications under Linux, and I wanted to use SIGPROF and JVMTI to find out what code was running at any given second. This is just a posting to document how I did it, as I ended up using some undocumented JVM features, and I can't find a proper reference for them anywhere else.
More information as to why you might want to do this is here.
JVMTI is the Java Virtual Machine Toolkit Interface. It's a C/C++ interface to the virtual machine that allows you to hook in debuggers and profilers. More information can be found here.
SIGPROF is a POSIX signal. The way it is set up under Linux is that you can set a timer that goes off at intervals, and, whenever the timer expires, the SIGPROF signal is sent to the application. The idea is that you can then collect profiling information at fixed intervals.
The signal is caught and handled by a random running thread. That thread will be doing some task related to running the Java application -- whether that is executing user code, running garbage collection, or doing internal VM maintenance.
The problem is that there is no officially documented JVMTI hook that allows the user to find out exactly what the currently running thread is doing that is safe to run in a signal handler. The official way of getting a stack trace for the currently executing thread, the GetStackTrace function, isn't safe to be called in an asynchronous way -- if you try to read the stack when a timer expires, the Java stack could be in a corrupt state, or GC could be running, or any number of other things.
It turns out that the kind folks who wrote the Java virtual machine were fully aware of this, and provided an undocumented interface for this type of profiling, used by their Forte Analyzer (which now operates under the Sun Studio umbrella, I believe). Now that they've open-sourced the JVM, this is public knowledge. For those of you who like to see the source code for such things, it can be found in hotspot/src/share/prims/forte.cpp.
In principle, AsyncGetCallTrace is fairly easy to use. This is less true in practice. Since JVMPI has been removed in JDK6, you start by having to include JVMPI structures in your headers. In JDK5 and earlier, this step in unnecessary (all covered under the GPL):
typedef struct {
jint lineno;
jmethodID method_id;
} JVMPI_CallFrame;
typedef struct {
JNIEnv *env_id;
jint num_frames;
JVMPI_CallFrame *frames;
} JVMPI_CallTrace;
Now that you have the JVMPI structures defined, you need a prototype for the undocumented call:
extern "C"
void AsyncGetCallTrace(JVMPI_CallTrace *trace, jint depth,
void* ucontext)
__attribute__ ((weak));
The
__attribute__ ((weak))
is only for g++ users -- it tells the compiler not to look for AsyncGetCallTrace at compile or link time. People using other compilers can create a library stub that contains this method -- this is left as an exercise for the reader.Since this post is too long already, I will assume that you know how to write both JVMTI and signal handlers. If you want me to write about that, leave a response, and I'll get to it at some point.
So, you write a bit of JVMTI that executes this method, and find out that although the method is invoked, it always returns -1 for the stack depth. It turns out that jmethodIDs (which are the internal numerical representation for a method in JVMTI / JVMPI) are allocated lazily. Most JVMTI methods call a method to GetAndAllocate these method IDs. AsyncGetCallTrace calls a getter, and relies on the jmethodIDs being pre-allocated. The expectation in the code is that if you have a ClassLoad hook in JVMTI, then the method ids will get allocated correctly.
So, you insert a ClassLoad Hook (all error correction code removed -- include your error correction code!):
void JNICALL OnClassLoad(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {
}
...
jvmtiEnv *jvmti;
vm->GetEnv((void **)&jvmti, JVMTI_VERSION);
jvmtiEventCallbacks *callbacks =
new jvmtiEventCallbacks();
callbacks->ClassLoad = &OnClassLoad;
jvmti->SetEventCallbacks(callbacks,
sizeof(jvmtiEventCallbacks));
jvmti->SetEventNotificationMode(JVMTI_ENABLE,
JVMTI_EVENT_CLASS_LOAD, NULL);
I ran it after I did this, and found out that the stack depth was no longer -1, but the method ids were invalid. This seems to violate the contract of AsyncGetCallTrace; maybe one of my audience can tell me that this is wrong. Anyway, in order to "prime the pump" of method ids, I inserted another hook -- this time, to load up the method ids:
void JNICALL OnClassPrepare(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {
// We need to do this to "prime the pump",
// as it were -- make sure that all of the
// methodIDs have been initialized internally,
// for AsyncGetCallTrace. I imagine it slows
// down class loading a mite, but honestly,
// how fast does class loading have to be?
jint method_count;
jmethodID *methods;
jvmti_env->GetClassMethods(klass, &method_count,
&methods);
delete [] methods;
}
This is pretty dumb, and I'd love to get rid of it. Anyone more familiar with this interface should slap me upside the head and tell me how to do it properly.
Anyway, after you do this, AsyncGetCallTrace returns valid results. If the resulting stack depth is negative, it wasn't able to grab a valid Java stack. The most important negative result is -2, which indicates that you sampled in the middle of garbage collection.
Comments
The timer goes off and a signal is raised by the OS. From here on am confused between two things.
Does this signal gets translated into a Java exception accusing the currently executing thread's method for having raised it? and which in turn causes a JVMTI ExceptionEvent to occur, giving your agent the chance to obtain the stack trace of the faulting thread (which was the active thread)?
Or
If the above conjecture is false, would you elaborate on the role played by the signal handler and the events thereafter?
Specifically, if you installed a signal handler to handle the SIGPROF, on which thread is the handler executed (current thread?)? If it is on the current/active thread, how do you force an Java event so the JVMTI agent gets notified and gets a chance to do something.
Thanks and Regards,
Ranjit
I answered this in waaay too much detail in a post about using SIGPROF. The short version is that you load a signal handler in C++, and it gets executed when the timer goes off. Nothing happens in Java code at all.
Ranjit
What is the current state of the art? In particular, what is the best (accurate, yet low impact) Java profiler, in your experience?
I have used JIP
http://jiprof.sourceforge.net/
in the past, but it has terrible performance impact.
When I don't use that, I tend to use JProfiler, largely because it is available and handy at Google. It has a fairly enormous performance impact, though.
I don't know enough about it to recommend it, but I understand that the Sun Studio profiler uses AsyncGetCallTrace. As a result, it should have low overhead. I also believe that the binaries are free-as-in-beer, so it might be worth a try.
I don't use it, but I also understand that YourKit has a lower-overhead mode. I know several people who swear by it.
In the meanwhile, I will look into those other recommendations.
http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/
http://williamlouth.wordpress.com/2009/01/16/profiling-sampling-versus-execution-part-2/
AsyncGetCallTrace avoids a lot of the overhead of traditional Java stack trace sampling mechanisms, and can profile with a greater degree of accuracy. You don't have to take my word for it - you can try it yourself.
Apparently, the visualvm profiler in jdk 6u14 uses code from the latest NB profiler, so I would be curious to know if they use this technique or not. (And why not, if not.)
I did a google search just now, but could not find any good info.
There are no .so files. Also, everything seems to be pure java code inside jar files, or xml config files.
The one exception that looks like it might have something interesting is ...\jdk1.6.0_14\lib\visualvm\profiler3\lib\deployed\jdk16\windows\profilerinterface.dll
Opening that up in TextPad, I see that it does do some JVMTI stuff (JVMTI_ERROR_NONE seems to come up a lot). But did not see AsyncGetCallTrace. Maybe that is because I am not looking at the file right.
I think that the definitive answer is gonna have to come from someone else.
http://visualvm.sourcearchive.com/documentation/0.20080728/Stacks_8c-source.html
Seems to call out to JVMTI's GetStackTrace function (i.e., not AGCT).
1) AGCT is just more broken. We've had to patch the VM a few times so that it didn't fall apart.
2) It is also more non-standard - my feeling is that the management team at Sun likes to have their code use the Java standards, and AGCT is not part of the JVMTI spec.
See this paper:
http://www.inf.usi.ch/faculty/hauswirth/publications/pldi10.pdf
For present time readers (5 years later) it is worth mentioning that current Oracle Java versions come bundled with Java Flight Recorder and Java Mission Control[1] which offer low overhead AsyncGetCallTrace based profiling. OpenJDK does not include JFR.
Also worth mentioning is the active effort by Richard W. on Honest Profiler[2] which is an independent open source effort in creating a low overhead profiler based on AsyncGetCallTrace.
1. http://www.oracle.com/technetwork/java/javaseproducts/mission-control/java-mission-control-1998576.html
2. https://github.com/RichardWarburton/honest-profiler
https://code.google.com/p/lightweight-java-profiler/
I can think of several ways to call this API without involving a signal handler, but very few reasons to do so. If I wanted to call this function asynchronously on another thread, I'd probably send the other thread a signal with pthread_kill.
Different platforms may have different ways of invoking code asynchronously, of course. On OS X, for example, the profiler as written won't work, because SIGPROF always goes to the main thread. Various web searches can solve this problem.