Wednesday, May 23, 2007

Double-Checked Locking and the Problem with Wikipedia

I love Wikipedia, I really do. I use it for everything. Well, not everything. For example, if I want a really good piece of chocolate cake, I tend to use flour, sugar, butter, eggs, chocolate, and not Wikipedia at all. For many things, however, Wikipedia cannot be surpassed.

Imagine, then, my surprise when I found out that their page on Double-Checked Locking gave an example of a version of double-checked locking that does not work.

// Broken -- Do Not Use!
class Foo {
  private Helper helper = null;
  private boolean initialized = false;
  public Helper getHelper() {
    if (!initialized) {
      synchronized(this) {
        if (!initialized) {
          helper = new Helper();
          initialized = true;
        }
      }
    }
  return helper;
}


(For background on Double-Checked Locking, see Bill Pugh's "Double-Checked Locking is Broken" Declaration.)

The problem with this code is exactly the same as the problem with all of the other variants of the double-checked locking idiom. You can reorder the assignment of initialized with the assignment to helper. As a result, another thread might see the value true for initialized and think that the helper field has been correctly initialized, even though it hasn't been!

I have since fixed the Wikipedia entry; the old entry can be found here.

Please folks -- stop trying to come up with fancy ways of avoiding volatile and synchronized annotations. Friends don't let friends avoid necessary synchronization.

Monday, May 14, 2007

Profiling with JVMTI/JVMPI, SIGPROF and AsyncGetCallTrace

This post is not about the memory model, but, instead, is about being able to get, asynchronously, the stack trace of a currently executing thread out of Sun's Java VM. Getting one synchronously is easy, of course.

I was trying to profile some Java applications under Linux, and I wanted to use SIGPROF and JVMTI to find out what code was running at any given second. This is just a posting to document how I did it, as I ended up using some undocumented JVM features, and I can't find a proper reference for them anywhere else.

More information as to why you might want to do this is here.

JVMTI is the Java Virtual Machine Toolkit Interface. It's a C/C++ interface to the virtual machine that allows you to hook in debuggers and profilers. More information can be found here.

SIGPROF is a POSIX signal. The way it is set up under Linux is that you can set a timer that goes off at intervals, and, whenever the timer expires, the SIGPROF signal is sent to the application. The idea is that you can then collect profiling information at fixed intervals.

The signal is caught and handled by a random running thread. That thread will be doing some task related to running the Java application -- whether that is executing user code, running garbage collection, or doing internal VM maintenance.

The problem is that there is no officially documented JVMTI hook that allows the user to find out exactly what the currently running thread is doing that is safe to run in a signal handler. The official way of getting a stack trace for the currently executing thread, the GetStackTrace function, isn't safe to be called in an asynchronous way -- if you try to read the stack when a timer expires, the Java stack could be in a corrupt state, or GC could be running, or any number of other things.

It turns out that the kind folks who wrote the Java virtual machine were fully aware of this, and provided an undocumented interface for this type of profiling, used by their Forte Analyzer (which now operates under the Sun Studio umbrella, I believe). Now that they've open-sourced the JVM, this is public knowledge. For those of you who like to see the source code for such things, it can be found in hotspot/src/share/prims/forte.cpp.

In principle, AsyncGetCallTrace is fairly easy to use. This is less true in practice. Since JVMPI has been removed in JDK6, you start by having to include JVMPI structures in your headers. In JDK5 and earlier, this step in unnecessary (all covered under the GPL):

typedef struct {
jint lineno;
jmethodID method_id;
} JVMPI_CallFrame;

typedef struct {
JNIEnv *env_id;
jint num_frames;
JVMPI_CallFrame *frames;
} JVMPI_CallTrace;

Now that you have the JVMPI structures defined, you need a prototype for the undocumented call:

extern "C"
void AsyncGetCallTrace(JVMPI_CallTrace *trace, jint depth,
void* ucontext)
__attribute__ ((weak));

The __attribute__ ((weak)) is only for g++ users -- it tells the compiler not to look for AsyncGetCallTrace at compile or link time. People using other compilers can create a library stub that contains this method -- this is left as an exercise for the reader.

Since this post is too long already, I will assume that you know how to write both JVMTI and signal handlers. If you want me to write about that, leave a response, and I'll get to it at some point.

So, you write a bit of JVMTI that executes this method, and find out that although the method is invoked, it always returns -1 for the stack depth. It turns out that jmethodIDs (which are the internal numerical representation for a method in JVMTI / JVMPI) are allocated lazily. Most JVMTI methods call a method to GetAndAllocate these method IDs. AsyncGetCallTrace calls a getter, and relies on the jmethodIDs being pre-allocated. The expectation in the code is that if you have a ClassLoad hook in JVMTI, then the method ids will get allocated correctly.

So, you insert a ClassLoad Hook (all error correction code removed -- include your error correction code!):

void JNICALL OnClassLoad(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {
}

...

jvmtiEnv *jvmti;
vm->GetEnv((void **)&jvmti, JVMTI_VERSION);
jvmtiEventCallbacks *callbacks =
new jvmtiEventCallbacks();
callbacks->ClassLoad = &OnClassLoad;
jvmti->SetEventCallbacks(callbacks,
sizeof(jvmtiEventCallbacks));
jvmti->SetEventNotificationMode(JVMTI_ENABLE,
JVMTI_EVENT_CLASS_LOAD, NULL);

I ran it after I did this, and found out that the stack depth was no longer -1, but the method ids were invalid. This seems to violate the contract of AsyncGetCallTrace; maybe one of my audience can tell me that this is wrong. Anyway, in order to "prime the pump" of method ids, I inserted another hook -- this time, to load up the method ids:

void JNICALL OnClassPrepare(jvmtiEnv *jvmti_env,
JNIEnv* jni_env,
jthread thread,
jclass klass) {
// We need to do this to "prime the pump",
// as it were -- make sure that all of the
// methodIDs have been initialized internally,
// for AsyncGetCallTrace. I imagine it slows
// down class loading a mite, but honestly,
// how fast does class loading have to be?
jint method_count;
jmethodID *methods;
jvmti_env->GetClassMethods(klass, &method_count,
&methods);
delete [] methods;
}

This is pretty dumb, and I'd love to get rid of it. Anyone more familiar with this interface should slap me upside the head and tell me how to do it properly.

Anyway, after you do this, AsyncGetCallTrace returns valid results. If the resulting stack depth is negative, it wasn't able to grab a valid Java stack. The most important negative result is -2, which indicates that you sampled in the middle of garbage collection.

Friday, May 11, 2007

Google Talks

By the way, gentle reader, in case you haven't noticed, I am running a series of talks at Google.

All of them can be found here. Updated July 2015: They seem to be available on YouTube, now that Google Video is a long forgotten service: They can now be found here.

Roach Motels and The Java Memory Model

I've been asked by a couple of people lately about what a compiler can do with code motion around synchronized blocks. The first person asked me about the following example, so let's run with it:

x = 1;
synchronized(o) {
z = z + 1;
}
y = 1;

I may have answered this already (at some point), but I feel free to repeat myself.

The JMM has what we call "roach motel" ordering. "Roach Motel" is Black Flag's name for their cockroach trap; in grad school, my apartment had lots and lots and lots of cockroaches, so I find it particularly salient (although I think it was probably Bill Pugh who came up with this association).

In the 1980s, the slogan for Black Flag Roach Motels was "Roaches check in, but they don't check out". The memory model is the same: accesses can be moved into synchronized blocks, but they can't be moved out. So, in this case, you can get:

x = 1;
synchronized(o) {
z = z + 1;
}
y = 1;

if we want to move the write to y early, this code transforms to:

x = 1;
synchronized(o) {
z = z + 1;
y = 1; // code motion
}

and transforms to:

x = 1;
synchronized(o) {
y = 1; // code motion
z = z + 1;
}

and can't be transformed any more, because the write to y has been moved into the
synchronized block, but can't move out.

Similarly, if we want to move the write to x down:

x = 1;
synchronized(o) {
z = z + 1;
}
y = 1;

transforms to:

synchronized(o) {
x = 1; // code motion
z = z + 1;
}
y = 1

transforms to:

synchronized(o) {
z = z + 1;
x = 1; // code motion
}
y = 1

.. but the write to x can't move any further.

At the end of these sorts of transformations, you can end up with something that looks like this:

synchronized(o){
y = 1;
z = z + 1;
x = 1;
}

... so the accesses to x, y and z can happen in any order, but once you move them into the synchronized block, you can't move them out again.

This is a tiny bit misleading, of course, because they can go out the direction they came in. But there is no difference between that and not having entered the synchronized block in the first place.

Now, if the compiler can remove the synchronized block altogether, then all the accesses can be moved about freely. This can happen, for example, if o is an object only accessed by a single thread -- the compiler will know that no other thread can lock on it, so it can feel free to remove the synchronization. That's a tale for another day, of course.

Tuesday, May 1, 2007

Software Transactional Memory Talk at Google

Adam Welc, from Intel's Programming Systems Lab, came to Google to give a talk about what they are doing in the realm of Software Transactional Memory. He also talks briefly, at the end, about Intel's hardware plans.

For those who are interested in such things, this is a talk worth seeing.

Click Here.