Monday, June 25, 2007

Signals and Java

Ooookay. You are now, presumably, fresh from reading the original post about using AsyncGetCallTrace and the more recent post about how to use SIGPROF in C++. The next obvious question is...

Can I Register a Signal Handler in Java?

Um... Kind of!

There is an undocumented and unsupported interface called sun.misc.Signal present in most Java implementations. For those of you who have downloaded the JDK, the files live in /src/share/classes/sun/misc.

If you import sun.misc.Signal and sun.misc.SignalHandler, you can define a signal handler like this:

// Handles SIGHUP
Signal.handle(new Signal("HUP"), new SignalHandler() {
    // Signal handler method
    public void handle(Signal signal) {
        System.out.println("Got signal " + signal);
    }
});

You can raise a signal like this:

Signal.raise(new Signal("HUP"));

You can even register a native signal handler with the Signal.handle0 method (I'll leave that as an exercise for the reader).
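Tying the two snippets together, here is a complete, self-contained sketch. I use SIGUSR2 rather than SIGHUP only because the JVM does not normally claim USR2 for itself; and remember, sun.misc.Signal is unsupported and could change out from under you.

```java
import java.util.concurrent.CountDownLatch;
import sun.misc.Signal;
import sun.misc.SignalHandler;

class SignalDemo {
    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        // Register a handler for SIGUSR2; the handler runs on the JVM's
        // signal dispatcher thread, not on the thread that raised the signal.
        Signal.handle(new Signal("USR2"), new SignalHandler() {
            public void handle(Signal signal) {
                System.out.println("Got signal SIG" + signal.getName());
                latch.countDown();
            }
        });
        // Deliver the signal to our own process.
        Signal.raise(new Signal("USR2"));
        latch.await(); // wait until the handler has actually run
    }
}
```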

Having said that, none of this helps us set timers or do profiling, really. It's just an interesting aside.

More about profiling with SIGPROF

First, read this post about profiling with JVMTI and SIGPROF.

A poster on that entry asked me to elaborate a little on exactly what happens when the timer goes off and the signal is raised. In this post, I'll explain that in a little more detail. First, I'll back up a little.

Edited to add: Okay, I wrote this whole thing, and very, very briefly answered the actual question at the end. So, if you are interested in the answer to the actual question, and not a long discourse on how to use SIGPROF, look at the end of this entry.

What is a Signal?

A signal is a very basic mechanism that UNIX-like systems use to notify a process of an event. Signals can be delivered synchronously, to indicate odd behavior in a program: if a process divides by zero, for example, a SIGFPE (Floating Point Exception) is synchronously sent to that process. Signals can also be used asynchronously; for example, one process can send another a signal. The receiving process determines how it wants to handle that signal when it receives it: it can either register a signal handler for that signal, or let the default action occur.

On most modern UNIX systems, there are 32 signals. If you want to see the full list on a glibc system, see the header file /usr/include/bits/signum.h. Each signal maps to a number; SIGKILL, for example, is number 9. You can use the command line program "kill" to send a signal to any process. If you send a process a SIGTERM using "kill <pid>", for example, it will take the default action of terminating, unless it has registered a signal handler. (SIGKILL, sent with "kill -9 <pid>", is special: it cannot be caught or ignored, so the process always dies.)

SIGPROF is the one we care about for the purposes of this post. It is used for sample profiling. UNIX systems can be set up to send this signal at a regular interval; the application can register a handler to record profiling information in the handler.

How Do I Register a SIGPROF Signal Handler?

On Linux, there are several steps involved in registering your own signal handler. The basic idea is that a sigaction structure needs to be passed to the sigaction system call. Warning: I am not making a pretense that this is complete. Many man pages should be studied before attempting this. On Linux, a sigaction looks like this:

struct sigaction {
    void     (*sa_handler)(int);
    void     (*sa_sigaction)(int, siginfo_t *, void *);
    sigset_t sa_mask;
    int      sa_flags;
    void     (*sa_restorer)(void);
};

The first field, sa_handler, is the handler that you want to define. We're only going to fill in that one — if you are interested in the others, then look at the sigaction man page.

You define the handler by declaring a void function that takes a single integer argument:

void handler(int signal) {
    // NB: printf is not async-signal-safe; it's fine for a demo, but don't
    // do this in production code.
    printf("received signal %d\n", signal);
}

You then define your sigaction struct, and pass it to sigaction:

struct sigaction sa;
sa.sa_handler = &handler;
sa.sa_flags = 0; // no special flags; see the man page for SA_RESTART et al.
sigemptyset(&sa.sa_mask); // Oh, just look it up.
struct sigaction old_handler;
sigaction(SIGPROF, &sa, &old_handler);

If there was already a registered handler, it will be in old_handler. You then set an interval timer, which makes the system send your process a SIGPROF at the interval you specify:

static struct itimerval timer;
timer.it_interval.tv_sec = 1;  // the number of seconds is obviously up to you,
timer.it_interval.tv_usec = 0; // as is the number of microseconds.
timer.it_value = timer.it_interval;
setitimer(ITIMER_PROF, &timer, NULL);

setitimer tells the system to deliver a SIGPROF at the interval specified by the timer. Note that ITIMER_PROF counts CPU time consumed by the process (user plus system time), not wall-clock time, which is exactly what you want for profiling. The system will wait for the timer to expire, deliver a SIGPROF, and repeat until the process dies or the itimer is replaced.

At this point, your signal handler is complete!

Or...?

Well, it turns out that it isn't that simple (and bear in mind that we haven't even gotten to the Java part, yet). There is the issue of threading. The signal is handled in-process. So which thread actually handles it? Well, it turns out that this is not well-specified.

In Linux 2.4 and earlier, each thread was a separate process. Because of this, you need to set up a separate timer for every thread that you want to be profiled. In other words, it is more-or-less completely unusable unless you have a very tightly controlled environment.

In Linux 2.6, a randomly chosen, currently executing thread is picked to execute the signal handler. This is awfully useful, considering that what we want to do is find out what the currently executing threads are doing.

There are a lot of pitfalls here, so proceed with caution. For example, in C code, you can't safely grab an arbitrary lock in a signal handler, because the interrupted thread might already hold that lock; since most C lock implementations aren't reentrant, this can deadlock.

There is also the issue of what happens if you want only some threads to be able to handle the signal, and you want to disable it for the rest. POSIX provides pthread_sigmask() for this purpose. If you want to block a signal for the entire process, you can use sigprocmask().

What's the point of all of this?

The original poster asked me what, exactly, happened when the timer went off and the signal was raised by the OS. He thought that perhaps it raised a Java exception. It doesn't. The OS just delivers the signal to the process. So I wrote a C++ handler that called AsyncGetCallTrace, and loaded it in the Agent_OnLoad part of a JVMTI agent.

Saturday, June 9, 2007

Answer to Weekend Multithreaded Puzzler

This is the answer to the riddle posed in my earlier puzzler posting. You should probably look at that question before looking at this answer.

I suppose I should call it something other than a puzzler, to avoid getting hit by Josh and Neal's angry team of vicious, snarling lawyers...

This program certainly looks straightforward. It just looks as if two threads are writing to two variables. In fact, you probably expected me to say something "who-cares" oriented about compiler optimizations at this point. Well, they are volatile variables, so you can worry a lot less about potential reorderings. In fact, this one has absolutely nothing to do with program transformations, and, if you ran the program on my laptop, you found that it hangs!

It turns out that this is the result of one of those vicious little static initialization quirks that provide so many hours of headaches. What happens is something very like the following. The first thread encounters A; this class has not been initialized, so that thread tries to initialize it. When you try to initialize a class, you acquire a lock on that class, so that no one else can initialize it at the same time. Are you starting to see where this could lead to problems?

At the same time as this, the second thread encounters B, and so acquires the lock on the B class. It then runs the static initializer for B, which encounters A. "Wait!" it says -- "A hasn't been initialized! Better acquire that initialization lock..." It tries to acquire the lock, but the first thread already has it, so it waits for the first thread to finish.

Meanwhile, the same process goes on in the first thread. It runs the static initializer for A, which encounters B. "Wait!" it says -- "B hasn't been initialized! Better acquire that initialization lock..." It tries to acquire the lock, but the second thread already has it, so it waits for the second thread to finish.

Result: Both threads wait forever. Deadlock!

This whole process is scheduling / hardware / OS / JVM dependent, of course. If the first thread runs to completion without letting the second thread start, then it will quite happily initialize both A and B without the other thread acquiring any locks. This will avoid deadlock nicely. This seems to happen on Linux, but not OS X.

How do you avoid this? Well, that's a little tricky. In this case, you would probably rewrite the code so that it doesn't perform the initialization in two separate threads. That's not always a general-purpose solution, though. Your best bet, in general, is to avoid having circularities like this in your static initializers. As with constructors, it is important to keep your static initializers as simple as possible.

Keeping it simple might mean not doing anything that might trigger subsequent static initialization. That's a good first option, if you can manage it, but it is not always possible.

The second option is to make sure that you have an order over your classes. For example, if you have three classes, A, B and C, you could structure them so that C can refer to A and B in its static initializer, B can only refer to A, and A can only refer to itself. This will prevent deadlocks by enforcing a strict order over when the implicit locks can be acquired.

The final option -- if you know this will be a problem -- is to make sure that the initialization can only happen in a single thread. This may mean having to force it to occur earlier, by referencing the class earlier than it would otherwise have been referenced.
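To illustrate that last option, here is a hypothetical sketch (the class names are mine; A and B mirror the puzzler classes). The main thread touches the classes before any workers start, so both static initializers run on a single thread and the implicit initialization locks are taken sequentially:

```java
class A {
    volatile static int x = 1;
    static { B.y = 1; }
}

class B {
    volatile static int y = 2;
    static { A.x = 2; }
}

class EagerInit {
    public static void main(String[] args) throws Exception {
        // Both <clinit>s run here, on the main thread, before any other
        // thread can race on the initialization locks.
        Class.forName("A"); // initializing A also initializes B
        Thread t1 = new Thread() { public void run() { A.x = 1; } };
        Thread t2 = new Thread() { public void run() { B.y = 2; } };
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("done: A.x=" + A.x + ", B.y=" + B.y);
    }
}
```

Because the circular initialization happens entirely on the main thread, the program always terminates.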

I feel like there should be a moral. The moral is that static initialization has all sorts of hidden gotchas, and you should look out for it.

Weekend Multithreaded Puzzler Fun!

Although I didn't go this year, one of my favorite parts of JavaOne is always Josh Bloch and Neal Gafter's talk on Java Puzzlers. (This year, Bill Pugh, my graduate advisor, stepped in for Neal.) A puzzler is a short snippet of code which has unexpected results, usually because some language or API feature behaves in a strange way. I enjoy them because I always think it is truly wonderful to have the depth of my ignorance exposed.

Josh and Neal wrote an excellent book with all of the Java Puzzlers through 2005, which is highly recommended, and occupies a place of honor in the stack of books in my bathroom.

The point of all of this is that occasionally, I will send Josh a multithreaded puzzler, and he will tell me it is no good, because you can't reproduce it every time. Here's one I sent him a couple of months ago.

It turns out that the following snippet of code displays the odd behavior 100% of the time under the JVM on my MacBook Pro (JDK 1.5.0_07), and won't display it at all on Linux. I haven't tried Windows. Can you figure out what the odd behavior will be? If you have an Intel-based Mac, you can probably even reproduce it.


class A {
    volatile static int x = 1;
    static {
        B.y = 1;
    }
}

class B {
    volatile static int y = 2;
    static {
        A.x = 2;
    }
}

public class Test {
    public static void main(String[] args) {
        Thread t1, t2;
        (t1 = new Thread() {
            public void run() {
                A.x = 1;
            }
        }).start();
        (t2 = new Thread() {
            public void run() {
                B.y = 2;
            }
        }).start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {}
    }
}

Monday, June 4, 2007

More thoughts on SIGPROF, JVMTI and stack traces

This is a follow up from my previous post about profiling with SIGPROF and JVMTI.

There are, in fact, a very large number of ways of getting stack traces out of a JVM.

  • If you send a system-dependent signal to your JVM process, it will spit out a stack dump of every currently live thread. On Solaris and Linux, the signal is a SIGQUIT, which you can send by using kill -3 on the JVM PID (or the PID of the parent JVM process under Linux 2.4), or by hitting Ctrl-\ in the terminal. On Windows, you can achieve the same effect by hitting Ctrl-Break.

  • If you call Thread.dumpStack(), or create a new Throwable, and invoke getStackTrace() on it, you can get the current stack trace programmatically.

  • If you use ThreadMXBean's getThreadInfo methods, you can get stack traces out of any threads you want.

  • If you use JVMTI's GetStackTrace or GetAllStackTraces methods, you can get stack trace information in native code.

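As a quick sketch of the second and third bullets (the class name here is mine), both the Throwable route and the ThreadMXBean route fit in a few lines:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class StackTraceDemo {
    public static void main(String[] args) {
        // Programmatic stack trace of the current thread: frame 0 is the
        // point where the Throwable was created, i.e. right here in main.
        StackTraceElement[] here = new Throwable().getStackTrace();
        System.out.println("current frame: " + here[0].getMethodName());

        // Stack traces and states of all live threads via ThreadMXBean,
        // limited to 5 frames per thread.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 5)) {
            if (info != null) {
                System.out.println(info.getThreadName() + ": "
                        + info.getThreadState());
            }
        }
    }
}
```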

Most of these methods will tell you if your thread can be scheduled by the JVM or Operating System. This information will be reported as part of the thread info -- the thread will be described as "Runnable".

However, there is a big difference between "Runnable" and "Running". If you send a SIGPROF to your JVM and use AsyncGetCallTrace, you find out exactly what your JVM is doing at precisely the moment you sent the signal.

The difference here is fundamentally that all of those other methods tell you what the JVM could be doing, and this one tells you what it is doing. It will even tell you if it is performing garbage collection. This sort of information can be invaluable when you want to know what is soaking up your CPU cycles.

C++ Threads

There is a very good talk by Lawrence Crowl on the upcoming threading changes to C++. I wrote a brief entry about his talk on C++0x (where they are hoping for x < 10). The committee has built heavily on the work done for the Java memory model, so that they could resolve some of the C++ absurdities that inevitably occur. Hans Boehm, who was heavily involved in the Java effort, has been leading the work.

One neat feature is the proposed atomic keyword. All accesses to a variable declared atomic will be, obviously enough, atomic. It will support features like compare-and-swap and atomic increment (of numerical types). The neat part is that this will work for more than just scalar types (as it does in most current systems). You can declare an entire object to be atomic, and update it all at once. Efficiency depends, of course, on whether the hardware supports such operations, or they need to be emulated in software.

As this is C++, they felt the need to overload operators for atomic support. For example, if you have an atomic int v, then code that reads v++ performs an atomic increment. This is reasonably intuitive; in fact, the absence of exactly this guarantee for volatile variables has been a source of confusion for some Java programmers.
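To illustrate the Java side of that confusion: v++ on a volatile int is a read-modify-write, not an atomic increment, so concurrent increments can be lost, while java.util.concurrent.atomic provides the real thing. A small sketch (the class name is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementRace {
    static volatile int v = 0;               // v++ is NOT atomic
    static final AtomicInteger a = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    v++;                     // may lose updates under contention
                    a.incrementAndGet();     // never loses updates
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The atomic counter is always 200000; the volatile one often isn't.
        System.out.println("volatile: " + v + " atomic: " + a.get());
    }
}
```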

The problem is that in order to support this, they have to start having some really messy details. For example, the semantics of the assignment operator (=) usually involve a load followed by a store, and for the operator to return whatever the result of evaluating the RHS was. This makes assignment a two-step process.

Why is this tricky? Let's say we have two atomic integers, a and b. If you say something like a += 4, you simply perform a machine-level atomic increment of a by 4, and it works trivially. On the other hand, if you have a = b, then you would have to assign the value of b to a without letting the value of b change while you are changing a. This is not supported by most architectures. So, they allow overloading of the assignment operator, but only if there is no atomic variable on the RHS of the assignment. How ugly is that?

There are a lot of other interesting details in the talk. It is definitely recommended.