Tuesday, July 7, 2009

How Hotspot Decides to Clear SoftReferences

I got asked about this twice in one day, and I didn't know the answer, so I sat down and puzzled it out a bit.

A SoftReference is a reference that the garbage collector can decide to clear if it is the only reference left to an object, and the GC decides (through some undefined process) that there is enough memory pressure to warrant clearing it. SoftReferences are generally used to implement memory-sensitive caches. A mistake many people make is to confuse it with a WeakReference, which is a reference that, if it is the only reference left to the object, the garbage collector will aggressively clear.

I got asked how Hotspot's GC actually decides whether to clear SoftReferences. Twice. On the same day. (Hotspot is the Sun / OpenJDK JVM). The answer was that I had no idea, so I had to go look it up. Here is what I found out; I hope it is a useful reference for someone (pun intended).

First, there is a global clock variable that is set with the current time (in millis) every time a garbage collection occurs. Every SoftReference has a timestamp field that is set to the current value of clock when it is accessed (when it is constructed or the get()) method is called. This gives a very coarse ordering over the SoftReferences; the timestamp indicates the last GC before they were accessed.

When a garbage collection occurs, the decision to clear a SoftReference is based on two factors:
  1. How old the reference's timestamp is, and
  2. How much free space there is in memory.
The calculation is pretty simple. If:
  • free_heap is the amount of free heap space in MB,
  • interval is the time between the last GC's clock and the timestamp of the ref we are currently examining, and
  • ms_per_mb is a constant number of milliseconds to keep around a SoftReference for each free megabyte in the heap
Then the decision is made by:
interval <= free_heap * ms_per_mb
To take an example, let's say that we have a SoftReference with a timestamp of 3000ms, the last GC's clock time was 5000ms, the ms_per_mb is 1000 and the free space is 1MB. We then test whether:
5000 - 2000 <= 1 * 1000
This is false (3000 > 1000), so we clear the reference.

Now let's say there is more free space — say, 4MB — and so less reason to clear the SoftReferences. The calculation is now
5000 - 2000 <= 4 * 1000
This is true (3000 <= 4000), so we don't clear the reference.

One thing to notice about this is that it implies that SoftReferences will always be kept for at least one GC after their last access. Why is that? Well, for the interval, we are using the clock value of the last garbage collection, not the current one. As a result, if a SoftReference has been accessed since the last garbage collection, it will have the same timestamp as that garbage collection, and the interval will be 0. 0 <= free_heap * 1000 for any amount of free_heap, so any SoftReference accessed since the last garbage collection is guaranteed to be kept. This is actually how this question came up; some of my colleagues notices their SoftReferences weren't being cleared during a garbage collection, and they didn't know why.

ETA: The above paragraph originally said "one Full GC", not "one GC". In our case, it was one full GC, because the objects being allocated were too big to be allocated in the young generation. In most cases, it will be one GC of the generation in which the object was allocated, which will usually be the new generation / TLAB.

Another thing to notice is that the value of 1000 for ms_per_mb is fairly arbitrary. It can be adjusted with the JVM flag -XX:SoftRefLRUPolicyMSPerMB=n. If you adjust it down, then free_heap * ms_per_mb will be smaller, and so SoftReferences are more likely to be cleared. If you adjust it up, you get the opposite effect.

Wednesday, June 17, 2009

Volatile Arrays in Java

I get asked a lot about how the volatile keyword interacts with arrays, so it is probably worth a blog post on the subject.

Those of you who have read my posts on volatile (Volatile Fields and Synchronization, Volatile Does Not Mean Atomic and, most importantly, What Volatile Means in Java) will have a pretty good idea of what volatile means, but it is probably worth it to provide a reminder.

Basically, if you write to a volatile field, and then you have a later read that sees that write, then the actions that happened before that write are guaranteed to be ordered before and visible to the actions that happen after the read. In practice, what this means is that the compiler and the processor can't do any sneaky reordering to move actions that come before the write to after it, or actions that come after the write to before it. See my post on What Volatile Means in Java for more detail.

With that out of the way, let's go through some examples of what you can do with volatile arrays:
volatile int [] arr = new int[SIZE];

arr = arr;
int x = arr[0];
arr[0] = 1;
The first lesson to learn, which will guide us here, is that arr is a volatile reference to an array, not a reference to an array of volatile elements! As a result, the write to arr[0] is not a volatile write. If you write to arr[0], you will get none of the ordering or visibility guarantees that you want from a volatile write.

What examples are there of a volatile write in the code above? Well, both of the writes to arr — the self-assignment and the write of new int[SIZE] — are volatile writes, because they are writing to arr, not one of its elements.

That explains where the volatile writes are. Where are the volatile reads in our example? It turns out that each of the lines after the declaration contains a volatile read:
arr = arr
This one is easy. The volatile read is on the right hand side of the assignment statement.
int x = arr[0];
This one is slightly more subtle. The volatile read is not the read of the array element. The right hand side of that assignment is a two step process. First, you read the array reference, then you read the 0th element of that array. The volatile read is the read of the array reference, not the read of the 0th element.
arr[0] = 1;
The previous example should give you a hint of where the volatile read is on this line. As in that example, the left-hand side is a two step process. First, you read the array reference, then you assign to the 0th element of that array. As odd as it seems, the read of the array reference is a volatile read.

The astute reader will notice that there is no actual way to get volatile write semantics by writing to an element of an array referenced by a volatile field. The easiest way to get volatile array semantics is to use the Atomic[Reference/Long/Integer]Array classes in java.util.concurrent.atomic, which provide volatile semantics for reads and writes of individual elements.

(Why not Float/Short/Double Array? With APIs, you never ask "why not", you ask "why". Meanwhile, you have 32- and 64-bit bit patterns, so the Float.floatToIntBits and Float.intBitsToFloat family of functions are your friends.)

These classes are somewhat problematic, though. If nothing else, you are endlessly boxing and unboxing values, which may make access expensive. Ugh — I really do know better than this, really! As a result, there is more to this story.

You may have noticed that I did provide a way of getting a volatile write above with just arrays: by writing out a self-reference. I have been asked if that technique can be leveraged to provide volatile access to array elements. Here's what that would look like:
// I wouldn't do this if I were you.
volatile int [] arr = new int[SIZE];

arr[0] = 1;
arr = arr;
This definitely does provide the volatile write. However, what good does it do you? The virtue of a volatile write is that a corresponding read can detect that it happened, and do something based on the new value. For example, you can use a volatile flag to force one thread to loop indefinitely until another one sets the flag. In this case, you can't actually detect that another thread performed the write, because it is writing the same value to the variable.

You can use sun.misc.Unsafe to emulate the behavior of a volatile array. But not everyone is working on a Sun VM. And they are trying to discourage the adoption of sun.misc.Unsafe, so I'm not going to put the code here.

Don't despair too much, though — Doug Lea and the clever folks involved in JSR-166 are working on better solutions for Java 7. More news as events warrant.

Tuesday, June 9, 2009

Welcome!

Mailinator's Paul Tyma linked to me. If you are following from that link, the relevant blog entry you are looking for is probably this one, specifically, the entry labeled "visibility".

Saturday, April 4, 2009

Faster Logging with Faster Logger Classes

Today, I'll discuss a little tweak I made to java.util.Logging that made my logging throughput double. I want to use it as an illustration that it often isn't very difficult to improve the performance of concurrent code by doing things that are actually pretty easy to do.

So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using java.util.Logger. When I say "I have", I actually mean "someone else has", because if "I" had to log a megabyte a second, there is no way I would use java.util.Logger to do it. Still, we all make our choices.

When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.

Anyway, the profiler showed that we were spending a LOT of time in Logger.log(), on lines that look fairly harmless, but aren't:

public void log(LogRecord record) {
if (record.getLevel().intValue() < levelValue || levelValue == offValue) {
return;
}
synchronized (this) {
if (filter != null && !filter.isLoggable(record)) {
return;
}
}

// Post the LogRecord to all our Handlers, and then to
// our parents' handlers, all the way up the tree.

Logger logger = this;
while (logger != null) {
Handler targets[] = logger.getHandlers();

if (targets != null) {
for (int i = 0; i < targets.length; i++) {
targets[i].publish(record);
}
}

if (!logger.getUseParentHandlers()) {
break;
}

logger = logger.getParent();
}
}
There are no fewer than four locks being acquired in this method. There is the obvious call to synchronized(this). logger.getHandlers() is also a synchronized method. getUseParentHandlers() is synchronized. getParent() is also synchronized. Acquiring and releasing these locks was killing our throughput!

It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:

  1. The filter field is protected by the lock on this. That lock is really only protecting one field; the lock is held while one line of code writes to it, and one line of code reads from it. If you want to make something that does the same thing (from the perspective of concurrency), you can make that field into a volatile. If you are worried about the filter variable changing in between when you read it and when you write it, you can read it to a local variable first:
    Filter theFilter = filter;  // filter is volatile
    if (theFilter != null && !theFilter.isLoggable(record)) {
    return;
    }
  2. There is a global lock protecting the handlers. It turns out that the only thing this is protecting is a rarely-updated array of Handlers. If you have a rarely updated concurrent array, you should be using the non-blocking CopyOnWriteArrayList instead.

  3. getUseParentHandlers() is synchronized for the exact same reason as the filter, and can be replaced with a volatile in the same way.


These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.

I should rush to say that I don't blame the original author for not making these changes; java.util.logger was added in JDK 1.4, before the addition of java.util.concurrent and before a rigorous definition of volatile. Plus, it really isn't designed for throughput.

Why am I blogging about pretty simple improvements? There are a few simple morals here:

  1. Learn your libraries. Don't have a synchronized array that is almost never updated, for example.

  2. Know when you can use tricky language features. Knowing that volatile can be used for simple, single-variable updates is a very useful thing to know. I've written about volatile frequently, go read those posts.

  3. It is actually worth it to participate in OpenJDK. We (or I, at least) have a tendency to assume that JDK code is really high quality and well-optimized. This is probably true in a lot of areas (java.util.concurrent, or java.lang.MostStuff, or java.util.YourFavoriteDataStructure), but there is still plenty of work to be done. If you get involved, you not only help yourself, but you also help everyone.

  4. I don't know, low-hanging fruit == good?

There is, by the way, a very large section of the definitely-recommended book Java Concurrency in Practice (lgt amazon) devoted to clever things you can do to speed up your multithreaded logging.

Saturday, February 28, 2009

Small Language Changes for JDK7

For those who haven't been paying attention, there is a new project from Sun to attract proposals for small language changes to JDK7. Even in the first day of open calls for proposals, the entries have been very interesting:
  • Strings in Switch, a proposal from Joe Darcy to be able to use Strings in switch statements.

  • Automated Resource Blocks, a proposal from Josh Bloch to be able to say things like:

    try (BufferedReader br = new BufferedReader(new FileReader(path)) {
    return br.readLine();
    }
    instead of:

    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
    return br.readLine();
    } finally {
    br.close();
    }
    based on having BufferedReader implement a Disposable interface.

  • Block expressions, a proposal from Neal Gafter to be allow things like:
    double pi2 = (double pi = Math.PI ; pi*pi)**; // pi squared*

    Basically, the principle here is that the (parenthesis-delimited) block would return a value.

  • Exception handling improvements, another proposal from Neal Gafter to allow things like:
    try {
    doWork(file);
    } catch (final IOException | SQLException ex) {
    logger.log(ex);
    throw ex;
    }

  • Improved Type Inference, a proposal from me (although I can't claim credit for the idea) to be able to replace things like this:
    Map<String, List<String>> anagrams = new HashMap<String, List<String>>();
    with things like this:
    Map<String, List<String>> anagrams = new HashMap<>();

  • A new syntax for wildcard variance, again from the prolific Neal Gafter, allowing the user to replace "? extends" with "out" and "? super" with "in". (Can you tell that Neal's been working on C#?)

I don't love all of these proposals, but I do love some of them, and I think it is great to see that the pent-up demand for work on the Java language finally has an outlet.

(For those of you following the various Closures proposals: Closures don't count as a small change, and are therefore not in scope for this JSR.)

ETA: The "use Scala instead" and "where is BGGA?" comments have been made, so please read the comments and feedback before reiterating them.

Sunday, December 14, 2008

Date-Race-Ful Lazy Initialization for Performance

I was asked a question about benign data races in Java this week, so I thought I would take the opportunity to discuss one of the (only) approved patterns for benign races. So, at the risk of encouraging bad behavior (don't use data races in your code!), I will discuss the canonical example of "benign races for performance improvement". Also, I'll put in another plug for Josh Bloch's new revision of Effective Java (lgt amazon), which I continue to recommend.

As a reminder, basically, a data race is when you have one (or more) writes, and potentially some reads; they are all to the same memory location; they can happen at the same time; and that there is nothing in the program to prevent it. This is different from a race condition, which is when you just don't know the order in which two actions are going to occur. I've put more discussion of what a data race actually is at the bottom of this post.

A lot of people think that it is okay to have a data races all over your code. By and large, it is a lot more dangerous than those people think it is. For example, the version of double checked locking that doesn't use volatile has a data race in it. As I've discussed before, that version is broken.

So, having said that, when can you have a harmless data race in your code? Here's some code from Sun's implementation of java.lang.String, which is available under the GPL:

public final class String {
/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

/** Cache the hash code for the string */
private int hash; // Default to 0

public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}

// ...
}
The principle here is pretty simple. The program calls hashCode(). If it hasn't calculated the hash code yet, then it does, and then stores the value for future use. The people who wrote String didn't bother to calculate the hash code in the constructor, because that would make construction (which is done all the time) more expensive. Because Strings are immutable, we don't need to worry about the hash code potentially changing in the future.

Do you see the potential data race? Here's some code that triggers it:

class BenignDataRace {
final String s = "blah";

public void main() {
class CalcHash implements Runnable {
public void run() {
int h = s.hashCode();
}
}
Thread t = new Thread(new CalcHash());
Thread t2 = new Thread(new CalcHash());

t.start();
t2.start();
}

}
Two threads access s.hash without doing synchronization, and at least one of those threads is going to write to it. Item 71 in Effective Javacalls this idiom racy single-check. In this case, though, it actually works. Why is this safe, when the double-checked locking example, which uses synchronization, isn't?

There are basically two parts to the answer, and they are both pretty simple:

  1. In this case, we don't care how many times we actually calculate the answer and assign it to the hash field. It just isn't a big deal if we happen to calculate it multiple times. Furthermore,

  2. Java has a special guarantee for fields that are 32-bit or fewer and object references — a thread will always see some value that was assigned to that field. As a result of this, a thread reading hash will either see 0 or the calculated value.

That second item might sound obvious, but bear in mind that 64-bit values don't get that guarantee; this code would not work if hash were a long or a double.

Now, let's break the code:

public int hashCode() {
if (hash == 0) {
int off = offset;
char val[] = value;
int len = count;

int h = 0;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return hash;
}
All I did was make it so that the temporary variable was only used for calculating the hash code! How could that break the code?

What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!

(Side note: the first version is better anyway, because it performs fewer memory reads. If you really care that much about performance, that's the way to go.)

What's the moral here? The moral is that this stuff is very confusing, and you have to work very hard to get it right. Make sure you understand exactly what your code does. Avoid data races wherever possible.




Here are the details on data races that I promised above:

A program has a data race in it when it is possible to have an execution of that program where:

  1. All of the instructions in that program occur in the order they occurred in the program (this is important for technical reasons),

  2. There are two accesses to the same field / array element,

  3. At least one of those accesses is a write, and

  4. There is no happens-before ordering between those actions.


I defined "happens-before ordering" in my last post. Basically, it means that there is nothing establishing an order between the two actions; for example, there is no locking protecting the accesses, no volatile writes and reads intervening and no thread starts and joins.

Saturday, November 15, 2008

What Volatile Means in Java

Today, I'm going to talk about what volatile means in Java. I've sort-of covered this in other posts, such as my posting on the ++ operator, my post on double-checked locking and the like, but I've never really addressed it directly.

First, you have to understand a little something about the Java memory model. I've struggled a bit over the years to explain it briefly and well. As of today, the best way I can think of to describe it is if you imagine it this way:

  • Each thread in Java takes place in a separate memory space (this is clearly untrue, so bear with me on this one).

  • You need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system.

  • Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen.

The Java volatile modifier is an example of a special mechanism to guarantee that communication happens between threads. When one thread writes to a volatile variable, and another thread sees that write, the first thread is telling the second about all of the contents of memory up until it performed the write to that volatile variable.

At this point, I usually rely on a visual aid, which we call the "two cones" diagram, but which my officemate insists on calling the "two trapezoids" diagram, because he is picky. ready is a volatile boolean variable initialized to false, and answer is a non-volatile int variable initialized to 0.




The first thread writes to ready, which is going to be the sender side of the communications. The second thread reads from ready and sees the value the first thread wrote to it. It therefore becomes a receiver. Because this communication occurs, all of the memory contents seen by Thread 1, before it wrote to ready, must be visible to Thread 2, after it reads the value true for ready.

This guarantees that Thread 2 will print "42", if it prints anything at all.

If ready were not volatile, what would happen? Well, there wouldn't be anything explicitly communicating the values known by Thread 1 to Thread 2. As I pointed out before, the value written to the (now non-volatile) ready could "leak through" to Thread 2, so Thread 2 might see ready as true. However, the value for answer might not leak through. If the value for ready does leak through, and the value for answer doesn't leak through, then this execution will print out 0.

We call the communications points "happens-before" relationships, in the language of the Java memory model.

(Minor niggle: The read of ready doesn't just ensure that Thread 2 sees the contents of memory of Thread 1 up until it wrote to ready, it also ensures that Thread 2 sees the contents of memory of any other thread that wrote to ready up until that point.)




With this in mind, let's look at the Double-Checked Locking example again. To refresh your memory, it goes like this:

class Foo {
private volatile Helper helper = null;
public Helper getHelper() {
if (helper == null) {
synchronized(this) {
if (helper == null) {
helper = new Helper();
}
}
}
return helper;
}
The object of the double-checked locking pattern is to avoid synchronization when reading a lazily constructed singleton that is shared between threads. If you have already constructed the object, the helper field will not be null, so you won't have to perform the synchronization.

However, this is only part of the solution. If one thread creates the object, it has to communicate the contents of its memory to another thread. Otherwise, the object will just sit in the first thread's memory. How do we communicate the contents of memory to another thread? Well, we can use volatile variables. That's why helper has to be volatile -- so that other threads see the fully constructed object.

Locking in Java also forms these "happens-before" communication points. An unlock is the sender side, and a lock on the same variable is the receiver side. The reason that doesn't work for (non-volatile) double-checked locking is that only the writing thread ever performs the locking. The whole point of the idiom is that the reader side doesn't do the locking. Without the explicit communication in the form of the volatile variable, the reading thread will never see the update performed by the writer thread.

Let me know if that makes sense.