Monday, July 14, 2008

Immutability in Java, Part 2

I'd like to talk a little more about what it takes to ensure thread-safe immutability in Java, following on from a (semi)recent post I made on the subject.

The basic gist of that post was that if you make data immutable, then they can be shared between threads without additional synchronization. I call this "thread-safe immutability". In that post, I said this:

Now, in common parlance, immutability means "does not change". Immutability doesn't mean "does not change" in Java. It means "is transitively reachable from a final field, has not changed since the final field was set, and a reference to the object containing the final field did not escape the constructor".

I just wanted to go over these points again, because I get a lot of questions about them and there are a couple of things I glossed over. Let's take them one by one. Immutability means...
  1. the object is transitively reachable from a final field

    What this means is that you can reach the object or value by following references from the final field. Let's say we're making an immutable String class, like the one in the Java libraries. We might have something like this (notice I'm not checking for null pointers in the example code. Go ahead, make me care.):
    class String {
      private final byte[] bytes;
      public String(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0, newArray,
            0, value.length);
        bytes = newArray;
      }
    }
    The thing we are storing in our String is a final reference to the byte array, not the array itself. The contents of the byte array can be changed, according to the semantics of Java, but what is important, for the purposes of thread-safety, is that they are reachable by following a reference from the final field.

    ETA: So, apparently, I wasn't clear enough on this. Let's try again.

    Transitively reachable means that you can get from the final field to the data you want to be immutable by following zero or more references. So, if you had:
    class String {
      final byte[] bytes;
    }
    The array of bytes is reachable from the bytes field. If you have:
    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder holder;
    }
    The array of bytes is still transitively reachable from the holder field. In fact, you can add as many layers of indirection as you want: as long as you can get there by following references from the field, it is still transitively reachable from the field.

  2. the object has not changed since the final field was set

    This one is simple, right? Here's the String class again, but the bytes are no longer immutable:
    class String {
      private final byte[] bytes;
      public String(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0, newArray,
            0, value.length);
        bytes = newArray;
      }
      public void setZerothElt(byte b) {
        bytes[0] = b;
      }
    }
    Well, now, obviously that's not thread-safe immutable. If two threads call setZerothElt(), then all sorts of bad thread horseplay can happen.

    However, I did mislead you a little here. I said that, if you want the data to be thread-safe immutable, it should not be changed after the final field gets set. It turns out that the immutable data can be changed after the final field is set. It just can't be changed after the constructor ends:
    class String {
      // Absolutely fine.
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
            bytes, 0, value.length);
      }
    }
    You can think of the end of the constructor as being something like a "freeze" operation. The data reachable by the object's final fields must be frozen by the end of the constructor. When figuring out how this should work, we made this decision because it is a big pain in the neck to try to get everything done by the time you actually set the final field, and besides, the semantics were simpler this way.

  3. a reference to the object containing the final field did not escape the constructor

    "Escaping" the constructor simply means that when you were inside the constructor, you stored a reference to the object under construction somewhere where another thread could get at it.

    In some ways, this is kind of a nasty restriction. There are all sorts of classes that let a reference to the object escape the constructor. Swing has lots of examples of this (I think — I don't feel like looking it up). Here's an example where our String class tries to keep track of all of its instances:
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    class String {
      // Don't do this.
      static Set<String> allStrings =
          Collections.synchronizedSet(new HashSet<String>());
      private final byte[] bytes;
      public String(byte[] value) {
        allStrings.add(this);
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
            bytes, 0, value.length);
      }
    }
    It's pretty clear that another thread could come along and read the object being constructed out of allStrings before it is done being constructed. So, there's no free ride there.

    But this one gets nastier, too! Let's take an example that looks correct at first blush, but isn't:
    class String {
      // Don't do this, either.
      static String lastConstructed;
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
            bytes, 0, value.length);
        lastConstructed = this;
      }
    }
    It looks as if, when you read lastConstructed, it will always give you the last String instance that was constructed. Don't be fooled. Even if this does give you the last String instance that was constructed (and it may or may not, but that's beside the point), it is quite possible for the reader thread to see uninitialized data. The compiler can take the assignment to lastConstructed and move it to before the assignment to bytes, or before the arraycopy that fills in the array. So, you have to avoid allowing references to objects to escape constructors. (To preempt questions: yes, this example would work if you made lastConstructed volatile.)
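To pull the three rules together, here is a minimal sketch of a class that should qualify as thread-safe immutable under them. The names ImmutableBytes, create, getLastCreated, byteAt, withByteAt and toArray are hypothetical, chosen to avoid shadowing java.lang.String. The array is copied in the constructor, it is reachable only through a final field, it is never written after the constructor ends, and `this` is not stored anywhere during construction. Instead of a setZerothElt()-style mutator, withByteAt returns a new instance. The static factory assumes you still want something like lastConstructed: it publishes the object only after the constructor has returned, and uses a volatile field for the hand-off.

```java
import java.util.Arrays;

// Hypothetical immutable byte sequence illustrating the three rules.
final class ImmutableBytes {
  // Rule 1: the data is reachable from a final field.
  private final byte[] bytes;

  // Written only by create(), after the constructor has returned. Because
  // `bytes` is final and `this` never escapes, any thread that reads this
  // reference sees a fully built array; volatile additionally ensures readers
  // see the most recent write.
  private static volatile ImmutableBytes lastCreated;

  private ImmutableBytes(byte[] value) {
    // Rule 2: copy and fill the array before the constructor ends,
    // and never write to it afterwards.
    this.bytes = Arrays.copyOf(value, value.length);
    // Rule 3: no reference to `this` is stored anywhere here.
  }

  // Static factory: the instance is recorded only once construction is complete.
  static ImmutableBytes create(byte[] value) {
    ImmutableBytes result = new ImmutableBytes(value);
    lastCreated = result;
    return result;
  }

  static ImmutableBytes getLastCreated() {
    return lastCreated;
  }

  byte byteAt(int index) {
    return bytes[index];
  }

  int length() {
    return bytes.length;
  }

  // "Mutation" builds a new instance instead of changing the shared array.
  // (The constructor copies again; simple rather than optimal.)
  ImmutableBytes withByteAt(int index, byte b) {
    byte[] copy = Arrays.copyOf(bytes, bytes.length);
    copy[index] = b;
    return create(copy);
  }

  // Hand out a copy so callers cannot modify the internal array.
  byte[] toArray() {
    return Arrays.copyOf(bytes, bytes.length);
  }
}
```

Making the constructor private and routing construction through create is a design choice: the publishing step lives outside the constructor entirely, so nobody can reintroduce the lastConstructed problem by editing the constructor.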
A couple more points, too:
  • The byte array has to be reachable from the final field by the end of the constructor. If I had the final field point to a container object, and then made the container object point to the array after the constructor finished, then I wouldn't be able to claim it was thread-safe.
    // The bytes reachable from bytesHolder *after* later()
    // is called are not thread-safe immutable.
    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder bytesHolder;
      public String() {
        bytesHolder = new BytesHolder();
      }
      void later(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0,
            newArray, 0, value.length);
        bytesHolder.bytes = newArray;
      }
    }
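    This, of course, violates the rule that the byte array must be reachable from the final field by the end of the constructor. But creating and filling in the array before the constructor finishes is not enough on its own; if the array only becomes reachable from the final field later, you still don't get the thread-safety guarantee: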
    // The bytes reachable from bytesHolder *after*
    // later() is called are still not thread-safe immutable.
    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder bytesHolder;
      byte[] bytes;
      public String(byte[] value) {
        bytesHolder = new BytesHolder();
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
            bytes, 0, value.length);
      }
      void later(byte[] value) {
        bytesHolder.bytes = bytes;
      }
    }
  • The other point is that you can't just acquire these guarantees by waiting an inordinately long time. Here's another example of an instance escaping its constructor:
    class String {
      // Don't do this, either.
      static String firstInstance;
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
            bytes, 0, value.length);
        if (firstInstance == null) {
          firstInstance = this;
        }
      }
      public static String getFirst() {
        String instance = firstInstance;
        if (instance == null) return null;
        // Wait a while for the constructor to end...
        try {
          Thread.sleep(1000);
        } catch (InterruptedException e) {}
        return instance;
      }
    }
    (Obviously, your code wouldn't look like this; this is just an example.) The fact of the matter is that without using the guarantees of final fields or some form of explicit synchronization, there is no guarantee that a thread that reads the reference that escaped the constructor will ever see the correctly constructed object, no matter how long it waits. This can happen in various ways, all of which have to do with sneaky compiler tricks.
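For contrast, here is a minimal sketch of how the two bullet-point examples might be repaired, assuming a static factory is acceptable in place of registering the instance from inside the constructor. HeldBytes, create, getFirst and byteAt are hypothetical names; AtomicReference is the standard java.util.concurrent.atomic class. The holder is completely filled in before the constructor ends, and the first instance is recorded with compareAndSet after construction, so getFirst() relies on explicit synchronization rather than on waiting.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical fixes for the two bullet-point examples.
final class HeldBytes {
  // A mutable holder is fine as long as it is completely filled in
  // before the constructor ends and never modified afterwards.
  static final class BytesHolder {
    byte[] bytes;
  }

  private final BytesHolder bytesHolder;

  // Records the first fully constructed instance. compareAndSet and get()
  // have volatile memory effects, which give the happens-before edge that
  // sleeping in getFirst() never could.
  private static final AtomicReference<HeldBytes> firstInstance =
      new AtomicReference<>();

  private HeldBytes(byte[] value) {
    BytesHolder holder = new BytesHolder();
    holder.bytes = Arrays.copyOf(value, value.length);
    // Everything reachable from the final field is set before the constructor ends.
    bytesHolder = holder;
  }

  static HeldBytes create(byte[] value) {
    HeldBytes result = new HeldBytes(value);
    // Publish only after the constructor has returned; only the first caller wins.
    firstInstance.compareAndSet(null, result);
    return result;
  }

  static HeldBytes getFirst() {
    return firstInstance.get(); // no sleeping required
  }

  byte byteAt(int index) {
    return bytesHolder.bytes[index];
  }
}
```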
There is, of course, far more to say about immutability. At some point, I should talk about its relationship with serialization and reflection. And best practices for making immutable objects when values aren't available for the constructor. And best practices for making immutable data types (like Sets and Maps). I think I've done enough typing for one evening, though. :)

If you liked this post, read the followup: Immutability in Java, Part 3.

12 comments:

Tom Hawtin said...

If lastConstructed was volatile but then the reference was unsafely published to another thread, then the String would not be immutable. Right?

Danny said...

Speaking of Unsafe, do you know where I can find more information about Sun's "Unsafe" class? Doug Lea seems to use this class a lot in his concurrency package, but there is little documentation on it.

pveentjer said...

Another tricky this escape from a constructor is calling instance methods and allowing a subclass to override the methods.

Jeremy Manson said...

If lastConstructed was volatile but then the reference was unsafely published to another thread, then the String would not be immutable. Right?

It wouldn't be thread-safe because it was immutable, but it would be thread-safe because lastConstructed was volatile.

Jeremy Manson said...

Speaking of Unsafe, do you know where I can find more information about Sun's "Unsafe" class? Doug Lea seems to use this class a lot in his concurrency package, but there is little documentation on it.

I should do a post on this at some point. The upshot is that the JDK is now available under the GPL, so you can just go look at the code (http://openjdk.java.net). You have to do one of two things to get it to work, though:

1) Stick the class that uses it in your bootclasspath, or

2) Use reflection to get the field that stores the unique Unsafe object:
Field field = Unsafe.class.getDeclaredField("theUnsafe");
field.setAccessible(true);
return (Unsafe)field.get(null);

(Or something)

Jeremy Manson said...

Another tricky this escape from a constructor is calling instance methods and allowing a subclass to override the methods.

Yep! The nasty interactions of concurrency and inheritance are probably worth a chapter or two in a book.
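A minimal single-threaded sketch of the pattern pveentjer describes (BaseComponent, NamedComponent, describe and sawNullName are hypothetical names): the superclass constructor calls an overridable method, and the override runs before the subclass has assigned its final field, so it sees the default value. If the override also handed `this` to another thread, that thread would lose the final-field guarantees discussed in the post.

```java
// Hypothetical classes showing `this` escaping to a subclass override
// during construction; no threads are needed to see the problem.
class BaseComponent {
  BaseComponent() {
    describe(); // calls an overridable method before the subclass is initialized
  }

  void describe() {}
}

class NamedComponent extends BaseComponent {
  private final String name;
  // Deliberately has no initializer, so the value written during
  // BaseComponent's constructor is not overwritten afterwards.
  boolean sawNullName;

  NamedComponent(String name) {
    super(); // runs BaseComponent(), which calls describe() below
    this.name = name;
  }

  @Override
  void describe() {
    // The final field has not been assigned yet at this point.
    sawNullName = (name == null);
  }

  String name() {
    return name;
  }
}
```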

Matt Drees said...

So, how about if a reference escapes the constructor, but in a (seemingly) controlled way:

final class Superhero {
  final Sidekick sidekick;

  Superhero() {
    sidekick = new Sidekick(this);
  }

  public Sidekick getSidekick() {...}

  public static final class Sidekick {
    final Superhero hero;

    Sidekick(Superhero hero) {
      this.hero = hero;
    }

    public Superhero getHero() {...}
  }
}

Can Superhero be considered immutable? If not, is it at least thread-safe?

Jeremy Manson said...

@Matt - this is okay. This is also pretty much how everyone implements instance inner classes - they have a final reference to the outer class instance.
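A runnable sketch of Matt's example, with the elided accessor bodies filled in hypothetically as plain getters. The reason it is safe, as I read the rules above, is that `this` is only stored in a final field of a Sidekick that is itself reachable only through the Superhero's own final field, so no other thread can obtain either object until the finished Superhero is published.

```java
// Sketch of Matt's example; the getter bodies are hypothetical
// fill-ins for the elided "{...}".
final class Superhero {
  private final Sidekick sidekick;

  Superhero() {
    // `this` is handed to the Sidekick, but the Sidekick is reachable
    // only through this object's final field, so nothing escapes to
    // another thread before the constructor ends.
    sidekick = new Sidekick(this);
  }

  Sidekick getSidekick() {
    return sidekick;
  }

  static final class Sidekick {
    private final Superhero hero;

    Sidekick(Superhero hero) {
      this.hero = hero;
    }

    Superhero getHero() {
      return hero;
    }
  }
}
```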

javarevisited said...

Good article.
It would have been good if you had included some examples from the standard Java library.

e.g. String is immutable

Amit Verma said...

I have read somewhere that for final fields, compilers are not required to synchronize their cache with main memory as they are for normal fields. But if the data of a final field can be changed, as in the setZerothElt() method in one of your examples, then how do Java compilers ensure the visibility of that data among different threads?

Amit Verma said...

If in a class A we have a final field b, which refers to an object of class B, and class B contains non-final fields, does the JMM guarantee that object A is constructed completely and is visible to all threads after the constructor of A completes? There might be a case where field b of class A refers to an object of class B that is not fully initialized.

Jeremy Manson said...

@Amit - in response to your first question: The use of setZerothElt() results in a violation of immutability. The updates are not guaranteed to be seen by other threads.

In response to your second question: Other threads are guaranteed to see the fields of b at least as up to date as they were when the constructor for a finished. If changes occur after that, the other threads may or may not see them.