Friday, April 25, 2008

Immutability in Java

Another topic that comes up again and again in questions that people ask me is, "How does immutability work in Java?" Immutability is a godsend for concurrent programmers, because you don't have to do lots of sticky reasoning about which threads are updating which variables when, you don't have to worry about cache thrashing, and you don't have to worry about all sorts of other things. When I write concurrent code (which is reasonably often), I try to make as many things immutable as possible.

Now, in common parlance, immutability means "does not change". Immutability doesn't mean "does not change" in Java. It means "is transitively reachable from a final field, has not changed since the final field was set, and a reference to the object containing the final field did not escape the constructor".

In circumstances other than this, even if a given field is not mutated, the Java memory model requires that there be some form of synchronization (which can include the use of volatile, static initialization, synchronized blocks, any of the java.util.concurrent collections, or the use of a java.util.concurrent.atomic.AtomicFoo object) for a thread to make sure that it sees the correctly constructed object for the first time. Subsequent reads of the object by any given thread don't require additional synchronization.
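A minimal sketch of what that synchronization can look like, using two of the mechanisms listed above (a volatile field and an AtomicReference). The Point class, the field names and the publishAndRead() helper are invented for this illustration. Point's field is deliberately not final, so it is only the volatile or atomic read that guarantees the reader sees x == 42 once it sees a non-null reference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class SafePublication {
    // Hypothetical example class. Its field is deliberately NOT final,
    // so it gets none of the special final-field guarantees.
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
    }

    // Two of the synchronization mechanisms listed in the post.
    static volatile Point viaVolatile;
    static final AtomicReference<Point> viaAtomic = new AtomicReference<>();

    // Publishes a new Point(42) from one thread and reads it from another.
    // The reader spins until it sees a non-null reference; because that
    // reference came from a volatile/atomic read, it must also see x == 42.
    static int publishAndRead(Consumer<Point> publish, Supplier<Point> read)
            throws InterruptedException {
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Point p;
            while ((p = read.get()) == null) {
                Thread.onSpinWait(); // hint only; each iteration re-reads the field
            }
            seen[0] = p.x;
        });
        Thread writer = new Thread(() -> publish.accept(new Point(42)));
        reader.start();
        writer.start();
        writer.join();
        reader.join(); // join() makes seen[0] visible to this thread
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        viaVolatile = null;
        System.out.println(publishAndRead(p -> viaVolatile = p, () -> viaVolatile)); // 42
        viaAtomic.set(null);
        System.out.println(publishAndRead(viaAtomic::set, viaAtomic::get));           // 42
    }
}
```

Thread.onSpinWait() needs Java 9 or later; an empty loop body would also be correct here, since every iteration performs a fresh volatile (or atomic) read.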

So, a correctly written version of HashMap that was immutable and thread-safe would look like this:

public class ImmutableHashMap<K, V> implements Map<K, V> {
  private final Map<K, V> map;

  public ImmutableHashMap(Map<K, V> map) {
    this.map = new HashMap<K, V>(map);
  }

  @Override
  public V get(Object key) {
    // And similarly all other accessors
    return map.get(key);
  }

  @Override
  public V put(K key, V value) {
    // And similarly all other mutators
    throw new UnsupportedOperationException();
  }
}

ETA: This is how Collections.unmodifiableMap() works: it uses the same delegation model, although it wraps the map you pass in rather than copying it.
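The class above is a sketch: it declares implements Map but only shows two methods, so it won't compile as written. One way to get a compilable version with the same shape, assuming it's acceptable to extend java.util.AbstractMap instead of implementing Map directly, is below; the name DelegatingImmutableMap is made up for this example. AbstractMap's put() already throws UnsupportedOperationException, and its remove() and clear() go through entrySet(), which here is a read-only view, so they cannot modify the copy and the mutators don't need to be rewritten one by one.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical name; same delegation idea as ImmutableHashMap above.
public final class DelegatingImmutableMap<K, V> extends AbstractMap<K, V> {
    // The final field is what provides the safe-publication guarantee.
    private final Map<K, V> map;

    public DelegatingImmutableMap(Map<? extends K, ? extends V> source) {
        // Copy the mappings, then wrap the copy so that the views returned by
        // entrySet() (which AbstractMap uses for remove/clear) are read-only.
        this.map = Collections.unmodifiableMap(new HashMap<K, V>(source));
    }

    // AbstractMap builds most accessors on entrySet(); these overrides
    // delegate directly so lookups don't scan every entry.
    @Override public Set<Map.Entry<K, V>> entrySet() { return map.entrySet(); }
    @Override public V get(Object key) { return map.get(key); }
    @Override public boolean containsKey(Object key) { return map.containsKey(key); }
    @Override public int size() { return map.size(); }

    // put() is inherited from AbstractMap, which throws UnsupportedOperationException.

    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        Map<String, Integer> frozen = new DelegatingImmutableMap<>(source);
        source.put("b", 2);                // does not affect the private copy
        System.out.println(frozen.size()); // 1
        try {
            frozen.put("c", 3);
        } catch (UnsupportedOperationException expected) {
            System.out.println("put rejected");
        }
    }
}
```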

Because of the special meaning of the keyword "final", instances of this class can be shared among multiple threads without any additional synchronization: when another thread calls get() on an instance, it is guaranteed to get the object you put into the map. You should probably still use something thread-safe to hand the instance off between threads (a LinkedBlockingQueue, for example), but if you forget to do this, you still have the guarantee.
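Here is a sketch of that kind of handoff through a LinkedBlockingQueue. To keep it self-contained it uses Collections.unmodifiableMap(new HashMap<>(source)) as a stand-in for ImmutableHashMap (same delegation model, final field and all); the class and method names are hypothetical. The BlockingQueue contract says that actions before put() happen-before actions after the matching take(), so here the final-field guarantee is a second line of defense rather than the only one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoff {
    // Builds a read-only map on the calling thread and hands it to a consumer
    // thread through a LinkedBlockingQueue; returns what the consumer saw.
    static String handOff() throws InterruptedException {
        BlockingQueue<Map<Integer, String>> queue = new LinkedBlockingQueue<>();
        String[] seen = new String[1];

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until the producer's put(); everything the
                // producer did before put() is visible after take() returns.
                Map<Integer, String> received = queue.take();
                seen[0] = received.get(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Map<Integer, String> source = new HashMap<>();
        source.put(1, "foo");
        // Stand-in for ImmutableHashMap: a private copy behind a final field.
        queue.put(Collections.unmodifiableMap(new HashMap<>(source)));

        consumer.join(); // join() makes seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handOff()); // foo
    }
}
```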

There are two major points to make about this kind of "immutability":
  1. It's not immutable. So, I've completely misled you with mutable immutability. The following code is perfectly legal:

    HashMap<Integer, StringBuilder> map =
        new HashMap<Integer, StringBuilder>();
    StringBuilder builder = new StringBuilder();
    builder.append("foo");
    map.put(1, builder);
    ImmutableHashMap<Integer, StringBuilder> immutableMap =
        new ImmutableHashMap<Integer, StringBuilder>(map);
    builder.append("bar");
    System.out.println(immutableMap.get(1));

    I think we all know that the println() call prints "foobar", not "foo". So, even if we call this an "immutable" hash map, its values (and keys!) can still be mutated. This is a bad idea, of course. Other threads are not guaranteed to see the updates you make to objects reachable from an "immutable" object (at least, not without additional synchronization), because the final-field guarantee only covers the state those objects were in when the constructor finished.

  2. The final field is absolutely necessary for the thread-safety guarantee. I recently saw an implementation of an ImmutableHashMap that looked more like this:

    public class ImmutableHashMap<K, V> extends HashMap<K, V> {
      public ImmutableHashMap(Map<K, V> map) {
        super(map);
      }

      @Override
      public V put(K key, V value) {
        // And similarly all other mutators
        throw new UnsupportedOperationException();
      }
    }

    This has the virtue of avoiding the extra indirection of the delegation-based version, and it is also shorter (because you don't have to rewrite all of the accessors). The flip side is that if you share instances of this ImmutableHashMap with other threads, then you absolutely have to use synchronization, because it does not get the special guarantees that a final field provides. Without synchronization, a thread that calls get() can actually get the wrong value out, because it may see the reference to the map before it sees the writes the constructor made to the map's internal table. This isn't likely to happen in practice right now, but compiler writers are allowed to take advantage of it.
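If you do use the subclass version, one way to share it safely is static initialization, one of the synchronization mechanisms mentioned earlier. A minimal sketch follows; SubclassImmutableHashMap, Holder and readFromThreads() are hypothetical names, and only put() is overridden, just as in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticInitPublication {
    // The subclass-based map from point 2. It has no final fields of its own,
    // so sharing an instance between threads needs explicit synchronization.
    static class SubclassImmutableHashMap<K, V> extends HashMap<K, V> {
        SubclassImmutableHashMap(Map<K, V> map) {
            super(map);
        }

        @Override
        public V put(K key, V value) {
            // And similarly all other mutators
            throw new UnsupportedOperationException();
        }
    }

    // Static initialization provides that synchronization: the JVM runs this
    // initializer once, under its class-initialization lock, and every thread
    // that uses Holder is guaranteed to see the fully initialized SHARED.
    static final class Holder {
        static final Map<String, Integer> SHARED;

        static {
            Map<String, Integer> m = new HashMap<>();
            m.put("answer", 42);
            SHARED = new SubclassImmutableHashMap<>(m);
        }
    }

    // Reads Holder.SHARED from several threads and returns what each one saw.
    static List<Integer> readFromThreads(int threads) throws InterruptedException {
        Integer[] seen = new Integer[threads];
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            Thread t = new Thread(() -> seen[slot] = Holder.SHARED.get("answer"));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // join() makes each worker's write to seen[] visible here
        }
        return Arrays.asList(seen);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readFromThreads(4)); // [42, 42, 42, 42]
    }
}
```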

So, the moral of the story is:
  • Use final fields whenever you can, and

  • Immutability is a funny thing.

That's all I wanted to say.

If you liked this post, read the followup: Immutability in Java, Part 2.

14 comments:

Anonymous said...

Can the collections returned by
java.util.Collections.unmodifiable...()
be considered immutable provided the underlying collection is never updated after its wrapping?

Jeremy Manson said...

That's such an important point, and it was so stupid that I forgot it, that I put it in the text. Collections.unmodifiableBlah() uses the delegation model.

Peter Lawrey said...

If you are worried about changing the original, you can take a copy, like:

unmodifiableMap(new HashMap(map))

Then the original can be changed safely.

It's worth noting that the contents of a final variable can still be changed, and final fields themselves can be changed using reflection.

Jeremy Manson said...

That's if you are worried about changing the mappings, not the actual keys and values. My point was that unmodifiableMap doesn't make a copy of the keys and values of the map; it just delegates to a newly constructed internal map that copies the mappings. Its semantics, in fact, are identical to the interface-based delegation version I put above. So you still have exactly the same problem: you can modify the keys and values themselves, you just can't modify the mappings. The thing I did with StringBuilder up there still works.
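To make that concrete, here is the StringBuilder example from the post rerun against Peter's suggestion (with generics added). CopyIsShallow and demo() are invented names. Adding a new mapping to the original doesn't affect the copy, but appending to the shared StringBuilder does show up through it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CopyIsShallow {
    // Returns the copy's size and its value for key 1 after the original
    // map and the original StringBuilder value have both been modified.
    static String demo() {
        Map<Integer, StringBuilder> map = new HashMap<>();
        StringBuilder builder = new StringBuilder("foo");
        map.put(1, builder);

        // Peter's suggestion: copy the mappings, then wrap the copy.
        Map<Integer, StringBuilder> frozen = Collections.unmodifiableMap(new HashMap<>(map));

        map.put(2, new StringBuilder("baz")); // the copy's mappings are unaffected...
        builder.append("bar");                // ...but it shares the same value objects

        return frozen.size() + " " + frozen.get(1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1 foobar
    }
}
```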

As for final fields being changeable through reflection: that's true, and many people don't realize it. You need to call setAccessible(true) on the java.lang.reflect.Field to do it.
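A minimal sketch of that, assuming a hypothetical Holder class whose final field is assigned in the constructor (so it is not a compile-time constant). Current JDKs still allow this for ordinary non-static final fields, but not for static final fields or record components, and newer releases may print a warning when it happens.

```java
import java.lang.reflect.Field;

public class FinalViaReflection {
    // Hypothetical class. The final field is assigned in the constructor,
    // so it is not a compile-time constant and reads of it are not inlined.
    static final class Holder {
        private final String name;
        Holder(String name) { this.name = name; }
        String name() { return name; }
    }

    // Overwrites the final field through reflection and returns the new value
    // as seen through the ordinary accessor.
    static String overwrite(Holder h, String newValue) throws ReflectiveOperationException {
        Field f = Holder.class.getDeclaredField("name");
        // Required: without it, set() on a final field throws IllegalAccessException.
        f.setAccessible(true);
        f.set(h, newValue);
        return h.name();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(overwrite(new Holder("foo"), "bar")); // bar
    }
}
```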

Danny said...

It's probably also worth noting that not all final variables can be changed. The compiler may decide to inline your static final primitives or Strings for you. In this case, you won't be able to change the value.
Why does Java allow you to change final variables via reflection? It's a nice feature, but in principle, shouldn't this always be disallowed?

Jeremy Manson said...

Actually, it is worse than that. Suppose you have two classes:

class A {
  static final String s = "foo";
}

class B {
  public static void main(String [] args) {
    System.err.println(A.s);
  }
}

If you compile them both, then change A.s to "bar" and recompile A but not B, running B will still print "foo". Because A.s is a compile-time constant, the compiler copies "foo" directly into B's class file, and the spec says that's what is supposed to happen. So the semantics of static final fields are pretty messed up.
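Jeremy's example needs two separately compiled classes, but the underlying rule (references to a constant variable, meaning a final primitive or String initialized with a constant expression, are resolved at compile time) can be seen in a single file, which also bears out Danny's point. This sketch uses a hypothetical Config class with a constant-valued final instance field: reflection really does change the field, but code compiled against it keeps using the value that was inlined.

```java
import java.lang.reflect.Field;

public class ConstantInlining {
    static final class Config {
        // A constant variable: final, primitive type, initialized with a
        // constant expression. javac replaces simple-name reads with 1.
        final int limit = 1;

        int readLimit() {
            return limit; // compiled as if it were "return 1;"
        }
    }

    // Returns "<value seen by compiled code> <value actually stored in the field>".
    static String demo() throws ReflectiveOperationException {
        Config c = new Config();
        Field f = Config.class.getDeclaredField("limit");
        f.setAccessible(true);
        f.setInt(c, 2); // the field itself really does change...
        return c.readLimit() + " " + f.getInt(c); // ...but the inlined read still yields 1
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(demo()); // 1 2
    }
}
```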

The reason that final fields are modifiable through reflection is so that people can implement their own deserialization mechanisms without having to add special constructors to every class. It is just another way in which serialization semantics are screwed up.

Anonymous said...

I don't see how "final" makes a difference thread-safety-wise. You'll always get an exception if you try to change the internal state. The "final" keyword only prevents you from changing the reference to the internal map in the first example, so it is a precaution for programmers. As for Collections.unmodifiable..., IMHO it adheres to the first approach because that does not dictate a specific implementation (in our case HashMap) of map, list, etc. That's how I see it; please correct me if I am wrong.

Jeremy Manson said...

The final modifier prevents compiler reorderings that can affect thread safety. For example, if you have:

class Foo {
  final int x;
  Foo() {
    x = 1;
  }
}

Thread 1:
o = new Foo();

Thread 2:
if (o != null) {
  r1 = o.f
}

Without the final modifier, the compiler and JVM are allowed to move the write to x so that it occurs after the reference to the new object is written to o. If these two threads executed concurrently, then Thread 2 could see a non-null value for o and read o.f without seeing 1 (because it reads the reference before the field x is written). If you mark x final, this cannot happen.
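Here is a runnable version of the two-thread example above (with the read written as o.x). The FinalFieldRace class and countBadReads() are hypothetical. Because x is final and this doesn't escape the constructor, a reader that sees a non-null o must see x == 1, so the count of bad reads has to be 0. A passing run can't prove the guarantee, though: if you remove final, the memory model permits bad reads, but you may well never observe one in practice.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FinalFieldRace {
    // Foo from the example above; x is final and 'this' does not escape.
    static final class Foo {
        final int x;
        Foo() { x = 1; }
    }

    // Deliberately NOT volatile: the writer and reader race on this field.
    static Foo o;

    // Runs the two-thread example repeatedly and counts reads where the reader
    // saw a non-null o but not x == 1. The final modifier forbids that outcome.
    static int countBadReads(int trials) throws InterruptedException {
        AtomicInteger bad = new AtomicInteger();
        for (int i = 0; i < trials; i++) {
            o = null; // Thread.start() below makes this reset visible to both threads
            Thread writer = new Thread(() -> o = new Foo());
            Thread reader = new Thread(() -> {
                Foo local = o; // read the shared reference exactly once
                if (local != null && local.x != 1) {
                    bad.incrementAndGet();
                }
            });
            reader.start();
            writer.start();
            writer.join();
            reader.join();
        }
        return bad.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countBadReads(1_000)); // 0
    }
}
```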

In the case of Collections.unmodifiableMap(), you get the same sort of special guarantee, because the field holding the wrapped map is final. In the case of the hypothetical ImmutableHashMap that extends HashMap, you do not get this guarantee, because there are no final fields to provide it.

I hope that makes it a little clearer.

Jonathan said...

A small fix: o.x, not o.f.

Jeremy Manson said...

Correct. Sorry about that.

Gary Frost said...

I think that the definition of immutable needs reviewing. String, for example, is often used as an example of an immutable object, yet its state does indeed change after construction.

Check out the hashCode() method of String, which lazily evaluates the hash field on its first call. So technically the state of the instance changes on the first call to hashCode().

The real question is what we mean by immutable. Is it enough for an object to not be 'observed' to have changed? That seems to be the rule we need to apply to bring String back into the fold. If we use this latter definition, then String is immutable.

Jeremy Manson said...

@gary -- you should check out the rest of the posts I made on this topic. I deal with that particular issue in this post: http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
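For reference, the lazy-caching pattern Gary describes looks roughly like this. It is a sketch in the style of String.hashCode(), not String's actual source; CachedHashName is a made-up class. The race on hash is benign because every thread computes the same value from final fields and int writes are atomic, and the field is read exactly once into a local so that a second read can't return a stale 0.

```java
import java.util.Objects;

public final class CachedHashName {
    private final String first;
    private final String last;
    // Cached hash code; 0 means "not computed yet". Deliberately not volatile:
    // like String.hashCode(), this relies on a benign data race.
    private int hash;

    public CachedHashName(String first, String last) {
        this.first = Objects.requireNonNull(first);
        this.last = Objects.requireNonNull(last);
    }

    @Override
    public int hashCode() {
        int h = hash; // read the racy field exactly once
        if (h == 0) {
            // Every thread computes the same value from the final fields, so it
            // doesn't matter if several threads race to store it. (If the real
            // hash is 0 it is simply recomputed on each call, which is still correct.)
            h = Objects.hash(first, last);
            hash = h;
        }
        return h; // return the local; never re-read the field
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedHashName)) {
            return false;
        }
        CachedHashName other = (CachedHashName) o;
        return first.equals(other.first) && last.equals(other.last);
    }
}
```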

Anonymous said...

We have a situation where multiple threads want to get a reference and, if it is null, create the object. But only one instance should be created and shared by all the threads. We can use AtomicReference for that and do something like this:

final AtomicReference<Object> globalRef = new AtomicReference<Object>(); // This instance is accessible to all the threads.

Then each thread will do something like this:

Object obj = globalRef.get();
if (obj == null) {
  obj = new Object(); // This would be some sharable object
  if (!globalRef.compareAndSet(null, obj)) {
    obj = null; // Do some clean up here
    obj = globalRef.get();
  }
}

// From this point onwards, use obj.

Now all the threads share the same object, since the reference is set only once and never reset. But each globalRef.get() is an access to a volatile variable, so I proposed writing a class that could be a little more efficient than AtomicReference. Below is the code snippet:


import java.util.concurrent.atomic.AtomicBoolean;

public class SetOnceReference<T> {

  private final AtomicBoolean m_isSet = new AtomicBoolean(false);
  private T m_ref;
  private boolean m_unSafeFlag;
  private volatile boolean m_safeFlag;

  public boolean set(T t) {
    if (m_isSet.compareAndSet(false, true)) {
      m_ref = t;
      m_safeFlag = true;
      return true;
    }
    return false;
  }

  public T get() {

    if (m_unSafeFlag) return m_ref;

    if (m_safeFlag) {
      m_unSafeFlag = true;
      return m_ref;
    }

    return null;
  }
}

A colleague of mine says it would not work, because one thread could call get(), read m_unSafeFlag == true, and then read m_ref without any read memory barrier being invoked; reading a non-volatile variable without invoking a read memory barrier is not correct. I am saying that the fact that m_unSafeFlag == true means that a read memory barrier has been invoked, and as long as it has been invoked by some thread, the other threads should not have to invoke it; and the happens-before rule prevents the compiler from reordering m_unSafeFlag = true.

Do you think this implementation is correct?

Jeremy Manson said...

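@anonymous - This implementation is incorrect. Each thread has to do explicit synchronization to be guaranteed to see the correctly constructed object. However, some threads may only read m_unSafeFlag, which is a plain non-volatile read and does no synchronization. For such a thread there is no happens-before edge between the write to m_ref and its read of m_ref, so it can see m_unSafeFlag == true and still see a stale m_ref.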
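One way to get the set-once behavior correctly is simply to keep the volatile read on every access, for example with a small helper around AtomicReference like the sketch below. SetOnce and getOrCreate() are hypothetical names. As in the original snippet, the factory may run in more than one thread when several race, and the losers' objects are discarded; if creation must happen exactly once, a lock (or a lazily initialized holder class) would be needed instead.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper name.
public final class SetOnce<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    // Returns the shared instance, creating it with factory if no thread has yet.
    // Every call performs at least one volatile read (ref.get()), which is
    // exactly the synchronization each reading thread needs.
    public T getOrCreate(Supplier<? extends T> factory) {
        T current = ref.get();
        if (current != null) {
            return current;
        }
        T candidate = factory.get(); // may run in several threads at once
        if (ref.compareAndSet(null, candidate)) {
            return candidate;        // this thread won the race
        }
        return ref.get();            // another thread won; use its instance
    }

    public static void main(String[] args) {
        SetOnce<Object> shared = new SetOnce<>();
        Object a = shared.getOrCreate(Object::new);
        Object b = shared.getOrCreate(Object::new);
        System.out.println(a == b); // true
    }
}
```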