Another topic that comes up again and again in questions that people ask me is, "How does immutability work in Java?" Immutability is a godsend for concurrent programmers, because you don't have to do lots of sticky reasoning about which threads are updating which variables when, you don't have to worry about cache thrashing, and you don't have to worry about all sorts of other things. When I write concurrent code (which is reasonably often), I try to make as many things immutable as possible.
Now, in common parlance, immutability means "does not change". Immutability doesn't mean "does not change" in Java. It means "is transitively reachable from a final field, has not changed since the final field was set, and a reference to the object containing the final field did not escape the constructor".
In circumstances other than this, even if a given field is not mutated, the Java memory model requires that there be some form of synchronization (which can include the use of volatile, static initialization, synchronized blocks, any of the java.util.concurrent collections, or the use of a java.util.concurrent.atomic.AtomicFoo object) for a thread to make sure that it sees the correctly constructed object for the first time. Subsequent reads of the object by any given thread don't require additional synchronization.
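To make that list of mechanisms concrete, here is a minimal sketch of three of them (a volatile field, an AtomicReference, and static initialization), applied to a hypothetical Settings class that has no final fields and so cannot rely on the final field guarantee. SafePublication, Settings, and the method names are illustrative assumptions, not part of any library.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SafePublication {
    // Hypothetical class with NO final fields: readers need some other happens-before edge.
    static final class Settings {
        private int timeoutMillis; // deliberately not final

        Settings(int timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        int timeoutMillis() {
            return timeoutMillis;
        }
    }

    // Option 1: a volatile field. The volatile write happens-before any volatile
    // read that sees it, so the constructor's writes are visible to that reader.
    private static volatile Settings viaVolatile;

    // Option 2: an AtomicReference; get() and set() have volatile memory semantics.
    private static final AtomicReference<Settings> viaAtomic = new AtomicReference<>();

    // Option 3: static initialization. The JVM synchronizes class initialization,
    // so every thread that uses the class sees the fully constructed object.
    private static final Settings VIA_STATIC_INIT = new Settings(3000);

    public static void publish(int timeoutMillis) {
        viaVolatile = new Settings(timeoutMillis);
        viaAtomic.set(new Settings(timeoutMillis));
    }

    // Each reader returns -1 if nothing has been published yet.
    public static int readVolatile() {
        Settings s = viaVolatile;
        return s == null ? -1 : s.timeoutMillis();
    }

    public static int readAtomic() {
        Settings s = viaAtomic.get();
        return s == null ? -1 : s.timeoutMillis();
    }

    public static int readStaticInit() {
        return VIA_STATIC_INIT.timeoutMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> publish(500));
        writer.start();
        writer.join();
        System.out.println(readVolatile() + " " + readAtomic() + " " + readStaticInit());
    }
}
```

In main, join() already orders the reads after the writer finishes; in real code the reader would usually be a thread running concurrently, and it is the volatile field, the AtomicReference, or class initialization that makes its first read safe.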
So, a correctly written version of HashMap that was immutable and thread-safe would look like this:
import java.util.HashMap;
import java.util.Map;
public class ImmutableHashMap<K, V> implements Map<K, V> {
    private final Map<K, V> map;
    public ImmutableHashMap(Map<K, V> map) {
        this.map = new HashMap<K, V>(map);
    }
    @Override
    public V get(Object key) {
        // And similarly all other accessors
        return map.get(key);
    }
    @Override
    public V put(K key, V value) {
        // And similarly all other mutators
        throw new UnsupportedOperationException();
    }
}
ETA: This is how Collections.unmodifiableMap() works.
Because of the special meaning of the keyword "final", instances of this class can be shared with multiple threads without using any additional synchronization; when another thread calls get() on the instance, it is guaranteed to get the object you put into the map, without doing any additional synchronization. You should probably use something that is thread-safe to perform the handoff between threads (like LinkedBlockingQueue or something), but if you forget to do this, then you still have the guarantee.
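Here is a small sketch of that handoff through a LinkedBlockingQueue. SnapshotMap is a hypothetical, cut-down stand-in for the ImmutableHashMap class (only get() is implemented); the property it keeps is that the copied map lives in a final field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoff {
    // Hypothetical stand-in for ImmutableHashMap: all state is a final field
    // holding a private copy of the caller's map.
    static final class SnapshotMap<K, V> {
        private final Map<K, V> map;

        SnapshotMap(Map<K, V> source) {
            this.map = new HashMap<>(source); // never modified after this
        }

        V get(Object key) {
            return map.get(key);
        }
    }

    // Builds a snapshot on this thread and hands it to a consumer thread.
    public static String handOff() {
        BlockingQueue<SnapshotMap<String, String>> queue = new LinkedBlockingQueue<>();
        String[] seen = new String[1];

        Thread consumer = new Thread(() -> {
            try {
                seen[0] = queue.take().get("greeting"); // blocks until the producer adds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Map<String, String> source = new HashMap<>();
        source.put("greeting", "hello");
        queue.add(new SnapshotMap<>(source)); // the handoff
        source.put("greeting", "changed");    // does not affect the copy

        try {
            consumer.join(); // makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff()); // prints "hello"
    }
}
```

The queue gives the consumer a happens-before edge on its own; the final field is what keeps the map's contents safe if a handoff like this is forgotten.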
There are two major points to make about this kind of "immutability":
- It's not immutable. So, I've completely misled you with mutable immutability. The following code is perfectly legal:
HashMap<Integer, StringBuilder> map =
    new HashMap<Integer, StringBuilder>();
StringBuilder builder = new StringBuilder();
builder.append("foo");
map.put(1, builder);
ImmutableHashMap<Integer, StringBuilder> immutableMap =
    new ImmutableHashMap<Integer, StringBuilder>(map);
builder.append("bar");
System.out.println(immutableMap.get(1));
I think we all know that that println() method is printing "foobar", not "foo". So, even if we call this an "immutable" hash map, values (and keys!) can still be mutated. This is a bad idea, of course. Other threads are not guaranteed to see the updates you make to those keys and values (at least, not without additional synchronization).
- The final field is absolutely necessary for the thread-safety guarantee. I recently saw an implementation of an ImmutableHashMap that looked more like this:
import java.util.HashMap;
import java.util.Map;
public class ImmutableHashMap<K, V> extends HashMap<K, V> {
    public ImmutableHashMap(Map<K, V> map) {
        super(map);
    }
    @Override
    public V put(K key, V value) {
        // And similarly all other mutators
        throw new UnsupportedOperationException();
    }
}
This has the great virtue of avoiding the extra indirection of the delegation-based version, and also has the great virtue of being shorter (because you don't have to rewrite all of the accessors). The flip side is that if you share instances of this ImmutableHashMap with other threads, then you absolutely have to use synchronization: HashMap's internal fields are not final, so it does not get the special guarantees that the final field provides. If you call get(), you can actually get the wrong value out. It isn't likely to happen in practice right now, but compiler writers are allowed to take advantage of this.
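One way to address the first point is to make sure the values stored in the map can't change either. The sketch below assumes StringBuilder values, as in the example above, and contrasts a copy of the mappings with a snapshot that turns each value into an immutable String; ValueSnapshot and snapshot() are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ValueSnapshot {
    // Hypothetical helper: copies the mappings AND replaces each mutable
    // StringBuilder with an immutable String. Integer keys are already immutable.
    public static Map<Integer, String> snapshot(Map<Integer, StringBuilder> source) {
        Map<Integer, String> copy = new HashMap<>();
        for (Map.Entry<Integer, StringBuilder> e : source.entrySet()) {
            copy.put(e.getKey(), e.getValue().toString());
        }
        return Collections.unmodifiableMap(copy);
    }

    // Returns {value seen through a mappings-only copy, value seen through the snapshot}.
    public static String[] demo() {
        Map<Integer, StringBuilder> map = new HashMap<>();
        StringBuilder builder = new StringBuilder("foo");
        map.put(1, builder);

        Map<Integer, StringBuilder> shallow = new HashMap<>(map); // shares the builder
        Map<Integer, String> deep = snapshot(map);               // holds its own String

        builder.append("bar");
        return new String[] { shallow.get(1).toString(), deep.get(1) };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "foobar foo"
    }
}
```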
So, the moral of the story is:
- Use final fields whenever you can, and
- Immutability is a funny thing.
That's all I wanted to say.
If you liked this post, read the followup: Immutability in Java, Part 2.
Comments
java.util.Collections.unmodifiable...()
be considered immutable provided the underlying collection is never updated after its wrapping?
unmodifiableMap(new HashMap(map))
The original can be changed safely.
It's worth noting that final variables can have their contents changed, and that final variables can be changed using reflection.
unmodifiableMap(new HashMap(map))
The original can be changed safely.
That's if you are worried about changing the mappings, not the actual keys and values. My point was that unmodifiableMap doesn't make a copy of the keys and values of the map, it just delegates to a newly constructed internal map that copies the mappings. Its semantics, in fact, are identical to the interface-based delegation version I put above. So you still have exactly the same problem -- you can modify the keys and values themselves, you just can't modify the mappings. The thing I did with StringBuilder up there still works.
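The difference between protecting the mappings and protecting the keys and values shows up in a short sketch that wraps a copy with Collections.unmodifiableMap(), as in the comment above; the class and method names here are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UnmodifiableCopyDemo {
    public static String run() {
        Map<Integer, StringBuilder> original = new HashMap<>();
        StringBuilder builder = new StringBuilder("foo");
        original.put(1, builder);

        // Copy the mappings, then wrap the copy so it can't be changed through 'view'.
        Map<Integer, StringBuilder> view =
                Collections.unmodifiableMap(new HashMap<>(original));

        // 1. Changing the original map's mappings does not affect the copy.
        original.put(2, new StringBuilder("baz"));
        boolean sawNewKey = view.containsKey(2);

        // 2. The wrapper rejects mutators.
        boolean putRejected = false;
        try {
            view.put(3, new StringBuilder());
        } catch (UnsupportedOperationException e) {
            putRejected = true;
        }

        // 3. The value objects are shared, so mutating one shows through the view.
        builder.append("bar");
        return sawNewKey + "," + putRejected + "," + view.get(1);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "false,true,foobar"
    }
}
```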
It's worth noting that final variables can have their contents changed, and that final variables can be changed using reflection.
That's true, and many don't realize it; you need to use the setAccessible() method on java.lang.reflect.Field to do it.
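A minimal sketch of that, using java.lang.reflect.Field on a hypothetical Point class. The change is only visible to ordinary code when the field is not a compile-time constant, and the Field.set() documentation lists further restrictions (static final fields, and fields of records and hidden classes, can't be written this way).

```java
import java.lang.reflect.Field;

public class ReflectiveFinalWrite {
    // Hypothetical class; x is assigned in the constructor, so it is not a
    // compile-time constant and reads of it are not inlined.
    static final class Point {
        private final int x;

        Point(int x) {
            this.x = x;
        }

        int x() {
            return x;
        }
    }

    // Creates a Point with x == 1, then overwrites the final field reflectively.
    public static int overwriteX(int newValue) {
        Point p = new Point(1);
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true); // required before writing a final field
            f.setInt(p, newValue);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return p.x();
    }

    public static void main(String[] args) {
        System.out.println(overwriteX(42)); // prints 42
    }
}
```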
Why does Java allow you to change final variables via reflection? It's a nice feature, but in principle, shouldn't this always be disallowed?
Actually, it is worse than that. If you have two programs:
class A {
    static final String s = "foo";
}
class B {
    public static void main(String[] args) {
        System.err.println(A.s);
    }
}
then you compile them both, and then you change A.s to be "bar", and then you recompile A but not B, running B will still give you "foo". And the spec says that's what is supposed to happen. So the semantics of static final fields are pretty messed up.
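That happens because A.s is a constant variable (a final String initialized with a compile-time constant expression), so its value is copied into B's class file when B is compiled. A sketch of one common workaround, assuming you control A: give the field an initializer that is not a constant expression, such as a method call, so readers load it at run time. The class names below are illustrative.

```java
public class ConstantInlining {
    static final class A {
        // A constant variable: final, of type String, and initialized with a
        // compile-time constant expression. Classes that read it get "foo"
        // compiled directly into their own class files.
        static final String INLINED = "foo";

        // Not a constant variable: a method call is not a constant expression,
        // so readers load this field from A at run time.
        static final String NOT_INLINED = String.valueOf("foo");
    }

    public static String values() {
        return A.INLINED + " " + A.NOT_INLINED;
    }

    public static void main(String[] args) {
        System.out.println(values()); // prints "foo foo"
    }
}
```

Both fields print the same value here; the difference only appears when A is recompiled with a new value and the classes that read it are not.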
The reason that final fields are modifiable through reflection is so that people can implement their own deserialization mechanisms without having to add special constructors to every class. It is just another way in which serialization semantics are screwed up.
The final modifier prevents compiler reorderings that can affect thread safety. For example, if you have:
class Foo {
    final int x;
    Foo() {
        x = 1;
    }
}
Thread 1:
o = new Foo();
Thread 2:
if (o != null) {
    r1 = o.x;
}
Without the final modifier, the compiler and JVM are allowed to move the write to x so that it occurs after the reference to the new object is written to o. If these two threads executed concurrently, then Thread 2 could see a non-null value for o and read o.x without seeing 1 (because it reads the reference before the field x is written). If you mark x final, this cannot happen.
In the case of Collections.unmodifiableMap(), you get the same sort of special guarantee, because the field holding the wrapped map is marked final. In the case of the hypothetical ImmutableHashMap, you do not get this guarantee, because there are no final fields to provide it.
I hope that makes it a little clearer.
Correct. Sorry about that.
Check out the hashCode() method for String, which lazily computes the hash field on the first call. So, technically, the state of the instance changes on the first call to hashCode().
The real question is what we mean by immutable. Is it enough for an object never to be observed to have changed? That seems to be the rule we need to apply to bring String back into the fold. If we use this latter definition, then String is immutable.
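The String case works because of a pattern sometimes called the racy single-check idiom. Below is a sketch on a hypothetical Name class, in the spirit of String.hashCode() rather than a copy of it: the cached field is not volatile, but every thread either sees 0 and recomputes the same value from final fields, or sees the finished value, so no thread can observe a different hash.

```java
public class CachedHash {
    // Hypothetical value class that caches its hash code lazily.
    // All state the hash depends on is final.
    static final class Name {
        private final String first;
        private final String last;
        private int hash; // 0 = not computed yet; deliberately not volatile

        Name(String first, String last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public int hashCode() {
            int h = hash; // read the racy field exactly once
            if (h == 0) {
                // Any thread that gets here computes the same value.
                h = 31 * first.hashCode() + last.hashCode();
                hash = h; // int writes are atomic, so readers see 0 or h
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Name)) {
                return false;
            }
            Name other = (Name) o;
            return first.equals(other.first) && last.equals(other.last);
        }
    }

    public static int hashOf(String first, String last) {
        return new Name(first, last).hashCode();
    }

    public static void main(String[] args) {
        Name n = new Name("Ada", "Lovelace");
        System.out.println(n.hashCode() == n.hashCode()); // prints true
    }
}
```

Reading the field into a local once matters: two separate racy reads could return the computed value and then 0.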
final AtomicReference<Object> globalRef = new AtomicReference<Object>(); // This instance is accessible to all the threads.
Then each thread will do something like this:
Object obj = globalRef.get();
if (obj == null) {
    obj = new Object(); // This would be some sharable object
    if (!globalRef.compareAndSet(null, obj)) {
        obj = null; // Do some clean up here
        obj = globalRef.get();
    }
}
// From this point onwards, use obj.
Now all the threads share the same object, since the reference is only set once and never reset. But each globalRef.get() is an access to a volatile variable. I proposed writing a class which could be a little bit more efficient than AtomicReference. Below is the code snippet:
import java.util.concurrent.atomic.AtomicBoolean;
public class SetOnceReference<T> {
    private final AtomicBoolean m_isSet = new AtomicBoolean(false);
    private T m_ref;
    private boolean m_unSafeFlag;
    private volatile boolean m_safeFlag;
    public boolean set(T t) {
        if (m_isSet.compareAndSet(false, true)) {
            m_ref = t;
            m_safeFlag = true;
            return true;
        }
        return false;
    }
    public T get() {
        if (m_unSafeFlag) return m_ref;
        if (m_safeFlag) {
            m_unSafeFlag = true;
            return m_ref;
        }
        return null;
    }
}
A colleague of mine says it would not work, because one thread could call get(), read m_unSafeFlag == true, and then read m_ref without any read memory barrier being invoked, and reading a non-volatile variable without invoking a read memory barrier is not correct. I am saying that the fact that m_unSafeFlag == true means that a read memory barrier has been invoked; as long as some thread has invoked it, no other thread should have to; and the happens-before rule prevents the compiler from reordering m_unSafeFlag = true.
Do you think this implementation is correct?
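For comparison, here is the AtomicReference pattern from the start of this comment written as a reusable, generic helper. OnceRef and getOrCreate() are hypothetical names. Every read goes through AtomicReference.get(), which has volatile read semantics, so its correctness rests only on that rather than on a non-volatile fast path.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class OnceRef<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    // Returns the shared instance, creating it with the factory if needed.
    // Several threads may run the factory at once, but only one result wins
    // the compareAndSet; the others are discarded.
    public T getOrCreate(Supplier<? extends T> factory) {
        T current = ref.get(); // volatile read
        if (current != null) {
            return current;
        }
        T candidate = Objects.requireNonNull(factory.get());
        if (ref.compareAndSet(null, candidate)) {
            return candidate;
        }
        return ref.get(); // another thread won; use its instance
    }

    public static void main(String[] args) {
        OnceRef<StringBuilder> once = new OnceRef<>();
        StringBuilder a = once.getOrCreate(StringBuilder::new);
        StringBuilder b = once.getOrCreate(StringBuilder::new);
        System.out.println(a == b); // prints true
    }
}
```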
So, effectively, some objects are considered immutable by the JVM even though, strictly speaking, they are mutable.
In this case the JVM guarantees safe publication, but doesn't guarantee visibility of changes made afterwards.
To make such objects really immutable, we need to add another rule: their observable state never changes.
Do I understand that correctly?
One more question to clarify. You say "without using any additional synchronization"...
But we still have to use volatile or some other mechanism to publish a reference to the ImmutableHashMap object, right?
For a String, this is rarely a problem. For other classes, it might be a problem, and you will want to use volatile, locking, and / or other mechanisms to ensure safe publication of the reference to the object itself.
So final is mainly for post-construction immutability, and its use for correct construction can be replaced by volatile. Is my understanding correct?
Could you explain more about this aspect of immutability? Thanks.
It depends a bit on the actual code, but that's the general idea.
I did a series of posts on immutability in 2008:
https://jeremymanson.blogspot.com/2008/04/immutability-in-java.html
https://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-2.html
https://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-3.html
LMK if you don't find the answers there.
For example, there used to be a string implementation with an offset, length, and array field. You could have code like this:
String s1 = "/usr/tmp";
String s2 = s1.substring(4);
If you then shared s2 with a different thread without synchronization, and without the final field guarantees, you might see a 0 for the offset and a 4 for the length, giving you /usr instead of /tmp. This could be bad if you were, for example, using that string to grant some permission on the file system.
With the final field guarantees, there are no guarantees that you will see the correct object, but there are guarantees that you won't see a partially constructed object. Even without synchronization, the implementation will never let you read /usr from that string. Using synchronization gives you the additional guarantee that you will see the most up to date reference published to a String.
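A minimal sketch of an offset/length string, loosely modeled on that old layout, with all three fields marked final. Slice is a hypothetical class; the real String implementation differed in its details.

```java
public class SliceDemo {
    // Hypothetical class modeled on the old offset/length String layout.
    // All three fields are final, so a thread that obtains a Slice reference,
    // even through a data race, sees the values assigned in the constructor.
    static final class Slice {
        private final char[] value;
        private final int offset;
        private final int count;

        Slice(String s) {
            this(s.toCharArray(), 0, s.length());
        }

        private Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Shares the backing array instead of copying it.
        Slice substring(int beginIndex) {
            return new Slice(value, offset + beginIndex, count - beginIndex);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static String demo() {
        Slice s1 = new Slice("/usr/tmp");
        Slice s2 = s1.substring(4); // offset 4, count 4
        return s2.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "/tmp"
    }
}
```

A reader that saw offset as 0 with count 4 would get "/usr"; making offset, count, and value final is what rules that out for an unsynchronized reader.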