I'd like to talk a little more about what it takes to ensure thread-safe immutability in Java, following on from a (semi)recent post I made on the subject.
The basic gist of that post was that if you make data immutable, then they can be shared between threads without additional synchronization. I call this "thread-safe immutability". In that post, I said this:
Now, in common parlance, immutability means "does not change". Immutability doesn't mean "does not change" in Java. It means "is transitively reachable from a final field, has not changed since the final field was set, and a reference to the object containing the final field did not escape the constructor".
I just wanted to go over these points again, because I get a lot of questions about them and there are a couple of things I glossed over. Let's take them one by one. Immutability means...
- the object is transitively reachable from a final field
What this means is that you can reach the object or value by following references from the final field. Let's say we're making an immutable String class, like the one in the Java libraries. We might have something like this (notice I'm not checking for null pointers in the example code. Go ahead, make me care.):

    class String {
      private final byte[] bytes;
      public String(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0, newArray,
                         0, value.length);
        bytes = newArray;
      }
    }

The thing we are storing in our String is a final reference to the byte array, not the array itself. The contents of the byte array can be changed, according to the semantics of Java, but what is important, for the purposes of thread-safety, is that they are reachable by following a reference from the final field.
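To make the payoff a bit more concrete, here is a minimal sketch of a reader that picks up such an object through a plain, unsynchronized field. The names ImmutableBytes, SharingDemo, shared and sumIfPublished are made up for this illustration (I renamed the class so it doesn't shadow java.lang.String). The reader may or may not see the reference, but if it does, it reaches the array contents through the final field.

```java
import java.util.Arrays;

// Hypothetical stand-in for the post's String class.
final class ImmutableBytes {
    private final byte[] bytes;

    ImmutableBytes(byte[] value) {
        // Defensive copy, so the caller holds no reference to our array.
        bytes = Arrays.copyOf(value, value.length);
    }

    int sum() {
        int total = 0;
        for (byte b : bytes) {
            total += b;
        }
        return total;
    }
}

final class SharingDemo {
    // Deliberately a plain field: no volatile, no lock.
    static ImmutableBytes shared;

    // Returns -1 if no instance is visible to this thread yet.
    static int sumIfPublished() {
        ImmutableBytes local = shared; // racy read; may still be null
        if (local == null) {
            return -1;
        }
        // The array is reached through the final field, so its contents
        // are at least as fresh as they were when the constructor ended.
        return local.sum();
    }
}
```

The read of shared is still a race, so a reader can see null for a while. What the final field buys you is that a reader never sees a half-built instance.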
ETA: So, apparently, I wasn't clear enough on this. Let's try again.
Transitively reachable means that you can get from the final field to the data you want to be immutable by following zero or more references. So, if you had:

    class String {
      final byte[] bytes;
    }

The array of bytes is reachable from the bytes field. If you have:

    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder holder;
    }

The array of bytes is still transitively reachable, this time from the holder field. In fact, you can add as many layers of indirection as you want. As long as you can get there by following references from the final field, it is still transitively reachable from that field.

- the object has not changed since the final field was set
This one is simple, right? Here's the String class again, but the bytes are no longer immutable:

    class String {
      private final byte[] bytes;
      public String(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0, newArray,
                         0, value.length);
        bytes = newArray;
      }
      public void setZerothElt(byte b) {
        bytes[0] = b;
      }
    }

Well, now, obviously that's not thread-safe immutable. If two threads call setZerothElt(), then all sorts of bad thread horseplay can happen.
However, I did mislead you a little here. I said that, if you want the data to be thread-safe immutable, it should not be changed after the final field gets set. It turns out that the immutable data can be changed after the final field is set. It just can't be changed after the constructor ends:

    class String {
      // Absolutely fine.
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
                         bytes, 0, value.length);
      }
    }

You can think of the end of the constructor as being something like a "freeze" operation. The data reachable by the object's final fields must be frozen by the end of the constructor. When figuring out how this should work, we made this decision because it is a big pain in the neck to try to get everything done by the time you actually set the final field, and besides, the semantics were simpler this way.

- a reference to the object containing the final field did not escape the constructor
"Escaping" the constructor simply means that when you were inside the constructor, you stored a reference to the object under construction somewhere where another thread could get at it.
In some ways, this is kind of a nasty restriction. There are all sorts of classes that let a reference to the object escape the constructor. Swing has lots of examples of this (I think; I don't feel like looking it up). Here's an example where our String class tries to keep track of all of its instances:

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    class String {
      // Don't do this.
      static Set<String> allStrings =
        Collections.synchronizedSet(new HashSet<String>());
      private final byte[] bytes;
      public String(byte[] value) {
        allStrings.add(this);
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
                         bytes, 0, value.length);
      }
    }

It's pretty clear that another thread could come along and read the object being constructed out of allStrings before it is done being constructed. So, there's no free ride there.
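If you really do want to track instances, one common way around the problem is to do the registration after the constructor has returned, for instance from a static factory method. This is only a sketch of that idea, not something from the original example; TrackedBytes, create, instanceCount and isTracked are names I made up for it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of the instance-tracking class.
final class TrackedBytes {
    private static final Set<TrackedBytes> ALL =
        Collections.synchronizedSet(new HashSet<TrackedBytes>());

    private final byte[] bytes;

    // Private, so every instance has to go through create().
    private TrackedBytes(byte[] value) {
        bytes = Arrays.copyOf(value, value.length);
    }

    static TrackedBytes create(byte[] value) {
        TrackedBytes instance = new TrackedBytes(value);
        // The constructor has finished, so "this" never escaped it.
        ALL.add(instance);
        return instance;
    }

    static int instanceCount() {
        return ALL.size();
    }

    static boolean isTracked(TrackedBytes candidate) {
        return ALL.contains(candidate);
    }

    int length() {
        return bytes.length;
    }
}
```

The private constructor is the design choice that matters here: nobody can create an instance without going through the factory, so nobody can reintroduce the escape by accident.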
But this one gets nastier, too! Let's take an example that looks correct at first blush, but isn't:

    class String {
      // Don't do this, either.
      static String lastConstructed;
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
                         bytes, 0, value.length);
        lastConstructed = this;
      }
    }

It looks as if, when you read lastConstructed, it will always give you the last String instance that was constructed. Don't be fooled. Even if this does give you the last String instance that was constructed (and it may or may not, but that's beside the point), it is quite possible for the reader thread to see uninitialized data. The compiler can take the assignment to lastConstructed, and move it to before the assignment to bytes. So, you have to avoid allowing references to objects to escape constructors. (To preempt questions: yes, this example would work if you made lastConstructed volatile.)
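For the record, here is roughly what the volatile variant mentioned in that parenthetical might look like. It is a sketch with a hypothetical class name (LastBytes) and a hypothetical accessor (first). Note that the reference still escapes the constructor; what makes it work is the volatile write and read, not the final field.

```java
// Hypothetical variant of the lastConstructed example.
final class LastBytes {
    // volatile: a reader that sees the reference also sees every write
    // that came before the assignment in the constructor.
    static volatile LastBytes lastConstructed;

    private final byte[] bytes;

    LastBytes(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0, bytes, 0, value.length);
        // Volatile write, and the very last thing the constructor does.
        lastConstructed = this;
    }

    byte first() {
        return bytes[0];
    }
}
```

The class is final on purpose: if a subclass constructor ran after this one, the publication would happen before the subclass had finished initializing.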
- The byte array has to be reachable from the final field by the end of the constructor. If I had bytes point to a container object, and then made the container object point to the array after the constructor finished, then I wouldn't be able to claim it was threadsafe.

    // The bytes reachable from bytesHolder *after* later()
    // is called are not thread-safe immutable.
    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder bytesHolder;
      public String() {
        bytesHolder = new BytesHolder();
      }
      void later(byte[] value) {
        byte[] newArray = new byte[value.length];
        System.arraycopy(value, 0,
                         newArray, 0, value.length);
        bytesHolder.bytes = newArray;
      }
    }

This, of course, violates the rule that the byte array is set before the constructor finishes, but even if you don't violate that rule, you don't get the thread-safety guarantee:

    // The bytes reachable from bytesHolder *after*
    // later() is called are still not thread-safe immutable.
    class String {
      static class BytesHolder {
        byte[] bytes;
      }
      final BytesHolder bytesHolder;
      byte[] bytes;
      public String(byte[] value) {
        bytesHolder = new BytesHolder();
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
                         bytes, 0, value.length);
      }
      void later(byte[] value) {
        bytesHolder.bytes = bytes;
      }
    }

- The other point is that you can't just acquire these guarantees by waiting an inordinately long time. Here's another example of an instance escaping its constructor:
    class String {
      // Don't do this, either.
      static String firstInstance;
      private final byte[] bytes;
      public String(byte[] value) {
        bytes = new byte[value.length];
        System.arraycopy(value, 0,
                         bytes, 0, value.length);
        if (firstInstance == null) {
          firstInstance = this;
        }
      }
      public static String getFirst() {
        String instance = firstInstance;
        if (instance == null) return null;
        // Wait a while for the constructor to end...
        try {
          Thread.sleep(1000);
        } catch (InterruptedException e) {}
        return instance;
      }
    }

(Obviously, your code wouldn't look like this; this is just an example.) The fact of the matter is that without using the guarantees of final fields or some form of explicit synchronization, there is no guarantee that a thread that reads the version that escapes the constructor will ever see the correct version. This can happen in various ways, all of which have to do with sneaky compiler tricks.
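As one illustration of what "some form of explicit synchronization" could look like here, this sketch records the first instance through an AtomicReference, and only after construction has finished. The names FirstBytes, FIRST and create are hypothetical; this is one option among several, not the only fix.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical variant of the firstInstance example.
final class FirstBytes {
    private static final AtomicReference<FirstBytes> FIRST =
        new AtomicReference<>();

    private final byte[] bytes;

    private FirstBytes(byte[] value) {
        bytes = Arrays.copyOf(value, value.length);
    }

    static FirstBytes create(byte[] value) {
        FirstBytes instance = new FirstBytes(value);
        // Only the first caller wins; later calls leave FIRST untouched.
        // compareAndSet has volatile semantics, so no sleeping is needed.
        FIRST.compareAndSet(null, instance);
        return instance;
    }

    static FirstBytes getFirst() {
        return FIRST.get(); // null until some instance has been created
    }

    int length() {
        return bytes.length;
    }
}
```

A reader that gets a non-null result from getFirst() is looking at a fully constructed object, however little or much time has passed.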
If you liked this post, read the followup: Immutability in Java, Part 3.
Comments
It wouldn't be thread-safe because it was immutable, but it would be thread-safe because lastConstructed was volatile.
I should do a post on this at some point. The upshot is that the JDK is now available under the GPL, so you can just go look at the code (http://openjdk.java.net). You have to do one of two things to get it to work, though:
1) Stick the class that uses it in your bootclasspath, or
2) Use reflection to get the field that stores the unique Unsafe object:
Field field = Unsafe.class.getDeclaredField("theUnsafe");
field.setAccessible(true);
return (Unsafe)field.get(null);
(Or something)
Yep! The nasty interactions of concurrency and inheritance are probably worth a chapter or two in a book.
    final class Superhero {
      final Sidekick sidekick;
      Superhero() {
        sidekick = new Sidekick(this);
      }
      public Sidekick getSidekick() {...}
      public static final class Sidekick {
        final Superhero hero;
        Sidekick(Superhero hero) {
          this.hero = hero;
        }
        public Superhero getHero() {...}
      }
    }
Can Superhero be considered immutable? If not, is it at least thread-safe?
It would have been good if you had included some examples from the standard Java library, e.g. String is immutable.
In this case, does the JMM guarantee that object A is constructed completely and is visible to all the threads after the constructor of A has completed? There might be a case where field b of class A contains a not fully initialized object of class B.
In response to your second question: Other threads are guaranteed to see the fields of b at least as up to date as they were when the constructor for a finished. If changes occur after that, the other threads may or may not see them.
If you have an
int size;
field in there, along with the final bytes[], and that field is initialized in the constructor, is the object still properly immutable, even if size is not final?
What if the object consists only of primitive fields? Does at least one field need to be final to get the immutable guarantee, or do they all need to be final?
In practice, implementations provide the guarantee if any fields are final, but this can bite you in two ways:
1) The spec doesn't guarantee that, so an implementation could change *not* to do that.
2) Code evolution could result in removal of the final field, which would then result in mysterious errors cropping up. This has happened to people from time to time.
We're looking at changing the spec so that you always get some initialization safety guarantee, but for the moment, keep all of your fields final if you want it.
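Applied to the size question above, the "keep all of your fields final" advice might look like the following sketch. SizedBytes and its accessors are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical class with a size field next to the byte array.
final class SizedBytes {
    private final byte[] bytes;
    // final as well, so this field does not depend on what a particular
    // implementation happens to do for non-final fields.
    private final int size;

    SizedBytes(byte[] value) {
        bytes = Arrays.copyOf(value, value.length);
        size = value.length;
    }

    int size() {
        return size;
    }

    byte byteAt(int index) {
        return bytes[index];
    }
}
```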
class String {
private final byte[] bytes;
public String(byte[] value) {
bytes = new byte[value.length];
System.arraycopy(value, 0, bytes, 0, value.length);
}
}
From my reading of the JSR-133 transitive guarantees and the JLS freeze before the constructor exits, this should also be safe. Or should we write the more decent initialization, expecting JMM changes?
(1). a final reference guarantees that not only it refers to the correct object (i.e, points to the correct address allocated for the object), but also the referred object is correctly initialized like volatile?
(2). for any variable v (including normal, final and volatile ones), a thread t will access main memory the first time it accesses v? If so, final is used to make sure that correct value for v is written back to main memory like a volatile-write, then a later read of v by any thread can be safely cached?
(3). We should try to make every object reachable from a final field f not modified after f is set. If a reachable object needs to be modified in an execution, we need to use synchronization on it.
(4) Even if any object o reachable from a final field f is not modified after f is set, we should access it via f. If there is another way w to access o but bypassing f, it is possible that o already exists before setting f (For instance, passing o to a constructor of some class during setting f), and a thread via w cached o before f is set, and o was modified after that cache but before setting f?
ps: chapter 17 of the JLS is too abstract for me to understand the JMM, especially the causality requirements and final semantics. Could you give me some additional materials?
Nope, just that the final fields of the referred object are fully initialized.
(2). for any variable v (including normal, final and volatile ones), a thread t will access main memory the first time it accesses v? If so, final is used to make sure that correct value for v is written back to main memory like a volatile-write, then a later read of v by any thread can be safely cached?
There is no guarantee about "main memory" made in the model. References to a variable may never touch main memory, if, for example, the compiler decides the variable can be stored in a register. Typically, the implementation of final for objects that need it relies on a memory fence executed at the end of a constructor.
(3). We should try to make every object reachable from a final field f not modified after f is set. If a reachable object needs to be modified in an execution, we need to use synchronization on it.
You can modify anything reachable up until the end of the constructor, at which point everything is conceptually "frozen".
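A small sketch of what that freeze allows: the array below is still being written after the final field has been assigned, but everything is done before the constructor ends. ReversedBytes is a made-up class for this illustration.

```java
// Hypothetical class that keeps a reversed copy of its input.
final class ReversedBytes {
    private final byte[] bytes;

    ReversedBytes(byte[] value) {
        bytes = new byte[value.length]; // the final field is set here...
        for (int i = 0; i < value.length; i++) {
            // ...and the array it refers to is filled in afterwards,
            // which is fine because the constructor has not ended yet.
            bytes[i] = value[value.length - 1 - i];
        }
    }

    byte byteAt(int index) {
        return bytes[index];
    }

    int length() {
        return bytes.length;
    }
}
```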
(4) Even if any object o reachable from a final field f is not modified after f is set, we should access it via f. If there is another way w to access o but bypassing f, it is possible that o already exists before setting f (For instance, passing o to a constructor of some class during setting f), and a thread via w cached o before f is set, and o was modified after that cache but before setting f?
Yes.
ps: chapter 17 of the JLS is too abstract for me to understand the JMM, especially the causality requirements and final semantics. Could you give me some additional materials?
I made a blog post here:
http://jeremymanson.blogspot.com/2007/08/causality-and-java-memory-model.html