I was asked a question about benign data races in Java this week, so I thought I would take the opportunity to discuss one of the (only) approved patterns for benign races. So, at the risk of encouraging bad behavior (don't use data races in your code!), I will discuss the canonical example of "benign races for performance improvement". Also, I'll put in another plug for Josh Bloch's new revision of Effective Java, which I continue to recommend.
As a reminder: basically, a data race is when you have one (or more) writes, and potentially some reads, all to the same memory location; they can happen at the same time; and there is nothing in the program to prevent it. This is different from a race condition, which is simply when you don't know the order in which two actions will occur. I've put more discussion of what a data race actually is at the bottom of this post.
A lot of people think that it is okay to have data races all over your code. By and large, it is a lot more dangerous than those people think it is. For example, the version of double-checked locking that doesn't use volatile has a data race in it. As I've discussed before, that version is broken.
So, having said that, when can you have a harmless data race in your code? Here's some code from Sun's implementation of java.lang.String, which is available under the GPL:
public final class String {
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

    // ...
}
The principle here is pretty simple. The program calls hashCode(). If it hasn't calculated the hash code yet, then it does, and then stores the value for future use. The people who wrote String didn't bother to calculate the hash code in the constructor, because that would make construction (which is done all the time) more expensive. Because Strings are immutable, we don't need to worry about the hash code potentially changing in the future.

Do you see the potential data race? Here's some code that triggers it:
class BenignDataRace {
    static final String s = "blah";

    public static void main(String[] args) {
        class CalcHash implements Runnable {
            public void run() {
                int h = s.hashCode();
            }
        }

        Thread t = new Thread(new CalcHash());
        Thread t2 = new Thread(new CalcHash());

        t.start();
        t2.start();
    }
}
Two threads access s.hash without doing synchronization, and at least one of those threads is going to write to it. Item 71 in Effective Java calls this idiom racy single-check. In this case, though, it actually works. Why is this safe, when the double-checked locking example, which uses synchronization, isn't?

There are basically two parts to the answer, and they are both pretty simple:
- In this case, we don't care how many times we actually calculate the answer and assign it to the hash field. It just isn't a big deal if we happen to calculate it multiple times. Furthermore,
- Java has a special guarantee for fields that are 32 bits or fewer and for object references — a thread will always see some value that was actually assigned to that field. As a result, a thread reading hash will see either 0 or the calculated value. Edited to add: I just noticed that this was misleading. An object reference will point to the right object, but the contents of the object aren't guaranteed to be correct unless the object is immutable. Please see my series on immutability.

Note that this wouldn't work if hash were a long or a double, because Java does not guarantee that reads and writes of non-volatile 64-bit values are atomic (JLS 17.7).

Now, let's break the code:
public int hashCode() {
    if (hash == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return hash;
}
All I did was make it so that the temporary variable was only used for calculating the hash code! How could that break the code?

What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!
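To make the permitted transformation concrete, here is a sketch (class and method names are mine) of what the two-read version may effectively be turned into: the read that feeds the return statement is hoisted above the read in the condition. Single-threaded, it behaves identically; with a concurrent writer, the hoisted read can see 0 while the condition read sees the freshly written hash, so the method returns 0.

```java
class HoistedReadSketch {
    private final String value;
    private int hash; // cached hash; 0 means "not yet computed"

    HoistedReadSketch(String value) { this.value = value; }

    // A transformation the memory model permits: the read feeding the
    // return is hoisted above the read in the condition.
    int hashCodeTransformed() {
        int result = hash;   // hoisted read: may see 0, while...
        if (hash == 0) {     // ...this read sees a value written by another
                             // thread, so the body is skipped and 0 returned
            int h = 0;
            for (int i = 0; i < value.length(); i++) {
                h = 31 * h + value.charAt(i);
            }
            hash = h;
            result = h;      // single-threaded, result is always corrected here
        }
        return result;
    }

    public static void main(String[] args) {
        HoistedReadSketch s = new HoistedReadSketch("blah");
        // Single-threaded, it agrees with String.hashCode(), which uses
        // the same 31*h + c algorithm:
        System.out.println(s.hashCodeTransformed() == "blah".hashCode()); // true
    }
}
```

Note that you can never observe the bad outcome in a single-threaded run; the race only exists when another thread writes hash between the two reads.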
(Side note: the first version is better anyway, because it performs fewer memory reads. If you really care that much about performance, that's the way to go.)
What's the moral here? The moral is that this stuff is very confusing, and you have to work very hard to get it right. Make sure you understand exactly what your code does. Avoid data races wherever possible.
Here are the details on data races that I promised above:
A program has a data race in it when it is possible to have an execution of that program where:
- All of the instructions in that program occur in the order they occurred in the program (this is important for technical reasons),
- There are two accesses to the same field / array element,
- At least one of those accesses is a write, and
- There is no happens-before ordering between those actions.
I defined "happens-before ordering" in my last post. Basically, the absence of a happens-before ordering means that there is nothing establishing an order between the two actions: for example, there is no locking protecting the accesses, no intervening volatile writes and reads, and no thread starts and joins.
Comments
I wasn't aware of the limitation for 64 bit values. I guess I should take the time at some point to read the language specification from top to bottom.
Good to know!
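To make the 64-bit limitation concrete: JLS 17.7 allows reads and writes of non-volatile long and double fields to be performed as two separate 32-bit operations, so the racy single-check idiom is only safe for 32-bit-or-smaller fields and references. Here is a minimal sketch (class and field names are mine) contrasting the unsafe and safe variants:

```java
class LongCache {
    // Unsafe: non-volatile long reads/writes may be split into two
    // 32-bit halves (JLS 17.7), so a racing reader can see a torn value.
    private long unsafeCached;

    // Safe: volatile longs and doubles are guaranteed to be read and
    // written atomically, so a reader sees either 0 or the full value.
    private volatile long safeCached;

    long unsafeGet() {
        long c = unsafeCached;
        if (c == 0) {
            c = compute();
            unsafeCached = c;  // a racing reader may see half of this write
        }
        return c;
    }

    long safeGet() {
        long c = safeCached;
        if (c == 0) {
            c = compute();
            safeCached = c;    // atomic: never observed half-written
        }
        return c;
    }

    private long compute() {
        return 0x123456789ABCDEFL; // stand-in for an expensive 64-bit result
    }

    public static void main(String[] args) {
        LongCache cache = new LongCache();
        // Single-threaded, both variants return the computed value:
        System.out.println(cache.safeGet() == cache.unsafeGet()); // true
    }
}
```

Declaring the field volatile restores atomicity for longs and doubles, at the cost of a volatile access on every call.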
Note: in the last example 'h' is uninitialized. Yes!!! call me '-pedantic' :-)
Thanks for the post.
What is not certain is that with an immutable object, is it better to store lazily calculated hashcode into 'volatile' or non-volatile int.
Effective Java gives an example with 'volatile'. But 'String' code does not.
I am leaning toward 'String.hashcode()' approach but is there any good reason to use 'volatile'?
BTW - you mentioned of double-read leading into performance penalty. I believe the penalty would be larger if variable is volatile. EJ mentioned 'local' variable version is about ~25% faster.
Thanks.
Do you mean the code may be rewritten by the processor as:
int h = hash;
if (hash == 0) {
...
}
return h;
Thanks for the post!
@Jeremy: given that for programs without data races (I know this program has a data race) the JMM guarantees sequential consistency (isn't it?), I was wondering a bit about that reordering.
If I understand it correctly, that reordering is allowed because without a data race the programmer cannot notice the difference, isn't it?
Also, this can be proven because there is no possible destination of a happens-before edge, so for a data-race free program "hash" has to stay unchanged, right?
@serega -- I doubt that it is a problem for 64-bit HotSpot, but I'm not 100% sure.
@stinky -- Effective Java 2 mentions the String example and calls it "racy single-check", so I'm not sure what you mean. In general, volatile reads don't have too much of a performance penalty. My suggestion is:
1) Make sure you understand what you are doing,
2) Profile before trying to optimize.
If you don't understand what you are doing, don't do it.
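For reference, the volatile variant being asked about can be sketched like this (class and method names are mine, not taken verbatim from Effective Java); the volatile field removes the data race entirely while keeping the initialization lazy:

```java
class VolatileHashString {
    private final String value;
    private volatile int hash; // volatile: no data race, still lazy

    VolatileHashString(String value) { this.value = value; }

    int cachedHash() {
        int h = hash;              // one volatile read in the common case
        if (h == 0) {
            for (int i = 0; i < value.length(); i++) {
                h = 31 * h + value.charAt(i);
            }
            hash = h;              // volatile write publishes the result
        }
        return h;
    }

    public static void main(String[] args) {
        VolatileHashString v = new VolatileHashString("blah");
        // Same algorithm as String.hashCode():
        System.out.println(v.cachedHash() == "blah".hashCode()); // true
    }
}
```

The one-time cost is the volatile write after computation; the per-call volatile read is cheap on most hardware, which is why String's authors could reasonably have gone either way.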
@Buu -- Yes, with two caveats:
1) That's not equivalent code; the equivalent code has another write to h before the end of the if statement.
2) The processor wouldn't do this kind of transformation, though, the compiler would. From a Java programmer's POV, this is 6 of one, half a dozen of the other.
@Paolo -- Sort of. It is true that if you have no data races in the code, you can't tell that a reordering occurred. That's the sequential consistency for data-race-free programs guarantee.
In these cases, I tend to reason backward. I thought about this example and realized that the Java memory model allowed the result in question. Then I thought some more and came up with a reordering that would actually trigger the weird result.
What's funny about that process is that it is often much easier to come up with weird results than it is to come up with plausible transformations! In this case, the transformation isn't terribly plausible. I could rejigger it so it was more plausible, but it would make the example more confusing, so I avoided doing that.
Thanks for pointing out racy-single check idiom. I totally missed it. Shame on me.
As for the performance penalty of double-read vs. using a local variable, EJ mentions it on page 284.
On double-checked idiom...
"What this variable does is to ensure that field is read only once in the common case where it's already initialized. While not strictly necessary, this may improve performance and is more elegant by the standards applied to low-level concurrent programming. On my machine, the method above is about 25 percent faster than the obvious version without a local variable."
I misunderstood the double-read performance penalty as being mainly due to volatile. I should not have assumed that accessing a volatile field is slow.
BTW - for everyone, it is Item 71: Use lazy initialization judiciously.
For your info - Java Language Specification - Chapter 17.
http://java.sun.com/docs/books/jls/third_edition/html/memory.html
17.7 Non-atomic Treatment of double and long
http://www-128.ibm.com/developerworks/java/library/j-immutability.html
The article also thanks Jeremy for some clarifications.
If on entry to hashCode() hash evaluates to 0 based on the current thread's view of the global memory state (which may very well lag behind an update performed by a thread running on a different processor) the body of the if block will be entered, and hash will be assigned the proper hash code. A second read from hash at the return statement must respect the write performed w/in the same thread -- this would have been true even in the original Java memory model. The memory model does not allow for temporal reordering w/in a linear control flow (nor, mind you, do any compiler or processor reordering).
I hope that clarifies it.
The memory model says nothing about when a write to hash by thread T1 becomes visible to thread T2. However, once the write is visible in T2 all subsequent reads (by program order) will see that new value.
Chapter and verse of that particular effect is the bit about when a read is "allowed to observe" a write, also in 17.4.5.
If you don't believe me that the guarantee you are proposing isn't in there, rather than going through line by line, I'm afraid I'll have to do a proof by authority, since I wrote chapter 17. If you want more detail, you can read my doctoral dissertation or the resulting PoPL paper, both of which have the relevant proofs.
Unfortunately the links to your dissertation and PoPL paper from the JMM pages on umd.edu no longer appear to be live. The listing of concurrency testcases (under the unified proposal section) were very instructive, though -- in particular the justification for test case 16 made it clear to me how the model permits intra-thread reordering of reads to the same field.
It does worry me a bit that within a few minutes I was able to find two examples in the JDK 6 sources of hashCode methods that could return an incorrect result given this form of reordering (see java.util.UUID and javax.management.MBeanInfo).
If there ever were a compiler / processor combination that often reordered reads in this manner (and outside of IA64 that seems pretty improbable, as it's more likely than not that the compiler would store a single read of the field in a register or local) fixing up all of the problematic code could be a very arduous -- and possibly endless -- task.
Maybe it's just a problematic choice of words. Here's a quote from JLS 3rd ed 17.4.5:
"If one action happens-before another, then the first is visible to and ordered before the second."
Let's put the hash example above into the language of the spec. Consider a read r1 of variable v that happens-before a read r2 of v, and a write w to v that has no happens-before relation to them. Then the following is possible: w is "seen" by r1 but not by r2, although r1 is "visible" to r2! Maybe the spec should have made it clearer that "visible to" only really applies to the relationship between read and write actions.
Trying to formalize my intuitive understanding of "visible to" with regard to different reads of the same variable, I came up with the following:
If a read r1 of a variable v happens-before a read r2 of v, and a write w1 to v happens-before a write w2 to v, then it must not be the case that w1 is seen by r2 and w2 is seen by r1.
But I guess such a rule would make some useful optimizations impossible...
Christopher
Having said that:
a) When we talk about "visibility", we are talking about which writes are visible to which reads. It doesn't really make much sense to talk about which reads are visible to which reads.
b) More importantly: when we say two actions are "ordered", we mean that they are ordered with respect to each other, not with respect to other actions that may happen in a data race. Here's an example. Imagine that I have (initially, p == o, p.x == 0):
Thread 1:
r1 = o.x;
r2 = p.x;
r3 = o.x;
Thread 2:
o.x = 1;
Consider the case where the compiler does redundant read elimination, so that the two reads of o.x are combined. You can then have the two reads of o.x returning 0, and the read of p.x returning 1.
The read actions are still ordered with respect to each other, in the sense that the local effects of each one (the writes to r1, r2 and r3) don't interfere with the local effects of the next, and so on. They are *not* ordered with respect to the write to o.x, which can appear to be ordered before some of these reads and not before others, pretty much at random.
It is hard to make this clear in a comment, so this is probably worth a whole separate blog entry.
Please specify the reordered code.
I can't understand the difference between the original hashCode() and your modification of hashCode().
I still don't get the explanation for the hash example.
1: r1 = read hash
2: if r1 == 0 {
3: ....
4: write hash := r2
}
5: r3 = read hash
6: return r3
Now, you are saying that there is a particular reordering where r3 could be 0. Can you show me an example reordering of code which would be correct in a sequential setting, but somehow gets messed up in a concurrent setting?
I understand that the compiler (or processor) is free to reorder because there are no volatile variables or locks used, but I cannot come up a single correct reordering scenario in a uniprocessor setting.
Can you help spelling it out? Thanks.
r1 = hash;
if (hash == 0) {
r1 = hash = // calculate hash
}
return r1;
This is a valid transformation for a uniprocessor, but is a problem in a multi-processor setting. (hash could be changed by another thread between "r1 = hash" and "if hash == 0")
Thanks, Jeremy.
I'm sorry for bothering you at such an old post, but could you clarify something?
You wrote that valid transformation could be like this one:
r1 = hash;
if (hash == 0) {
r1 = hash = // calculate hash
}
return r1;
The quote from JMM:
"Each time the evaluation of thread t generates an inter-thread action, it must match the
inter-thread action a of t that comes next in program order."
For our case we have the following order of program inter-thread actions:
1) read hash;
2) write hash (inside if);
3) read hash (on return);
For the transformation you described we have the following order of inter-thread actions:
1) read hash;
2) read hash;
3) write hash (inside if);
So the order of inter-thread actions seems incorrect (in terms of the JMM) to me.
Could you clarify this?
That sentence refers to the semantics of the evaluation of the thread, not the actual evaluation of the thread by the JVM.
More specifically, looking at a single thread in isolation, the actions have to appear to the user as if they have been evaluated in that order. What the reads can return is dictated by the memory model. So, let's say we had two threads:
Initially, x = 0
Thread 1:
r1 = x;
r2 = x;
r3 = x;
Thread 2:
x = 1;
If we look at thread 1 in isolation, we will see three reads of x in a row, so that's what we have to see. The memory model dictates that each of these three reads is allowed to return 1 or 0, regardless of what the previous reads saw. We are therefore allowed to see r1 = r3 = 0, r2 = 1: three reads in a row, each of which is allowed to see 1 or 0.
In practice, what this means is that the system reordered r3 and r2:
Thread 1:
r1 = x;
r3 = x;
r2 = x;
And that the write to x of 1 occurred in between r3 and r2.
I know it is confusing (it's hard to explain in a comment followup, certainly), but that's the general idea. Does that clarify it at all?
OK, I can see that the JMM permits the reordering that makes the refactored hashCode() buggy, but I find it inconceivable that an actual CPU would exhibit such behavior.
Yes, there are 2 loads of #hash in the source, but the problematic out-of-order execution would only occur if:
a. javac, the JIT, and CPUs missed the opportunity to elide the second (program order) read (viz. redundant read elimination, which would not violate the JMM) and
b. determined that issuing *two* load instructions *with the same src operand* (and consecutively) is the optimal thing to do when r2 might be subsequently invalidated by the conditional jump (i.e. when r1 == 0).
I'm no expert when it comes to speculative execution by CPUs, but I can't imagine that any realworld heuristic would lead to an execution that *in the best case* is no better than a (non-optimized) sequential execution and in the worst case involves a load that is subsequently discarded (perhaps resulting in 3 loads).
Of course, Jeremy wasn't addressing what any actual CPUs might do, but what the JMM permits them to do - but I would love to see an example of this happening in practice;
OOEx is an optimization, not a deoptimization, although it's true I can't imagine revising the JMM to expressly proscribe this particular behavior, as the JMM is abstruse enough already.
Can you please elaborate on @Vladimir's question?
I do not see how it can work differently, since we don't have any conditions inside the 'if' statement. And once we get into it, it should work exactly the same as for one thread. Sorry for repeating @Vladimir's question.
For example, if you have this code:
r1 = x;
r2 = x;
if (r1 == r2)
y = 1;
When you execute it, you can't really tell the difference between it and:
y = 1
Because they get the same results. So you can just execute the second one.
Putting synchronization in (in the form of volatile variables or synchronized blocks or whatever) is a way to signal to the system that it shouldn't rearrange the code, because you expect it to be coordinating with another thread, so some features that may seem like they don't matter in isolation (like execution order) do.
Good example about reordering is here:
https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/
It describes our situation to some extent.
What worries me most is that I may have to throw away a lot of code written so far :)
So what should be the general rule when writing code? And why can't the first snippet be reordered somehow so it will lead to similar 0-response situation ?
The general rule is, when writing code:
75% of the time; First, try using an existing data structure to do what you want, preferably something in java.util.concurrent.
24% of the time: If that doesn't work, then use mutexes to protect your data.
0.9% of the time: If that doesn't work, then use atomics / volatile to achieve your ends.
0.1% of the time: If that doesn't work, use a heavily vetted and approved pattern for doing racy work (as long as you are in Java, other languages don't have this as an option).
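As an illustration of the first rule, a lazily computed cache that might otherwise tempt you into a hand-rolled racy idiom can lean on java.util.concurrent instead. A sketch (class and method names are mine):

```java
import java.util.concurrent.ConcurrentHashMap;

class MemoCache {
    // computeIfAbsent invokes the mapping function at most once per key
    // and handles all of the synchronization internally.
    private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();

    int hashOf(String s) {
        return cache.computeIfAbsent(s, key -> {
            int h = 0;
            for (int i = 0; i < key.length(); i++) {
                h = 31 * h + key.charAt(i);
            }
            return h;
        });
    }

    public static void main(String[] args) {
        // Same algorithm as String.hashCode():
        System.out.println(new MemoCache().hashOf("blah") == "blah".hashCode()); // true
    }
}
```

You pay for the map lookup and boxing, but you get correctness without having to reason about the memory model at all, which is the point of reaching for java.util.concurrent first.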
Basically this is probably the second biggest disappointment in my life after I was told that there is no Santa.
Using concurrent data structures will save you from these race conditions but will probably impact performance and memory consumption.
I would bet that 80 percent of code in projects, including open source ones, does not care about such things. Which makes me think of this kind of race condition as more theoretical than real. Is that true in your experience?
I am not saying that this reordering stuff is bad or was not thought through beforehand. It is clear that, since Java is generally platform independent, the rules have to be strict enough to be correct everywhere and at the same time loose enough to run on any theoretical device.
After all, it still doesn't meet the JMM guarantee of SC-DRF (there is still a data race).
There is nothing in the JMM, that treats "does a single read" methods as special (at least as far as I can tell).
Specifically, why couldn't (an admittedly insane) JIT "transform" the source from:
A. benign
r1 = this.hash
if (0 == r1)
r1 = . . . //compute hash
this.hash = r1
return r1
to:
B. malignant
r1 = this.hash
if (0 == r1)
r1 = . . . //compute hash
this.hash = r1
r2 = this.hash
return r2
Essentially transforming the "benign" version to the "malignant" one?
After all, the intra-thread semantic guarantee would be met (any single-threaded execution would never return zero); so you're left with a data-racy execution, which is best thought of as "all bets are off".
Again, I cannot imagine a JIT implementation that would actually do that, but what in the JMM would actually prevent it?
Also, yes, at first glance, it looks to me as if the FinalWrapper would accomplish approximately the same thing as the volatile field. You are trading off an extra class and some memory bloat for possibly making the reads faster on some architectures (probably not x86).
so,
r1 = this.hash
r1 = computeHash()
this.hash = r1
I'm not sure where in JLS §17.4 there is an explicit proscription on eliding the local variable and updating this.hash directly;
Certainly if you presumed a single-threaded execution that would be allowed, and since there is always a data race . . .
and is it the return r1 that eliminates the possibility?
It would be great if you could point out which part of JLS section 17 addresses local variable elision, as I can't find it.
In general, the way local variable elimination is addressed is bullet 3 in 17.4.7. The compiler can do anything it wants provided that the thread behaves as if it executed in the right order. That's the basic assumption behind all compiler optimizations. As long as no one can tell, the compiler can do whatever it wants, including getting rid of assignments to local variables.
And to be honest, I'm still struggling to rationalize the broken code. In which situation would the hardware move the last read to the top, leave the second read as it is, and introduce this extra write to the local variable before the end of the if statement? It's so counterintuitive.
This comment from hudson perfectly describes my thoughts:
I can see that the JMM permits the reordering that makes the refactored hashCode() buggy, but I find it inconceivable that an actual CPU would exhibit such behavior.
Yes, there are 2 loads of #hash in the source, but the problematic out-of-order execution would only occur if:
a. javac, the JIT, and CPUs missed the opportunity to elide the second (program order) read (viz. redundant read elimination, which would not violate the JMM) and
b. determined that issuing *two* load instructions *with the same src operand* (and consecutively) is the optimal thing to do when r2 might be subsequently invalidated by the conditional jump (i.e. when r1 == 0).
I'm no expert when it comes to speculative execution by CPUs, but I can't imagine that any realworld heuristic would lead to an execution that *in the best case* is no better than a (non-optimized) sequential execution and in the worst case involves a load that is subsequently discarded (perhaps resulting in 3 loads).
Of course, Jeremy wasn't addressing what any actual CPUs might do, but what the JMM permits them to do.
I tried to reproduce the re-ordering by writing a jcstress test, without success unfortunately. Would you be able to share a way to reproduce this exact weird situation?
Thanks
I urge you not to program to what you think a particular hardware architecture may or may not do. First of all, compiler optimizations can be very tricky. Second of all, code motion is a moving target - what is true in 2019 may not be true in 2029.