Tuesday, February 12, 2008

Causality and the Java Memory Model

NB: I started writing this last August, and sent it to a friend of mine for thoughts. I just realized that he never got back to me on it, and I never posted it. I don't much feel like editing it now, so here it is in its original glory

Okay, I know. It is nearly impossible to read and understand the causality bits of the Java memory model. That's all that stuff about "Executions and Causality Requirements" on pages 568 – 570 of the Java Language Specification. Bill Pugh, Sarita Adve and I wrote it. Basically, in the last three years, there have been three kinds of reactions to it:

  1. ZOMGWTF!? (I'm a normal human being. Give me a break.)

  2. This is incomprehensible gibberish. (I'm a pretty smart cookie and I think that everything I can't understand in three seconds is worthless.)

  3. Okay, I think I get it, but I have some questions. (I'm a pretty smart cookie and I spent several days trying to understand it.)


I find that the last group usually pretty much understands it, with a few minor clarifications. They are also overwhelmingly graduate students, for what it's worth. They have found a few flaws in it, which is also nice.

(Wait, I hear you ask, nice? Alas, I must admit that I never thought the JMM would be bulletproof. It was no more likely to be bulletproof than version 1.0 of a major software release. The main goal was to create a memory model which described the problem space completely, was practical for a real programming language and was rigorous (so that it could be completely understood with sufficient effort). The fact that it was possible to find a few flaws in specific cases (rather than by saying "this is a complete mess", which used to be the case) means that people could understand it and reason about it, and that therefore the mission was accomplished. No one had ever done that for a real language before, and I'm proud of the work we did.)

That still leaves us with the original problem, which is that it is really, really hard to understand. What I'm going to do in this post is to try to explain the key intuition behind it very briefly. If it still doesn't make sense, then tell me about it, and I'll try to clean it up.

(The prerequisite to understanding what I am about to say is an understanding of the happens-before order. If you understand that, and you understand that the lack of that in your program can lead to weird effects, then the rest of the post may interest you. If you don't understand that, this is not the post for you.)

There are two basic questions that we need to address to understand the memory model:

Why is the happens-before relationship not sufficient?

The main guarantee we want to give programmers is something called "Sequential Consistency for Data Race Free programs" (SC for DRF, for short). What this means, in brief, is that if your program is free from data races, then you won't see any weird effects. If you are still reading, you know that a data race is two accesses, at least one of which is a write, to the same variable, which are not ordered by happens-before. Now, consider the example:


Initially:

0: x == y == 0

Thread 1:
1: r1 = x;
   if (r1 == 1)
2:   y = 1;

Thread 2:
3: r2 = y;
   if (r2 == 1)
4:   x = 1;


If all of this program's statements were to execute, there would be happens-before relationships between 0 and 1, 1 and 2, 0 and 3, 3 and 4. But there would be no happens-before relationships between Thread 1 and Thread 2.

Now, an intuitive model, built entirely around happens-before, basically says that if you are trying to figure out the value that can be seen by a read, you can either look at the last write to a given variable via happens-before, or a write that is in a data race with the read. That would mean that the read at 3 could see the write at 2, and the read at 1 can see the write at 4, because those writes are in a data race with those reads.

We can now say that the read of x at 1 sees 1, and the read of y sees 1, and have that be legal under the happens-before model.

Do you see what happened there? The read of x at 1 saw the value 1, then wrote 1 to y at 2. The read of y at 3 saw the value 1, and wrote 1 to x at 4. That justified the read of x at 1 seeing the value 1.

We can't allow this, because it violates the SC for DRF property. A program is data race free if, under all SC executions -- those are the executions without any weird reorderings -- there are no data races. There are no data races under any SC execution of this program -- the reads will both return 0, so the writes won't occur, so there is no data shared between the two threads. Therefore, this program is data race free, and it has to have only SC executions. The execution I just described -- where the writes occur -- is not a SC execution.

Since the happens-before model allows this example, it doesn't enforce the SC for DRF property. We therefore had to make some additional guarantees.

Um... okay. What are the additional guarantees?

Now that we have decided that there are actions that are not allowed to happen --in the above example, the writes are not allowed to occur -- the trick is to figure out when an action is allowed to occur, and when it is not.

Now, if you are writing code, and want to figure out if an action can occur, what do you do? Well, you look at all of the actions that came before it. So let's look at the actions that happen-before the action you want to justify, and determine whether those actions justify it.

In the above example, we look at instruction 2, and then look at the actions that happen-before it, which are 1 and 0. 1 and 0 don't justify 2 happening, so it is not allowed to happen.

Let's try this on a different example. This is a good one:


0: x == y == 0

Thread 1:
1: r1 = x;
2: y = 1;

Thread 2:
3: r2 = y;
4: x = r2;


The interesting result here is r1 == r2 == 1, which is possible in a real execution if Thread 1's actions are reordered by the compiler. Basically, the write to y occurs, then Thread 2 reads the 1 from y and writes it to x, and then Thread 1 reads that value for x.

The first instruction we want to execute is the one labeled "2". If you look at the actions that happen-before 2 (again, 1 and 0), they will always lead into 2 being executed. So 2 is allowed to happen.

That's well and good for action 2, but the interesting result requires that the value 1 be read from y in instruction 3. This isn't justifiable based on what happens-before 3 -- if we are only looking at what happens-before 3, the only value that can be read from y is 0.

So we need some way of justifying the read of y in 3 returning the value 1. It can only do this if something wrote the value 1 to y. But wait -- we've already said that it was justifiable to write 1 to y! So if, in addition to being able to justify actions based on the actions that happen-before them, we also say that it is possible to justify actions based on actions you have already justified, then we have now allowed our result!

So there is the intuition behind the model. When you want to justify the fact that a read returns the value of a write, you can do so if:

a) That write happened before it, or
b) You have already justified the write.

The way that the model works is that you start with an execution that has all of the actions justified by property a), and you start justifying actions iteratively. So, in the second example above, you create successive executions that justify 0, then 2, then 3, then 4, then 1.

This definition has the lovely property of being reasonably intuitive and also guaranteeing SC for DRF programs.

That's the most complicated bit of the model (perhaps excluding the stuff about observable actions and infinite executions, but they are much less important).

I hope that helps.

12 comments:

jeuneS2 said...

I'm trying to understand this part of the JMM, but the formal definitions puzzle me every time I read them (in contrast to, say, the formal definitions for data-flow analyses). Your comments clarified a few things, but I'm still not entirely happy with what I understand.

I'm currently working on the data caches for a Java multiprocessor. We intend to use a fairly simple consistency model, which basically implements lazy release consistency. Memory accesses are never reordered on the issuing side in our implementation and our caches are write-through, but caches are not held consistent with main memory unless a monitorenter or a read from a volatile variable occurs. While it isn't hard to see that this follows the rules of happens-before (thread and finalizer issues put aside), I have no idea how to justify that the causality requirements of the JMM aren't violated. Any idea how to check if an implementation is compliant with this part of the JMM spec?

Jeremy Manson said...

@jeuneS2 - I'm sorry you are having a hard time with it. You are certainly not alone in that. If it makes you feel better, know that most people who struggle with it eventually figure it out.

A simple rule of thumb for the causality requirements is that they are basically going to be met if you don't perform speculative writes. It sounds as if you don't, so you should be fine. Without knowing all the details, it is harder to get more concrete.

NimaJ said...

I'm trying to understand the Java memory model, but unfortunately got stock in the causality requirements for executions. To make this simple, lets consider the example 5.1 in your dissertation (simple reordering) in which you mentioned that the read of y must return 0 in E2, because of rule 6. Honestly, I am not able to get the points of Rule 6 and 7, both of them seem the same to me. What exactly is the difference between Ai and Ci? (seems to be equal?)

I really appreciate if you give me a hint.

Jeremy Manson said...

@NimaJ - Basically, to determine whether an execution is valid, you "commit" part of that execution in each step. For each step i, the set of actions that are committed is C_i and the set you are adding from the previous step is C_i - C_i-1.

In order to commit a given set C_i - C_i-1, you need to justify it with a complete execution of the program in which all of the actions in C_i take place. The set of all actions in the ith execution is A_i.

So, A_i is the complete ith execution, A_n is the final execution, and C_i is the part of A_i that is also in A_n.

Does that make sense?

Paolo G. Giarrusso said...

I'm trying to understand why "happens-before" does entail sequential consistency in C++ [Boehm&Adve08] but not in Java [Manson et al., 05], and I think I found it, but I wanted to check it with you.

The root cause is that Java gives semantics to programs with data races, C++ does not. This leads to a key difference in the definition of which writes a read is allowed to observe. For Java, Sec. 3 of [Manson et al., 05] says that for a read $r$ to be allowed to observe a write $w$, $r$ cannot happen-before $w$.

For C++, Sec. 6 of [Boehm&Adve08] says that for $w$ to be a visible side effect for $r$, $w$ must happen before $r$. That makes the trick!
Sec 4. of Manson et al. shows that the happens-before memory model does not entail sequential consistency for data-race-free program by exhibiting an example; this example however relies crucially on reads seeing the effect of writes without the happens-before relationship ensuring any order.

Does it make sense to you?

Jeremy Manson said...

@Paolo - Happens-before does not imply SC in either language. In both C++ and Java, there is sequential consistency for data race free programs. Java also provides semantics for programs with data races (as detailed in this post).

Paolo G. Giarrusso said...

I did mean SC for DRF programs, even if I just wrote "sequential consistency". Could you please look again at my question, keeping this correction in mind? The paper on the C++ model [Boehm&Adve08] explicitly states that the happens-before model (as they define at the end of the paper) gives sequential consistency to data-race-free programs (in their theorem 7.1).

[Boehm&Adve08], Hans-J. Boehm, Sarita V. Adve, Foundations of the C++ memory model, PLDI '08

Jeremy Manson said...

@Paolo - My understanding is that the reason that C++ guarantees SC for DRF programs is that that particular guarantee is part of the semantics. That is, it doesn't fall out of any other properties.

Jeremy Manson said...

@Paolo - Actually, looking back at the paper, I'm wrong (it wasn't done by fiat). But the proof is in Sec 7 of http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf , so I'm unlikely to state it better.

Zmeul said...

I have two big questions which confuse the hell out of me:
1. What does the "execution order" represent as it is used in the Java Language specs?
To be more precise:
Trace 17.5 has two possible such execution orders listed on the following page. Based on those one can rule out that execution order must be consistent with program order (also, why are those numbers followed by colons in front of the actions out of order - 1,3,2,4?). It is also not the "real life" execution order since the second one has reads that see later writes and, literally speaking, a read cannot possibly see the write before it takes place, right?
My humble intuition tells me that a valid execution order for Trace 17.5 to finish with r1=1,r2=2 is:
B = 1;
A = 2;
r2 = A;
r1 = B. No reorderings or "read later writes" required.
2. What do you mean by "happens-before order allows a read to see a later write"? From which perspective is it "later"? Since it is not actually executed later (quote: "bearing in mind that if a read sees a write that occurs later in the execution, it represents
the fact that the write is actually performed early") nor is happens-before defined for such a pair of read-write that happens in two separate threads.

Jeremy Manson said...

In answer to your first question, "execution order" as used in this context is intended to mean an ordering of the execution that is consistent with happens-before consistency. It is an informal notion. Your intuition is otherwise correct, in that there are other execution orders that can produce the same result.

Similarly, the notion of "later write" is an informal one. This is
why it is in the "discussion" section, rather than the "spec" section.

Jingguo Yao said...

This explanation is worth of reading over and over again.