Saturday, March 31, 2007

Safe Construction and the JMM

Another question from a viewer. We have this code:

class C {
  private int x;

  public C(int v) {
    this.x = v;
  }

  public synchronized int get() {
    return this.x;
  }

  public synchronized void set(int v) {
    this.x = v;
  }
}

(For the actual question as it was worded, the short answer is, "yes")

The question was one I get a lot. He asked how one thread can safely read the initial write of v to x. That is to say, if we have:


Thread 1:
globalRef = new C(5);

Thread 2:
C ref = globalRef;
if (ref != null) {
  int r1 = ref.x;
}


How can we change this code to guarantee that the read in Thread 2 will see the value 5 for x (assuming that ref is not null)?

Let's back up for a minute, and talk about why, in the code as written, the read in Thread 2 will not see the value 5 for x. Under the Java memory model, for one thread to be guaranteed to see an update made to a variable by another thread, there must be a happens-before relationship between the update and the read. This means that sometime after the write, there must be an action that performs what we call a "release", and sometime before the read, there must be an action that performs a corresponding "acquire".

The easiest way to get a release and a corresponding acquire is to use locking on the same variable. In this case, we would do something like this:


Thread 1:
synchronized (C.class) {
  globalRef = new C(5);
}

Thread 2:
synchronized (C.class) {
  C ref = globalRef;
  if (ref != null) {
    int r1 = ref.x;
  }
}


There is a release when the lock is released at the end of Thread 1, and an acquire when the lock is subsequently acquired at the beginning of Thread 2. This provides the guarantee that the updates that happen-before the release will be seen by the reads that happen-after the acquire.

For the really wonky -- what could happen? Well, we have


globalRef = new C(5);


which is really:


localRef = <allocate>;
localRef.x = 5;
globalRef = localRef;


There's no dependency between the second and third lines there, so we could easily have:


localRef = <allocate>;
globalRef = localRef;
localRef.x = 5;


And another thread could read globalRef and see the reference to the object before x is assigned the value 5.

Next post -- why this explanation is bad, bad, bad! Any guesses?

21 comments:

Andrei Badea said...

Has there already been a "next" post? I don't see anything wrong with the explanation, so I'd be interested.

Jeremy Manson said...

Of course, a month and a half later, I can't remember what the follow-up post was supposed to be about. I got sick in the mean time, and this sort of fell off of my radar.

I can certainly see some potential follow-up topics. For example, in Thread 2, we are not protecting ref.x with a lock on ref; this is bad, because it is protected with a lock on this in the getter and setter methods. So, the obvious fix is something like:

synchronized (C.class) {
  C ref = globalRef;
  synchronized (ref) {
    if (ref != null) {
      int r1 = ref.x;
    }
  }
}

That's not great either, because then you are holding two locks, which can be a little deadlock-prone. Like I say, better solutions exist. More for another day, I think.

Andrei Badea said...

Ah right, I didn't realize you were reading x directly. Well isn't

C ref;
synchronized (C.class) {
  ref = globalRef;
}
if (ref != null) {
  synchronized (ref) {
    int r1 = ref.x;
  }
}

correct?

But in the real world you would not do ref.x, but ref.get(). And if I understand correctly, even in that case you would still need the "synchronized (C.class)" in both Thread 1 and Thread 2 to ensure the write to x in the constructor is visible to Thread 2. Thus:

synchronized (C.class) {
  C ref = globalRef;
}
if (ref != null) {
  int r1 = ref.get();
}

should be safe. No?

Naveed Shaikh said...

Regarding the original post, I think you can also solve this problem (a thread should be able to safely read the initial write of v to x) by declaring globalRef as volatile. This will prevent reordering - guaranteeing that initializations in C constructor happen before allocation to globalRef.

What do you think?

Jeremy Manson said...

Andrei: That seems fine.

Naveed: That would not work. Marking a field volatile doesn't prevent all reorderings. Specifically, you can move a write to a non-volatile variable to before a write to a volatile variable. In this case, that would allow a compiler to move the assignment to globalRef to before the assignment to x.

Having said that, marking the field final would work, and is actually the best solution.

Naveed Shaikh said...

Jeremy, let's put it this way:

[globalRef is volatile]

Thread 1:
// Expanding globalRef = new C(5);
localRef = [alloc] ;
localRef.x = 5;
globalRef = localRef;

Thread 2:
if (globalRef != null) {
int r1 = globalRef.x;
}

Assume that Thread 2 sees a non-null globalRef. This means that Thread 1 has performed a write on it. So there must be a happen-before edge from globalRef write in Thread 1 to globalRef read in Thread 2, and under new JMM, compilers are not allowed to bring non-volatile reads/writes inside this edge. I used this reasoning to say declaring globalRef as volatile can also work. Comments?

I agree that marking the field as final is a good solution; the only caveat being that we have to remove 'this.x = v' in 'set' method :-)

Jeremy Manson said...

Naveed: Um.. yes. I claim utter brain death. For some reason, I read "mark globalRef as volatile" as "mark x as volatile".

me said...

Hey Jeremy,

I think the pattern of setting "this.field" in the constructor and then providing synchronized access to the is common, yet folks don't take special measure to avoid this problem. Why isn't all the code in the world broken? Or is it?

Jeremy Manson said...

me: The correctness of your code depends on how the object is published. If, for example, it is shared by writing it out to a static volatile field, then the other threads will see the correct values. Similarly, if you are using a producer / consumer queue to publish the reference, then the correct synchronization will happen inside the queue.

If, on the other hand, you just randomly write it out to a field and expect other threads to read it correctly, then you are wrong.

Having said this, it is well-known among people who work with large-scale multi-processor and multicore designs that a large amount of the code out there is broken, and the errors simply don't show up until you scale up.

Buu said...

On the topic of safe publication, I have a confusion and I really appreciate if you can help explain.

Assume I have a class
class Server {
Map data;
public Server() {
data = createMap();
new CustomThread(this).start();
}
}

I understand that if Server is to be subclass, then we can say that Server is improperly constructed because the subclass might add more initialize to its construction and some overridden methods which in combination might break the CustomThread (which invokes those methods).

However, if Server is declare as final so that it's not overridable, then can we consider it as being properly constructed (assuming CustomThead uses the passed 'this' for its own use and doesn't assign it to anywhere)?

Besides, since Thread#start() is one of the happens-before rules, I suppose the newly started Thread can also see up-to-date value of the data Map. Is this true?

Thanks a lot!
Buu

Jeremy Manson said...

@buu -- this usage is protected by the happens-before rules, but it is certainly ugly, and not protected by the safe construction rules.

Buu said...

"...and not protected by the safe construction rules"
This is the part I don't understand. Maybe because I don't understand what the rules of safe construction are.

To be specific, if the Server class is not overridable, I don't see any difference between these 2 approaches, regardless of what the CustomThread actually does with the 'this' reference passed it its constructor.

1.
final class Server {
Server() {
data = createMap();
new CustomThread(this).start();
}
}

...
new Server();

2.
final class Server {
Server() {
data = createMap();
}
void start() {
new CustomThread(this).start();
}
}

...

new Server().start()



Thanks!

Jeremy Manson said...

@buu -- Well, this wouldn't be protected by the safe construction rules anyway, because "data" isn't final. The rule of thumb is that if you read a reference to the object that was written before the constructor completed, then you might not see the correctly constructed contents of the final field.

However, in this case, the guarantee is provided by the happens-before relationship.

Buu said...

Got that. Thanks, Jeremy!

Brent said...

Jeremy: as others have pointed out, making the field final is a non-starter because then you would have to eliminate the set method, which is part of the desired api.

Umm, did I miss something in my skimming of the comments, or has no one suggested the obvious solution, which is to call set from the constructor, i.e.

public C(int v) {
set(v);
}

?

You have now solved the problem, which is ensuring that all reads and writes (even ones done in the constructor) use the same lock.

HB said...

Hi Jeremy,

I have some doubt regarding safe publication. This is an example from Java Concurrency in Practice Listing 3.8:

public class SafeListener {
private final EventListener listener;

private SafeListener() {
listener = new EventListener() {
public void onEvent(Event e) {
doSomething(e);
}
};
}

public static SafeListener newInstance(EventSource source) {
SafeListener safe = new SafeListener();
source.registerListener(safe.listener);
return safe;
}
}

Here I understand that the object is properly constructed as "this" doens't escape during construction because we don't publish the "listener" field in the constructor.

Question 1:
But why is this constructor made private? I mean even if its public the object is still properly constructed and there's no risk of unsafe publication.

Question 2:
What is the guarantee in the factory method that the "safe" is published only after the constructor returns? I mean why
"SafeListener safe = new SafeListener();" happens before "source.registerListener(safe.listener);" in the factory method.

Jeremy Manson said...

@HB: Q1: I believe that the idea is that if the constructor were public, the call to registerListener() would have to go in the constructor, which would be unsafe.

Q2: Final fields are not guaranteed to be properly constructed until the constructor has finished (regardless of when they are set). Once the constructor is finished, you can safely publish the reference to the object that was constructed and assume that it was safe. I don't have JCiP in front of me, so I don't know why he is publishing safe.listener instead of safe (which seems safer). You might ask Brian.

HB said...

Thanks for your reply.

Like you said "Once the constructor is finished, you can safely publish the reference to the object that was constructed and assume that it was safe." What I wanted to know is that is the constructor in "SafeListener safe=new SafeListener()" finished before "source.registerListener(safe.listener);" so that "safe" is properly published?

Jeremy Manson said...

Yes. The constructor finishes when .. er.. the constructor finishes.

HB said...

Sorry didn't see your reply. But I didn't mean that.
Okay consider this as you have given in an example earlier,
//Expanding safe=new SafeListener()
localRef=[alloc]
localRef.listener=....
safe=localRef;

Here if the compiler reorders the last two statements like:
safe=localRef;
localRef.listener=....
and before the initialization of second statement is completed, it registers safe in "source.registerListener(safe);"(as you said that its safer to publish "safe") then isn't "safe" improperly published?

Jeremy Manson said...

@HB - The code in isolation doesn't scream "safe" to me. I don't have my copy of JCiP handy to check it, though.