
Transactional Hardware on x86

While I'm posting anyway, I might as well mention the big concurrency news of the day: the new transactional memory specification from Intel. Transactional memory sounds a lot like what it is. Your code begins a transaction and continues to execute it until it ends, performing some memory writes along the way. If there are no conflicting transactions (i.e., transactions that write to memory your code read or wrote, or read memory your code wrote) executing at the same time as yours, your transaction will end normally and commit your memory writes. If there are conflicting transactions, your transaction will abort and roll back your writes.

Transactional memory on x86 will come in two flavors:
  1. Hardware Lock Elision (HLE), which consists of XACQUIRE and XRELEASE instruction prefixes. These can optimistically turn lock regions into transactions. What's the advantage? Well, transactions can execute concurrently, as long as they aren't writing to the same memory. Locks are serialized: only one thread can be in a lock region at a time. So this mechanism can (hypothetically) execute more code concurrently. Also - and perhaps more importantly - it should be much cheaper to acquire a lock using XACQUIRE than it is to acquire it without, since with XACQUIRE you don't have to do anything other than tell the system to start looking out for conflicts. Between the two advantages, HLE should be able to provide a nice performance boost. These are backwards compatible, so that software written to use them can be run on earlier hardware.
  2. Restricted Transactional Memory (RTM), which consists of XBEGIN, XEND and XABORT instructions. These have the traditional transactional memory semantics: you begin a transaction, and if you abort before you end it, your writes are undone. These are very flexible, but are not backwards compatible (i.e., you can't use them on older hardware).
I like the idea of hardware-level transactions / atomicity. Compilers can be written to take advantage of them, and they can be a win for people implementing synchronization primitives and doing very low-level multithreading. They have the potential to make synchronization very cheap when there isn't much contention. AFAICT, Azul has been claiming very large benefits from their HLE-style synchronization for years (edited: They claim <10%: see this presentation on hardware transactional memory from Cliff Click).

There is also likely to be a renaissance in lock-free data structures. Currently, lock-free data structures are pretty much all built on top of one non-blocking atomic instruction: the compare-and-swap (LOCK CMPXCHG on x86, or AtomicReference.compareAndSet() to Java programmers). Trying to shoehorn everything you might want to do atomically into a single instruction that operates on one word has led to some very interesting data structures over the last 20 years or so, but is really the wrong way to solve the problem.
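To make the "one word" constraint concrete, here is a minimal sketch of a classic lock-free stack (usually called a Treiber stack) built on AtomicReference.compareAndSet(). The class and method names (CasStack, push, pop) are mine, chosen for illustration; this is not code from any particular library.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative lock-free stack: all shared state lives in one reference (head),
// because compare-and-swap can only update a single word atomically.
final class CasStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> oldHead = head.get();
            Node<T> newHead = new Node<>(value, oldHead);
            // Succeeds only if no other thread changed head since we read it.
            if (head.compareAndSet(oldHead, newHead)) {
                return;
            }
            // Otherwise another thread won the race: re-read head and retry.
        }
    }

    // Returns null when the stack is empty.
    T pop() {
        while (true) {
            Node<T> oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
            if (head.compareAndSet(oldHead, oldHead.next)) {
                return oldHead.value;
            }
        }
    }
}
```

This works only because a stack's entire state change fits in a single reference. As soon as an operation has to update two locations together (say, both ends of a doubly linked list), a single compare-and-swap is no longer enough, which is where the contortions start.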

The flip side of this is that I don't think these will have much of an impact on most user-level synchronization. HLE is really just an optimization for current lock implementations. An RTM transaction aborts if you execute any of a large number of instructions (like PAUSE or CPUID, or many operations that affect control flow), or if there is a conflicting write to the cache line (which may not have any logical connection to the data the programmer cares about). The average programmer who deals with multithreaded code can't reasonably be expected to think about what is going on at the microarchitectural level.
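The cache-line point is easy to see with a little arithmetic. The sketch below assumes 64-byte cache lines (typical for x86, but an assumption here) and uses made-up addresses; CacheLines, lineOf and sameLine are hypothetical names for illustration only.

```java
// Illustrative arithmetic only: which cache line does a (hypothetical) address fall in?
final class CacheLines {
    // Assumption: 64-byte cache lines, which is typical for x86 but not guaranteed.
    static final long LINE_SIZE = 64;

    // Index of the cache line containing a non-negative byte address.
    static long lineOf(long address) {
        return address / LINE_SIZE;
    }

    // True if two addresses share a cache line, so a write to one would
    // conflict with a transaction that touched the other.
    static boolean sameLine(long a, long b) {
        return lineOf(a) == lineOf(b);
    }
}
```

Under that assumption, two unrelated 8-byte variables at addresses 0x1000 and 0x1008 share a line, so a write to one can abort a transaction that only cares about the other, while variables at 0x1038 and 0x1040 sit on different lines even though they are just as close together.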

Also, as with most transactional memory systems, RTM only deals with memory writes to a limited number of cache lines. If you have to deal with, for example, I/O, or if you touch too many cache lines, you have to write an abort handler that deals with rollback correctly. My feeling is that this will probably be far too difficult for most programmers.
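You can't write an RTM abort handler in plain Java, but the general shape (try the cheap speculative path, detect a conflict, fall back to a real lock) can be sketched in software. The example below uses java.util.concurrent.locks.StampedLock as an analogy only: it speculates on reads, not writes, and it is not how the hardware works. OptimisticPoint and its methods are hypothetical names.

```java
import java.util.concurrent.locks.StampedLock;

// Software analogy for "speculate first, fall back on conflict".
// This is NOT hardware transactional memory; it only mirrors the control flow.
final class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        // Speculative path: read without acquiring the lock.
        long stamp = lock.tryOptimisticRead();
        double currentX = x;
        double currentY = y;
        if (!lock.validate(stamp)) {
            // A writer interfered (the analogue of an abort), so fall back
            // to actually acquiring the lock and re-reading.
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(currentX, currentY);
    }
}
```

Even in this simple read-only case you have to get the fallback path right yourself. With RTM, the fallback also has to cope with everything the transaction cannot roll back, such as I/O, which is the part I expect most programmers to find too difficult.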

Transactions' usefulness also depends enormously on the quality of their implementation. I'd love to see some hardware, but it isn't expected to be released until 2013.


Comments

Cris Perdue said…
Really interesting news and commentary, thanks for the nice clear post.
Jeremy Manson said…
Thanks for the kind words.
Roland said…
I don't see azul claiming big gains here: http://sss.cs.purdue.edu/projects/tm/tmw2010/talks/Click-2010_TMW.pdf
Jeremy Manson said…
I guess I was remembering the 2x, rather than the <10%.
Golovach Ivan said…
Hi, Jeremy.
I have another question: I'm very interested in a deep understanding of the new JMM, but I found that I don't have access to the page with the Simulator (http://www.cs.umd.edu/users/jmanson/java.html). Can I get access to it, or to another source for the Simulator?
Anonymous said…
What control flow does abort?
I don't read that in the spec.
Jeremy Manson said…
@Ivan: It's not that new a JMM anymore! I don't recommend the simulator; there are known bugs I never got around to fixing. I still have a copy of it around somewhere, but I'm not sure you won't do better with some of the other people who have built formal models of it, and plugged them into their model checking frameworks.

@Anonymous: Far CALL, Far JMP, Far RET and IRET may (or may not) cause aborts. Also, kernel calls, if you count that. It's possible that calling those control flow is misleading.
crescent said…
Nice Sharing with this blog.
tgamblin said…
Seems like it's worth mentioning here that transactional memory and speculative execution are already shipping from IBM, albeit in a machine that most people won't be able to afford:

http://researcher.watson.ibm.com/researcher/files/us-pengwu/BGQPerfPaper-final-PACT12.pdf
BlueCube said…
Thanq for sharing a complete Information
Unknown said…
Nice Post About the JAVA Concurrency
