
Transactional Hardware on x86

While I'm posting anyway, I might as well mention the big concurrency news of the day: the new transactional memory specification from Intel. Transactional memory is largely what it sounds like. Your code begins a transaction and executes until the transaction ends, performing some memory writes along the way. If no conflicting transactions (i.e., transactions that write to the same memory your code wrote) execute at the same time as yours, your transaction ends normally and commits your memory writes. If there is a conflicting transaction, your transaction aborts and rolls back your writes.

Transactional memory on x86 will come in two flavors:
  1. Hardware Lock Elision (HLE), which consists of XACQUIRE and XRELEASE instruction prefixes. These can optimistically turn lock regions into transactions. What's the advantage? Well, transactions can execute concurrently, as long as they aren't writing to the same memory. Locks are serialized: only one thread can be in a lock region at a time. So this mechanism can (hypothetically) execute more code concurrently. Also - and perhaps more importantly - it should be much cheaper to acquire a lock using XACQUIRE than it is to acquire it without, since with XACQUIRE you don't have to do anything other than tell the system to start looking out for conflicts. Between the two advantages, HLE should be able to provide a nice performance boost. These are backwards compatible, so that software written to use them can be run on earlier hardware.
  2. Restricted Transactional Memory (RTM), which consists of XBEGIN, XEND and XABORT instructions. These have the traditional transactional memory semantics: you begin a transaction, and if you abort before you end it, your writes are undone. These are very flexible, but are not backwards compatible (i.e., you can't use them on older hardware).
I like the idea of hardware-level transactions / atomicity. Compilers can be written to take advantage of them, and they can be a win for people implementing synchronization primitives and doing very low-level multithreading. They have the potential to make synchronization very cheap when there isn't much contention. AFAICT, Azul has been claiming large benefits from their HLE-style synchronization for years (edited: they claim <10%; see this presentation on hardware transactional memory from Cliff Click).

There is also likely to be a renaissance in lock-free data structures. Currently, lock-free data structures are pretty much all built on top of one non-blocking atomic instruction: the compare-and-swap (LOCK CMPXCHG on x86, or AtomicReference.compareAndSet() to Java programmers). Trying to shoehorn everything you might want to do atomically into a single instruction that operates on one word has led to some very interesting data structures over the last 20 years or so, but it is really the wrong way to solve the problem.

The flip side of this is that I don't think these features will have much of an impact on most user-level synchronization. HLE is really just an optimization for current lock implementations. RTM aborts if you execute any of a large number of instructions (like PAUSE or CPUID, or many operations that affect control flow), or if there is a conflicting write to a cache line you touched (which may have no logical connection to the data the programmer cares about): the average programmer who deals with multithreaded code can't reasonably be expected to think about what is going on at the microarchitectural level.

Also, as with most transactional memory systems, RTM only deals with memory writes to a limited number of cache lines. If you have to deal with, for example, I/O, or if you touch too many cache lines, you have to write an abort handler that deals with rollback correctly. My feeling is that this will probably be far too difficult for most programmers.

Transactions' usefulness also depends enormously on the quality of their implementation. I'd love to see some hardware, but it isn't expected to be released until 2013.


Comments

Cris Perdue said…
Really interesting news and commentary, thanks for the nice clear post.
Jeremy Manson said…
Thanks for the kind words.
Roland said…
I don't see azul claiming big gains here: http://sss.cs.purdue.edu/projects/tm/tmw2010/talks/Click-2010_TMW.pdf
Jeremy Manson said…
I guess I was remembering the 2x, rather than the <10%.
Golovach Ivan said…
Hi, Jeremy.
I have another question: I'm very interested in a deep understanding of the new JMM, but I found that I don't have access to the page with the simulator (http://www.cs.umd.edu/users/jmanson/java.html). Can I have access to it, or to another source for the simulator?
Anonymous said…
Which control flow instructions cause an abort?
I don't see that in the spec.
Jeremy Manson said…
@Ivan: It's not that new a JMM anymore! I don't recommend the simulator; there are known bugs I never got around to fixing. I still have a copy of it around somewhere, but I'm not sure you won't do better with some of the other people who have built formal models of it, and plugged them into their model checking frameworks.

@Anonymous: Far CALL, Far JMP, Far RET, and IRET may (or may not) cause aborts. Also kernel calls, if you count those. It's possible that calling those "control flow" is misleading.
tgamblin said…
Seems like it's worth mentioning here that transactional memory and speculative execution are already shipping from IBM, albeit in a machine that most people won't be able to afford:

http://researcher.watson.ibm.com/researcher/files/us-pengwu/BGQPerfPaper-final-PACT12.pdf
