Saturday, April 4, 2009

Faster Logging with Faster Logger Classes

Today, I'll discuss a little tweak I made to java.util.Logging that made my logging throughput double. I want to use it as an illustration that it often isn't very difficult to improve the performance of concurrent code by doing things that are actually pretty easy to do.

So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using java.util.Logger. When I say "I have", I actually mean "someone else has", because if "I" had to log a megabyte a second, there is no way I would use java.util.Logger to do it. Still, we all make our choices.

When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.

Anyway, the profiler showed that we were spending a LOT of time in Logger.log(), on lines that look fairly harmless, but aren't:

public void log(LogRecord record) {
if (record.getLevel().intValue() < levelValue || levelValue == offValue) {
return;
}
synchronized (this) {
if (filter != null && !filter.isLoggable(record)) {
return;
}
}

// Post the LogRecord to all our Handlers, and then to
// our parents' handlers, all the way up the tree.

Logger logger = this;
while (logger != null) {
Handler targets[] = logger.getHandlers();

if (targets != null) {
for (int i = 0; i < targets.length; i++) {
targets[i].publish(record);
}
}

if (!logger.getUseParentHandlers()) {
break;
}

logger = logger.getParent();
}
}
There are no fewer than four locks being acquired in this method. There is the obvious call to synchronized(this). logger.getHandlers() is also a synchronized method. getUseParentHandlers() is synchronized. getParent() is also synchronized. Acquiring and releasing these locks was killing our throughput!

It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:

  1. The filter field is protected by the lock on this. That lock is really only protecting one field; the lock is held while one line of code writes to it, and one line of code reads from it. If you want to make something that does the same thing (from the perspective of concurrency), you can make that field into a volatile. If you are worried about the filter variable changing in between when you read it and when you write it, you can read it to a local variable first:
    Filter theFilter = filter;  // filter is volatile
    if (theFilter != null && !theFilter.isLoggable(record)) {
    return;
    }
  2. There is a global lock protecting the handlers. It turns out that the only thing this is protecting is a rarely-updated array of Handlers. If you have a rarely updated concurrent array, you should be using the non-blocking CopyOnWriteArrayList instead.

  3. getUseParentHandlers() is synchronized for the exact same reason as the filter, and can be replaced with a volatile in the same way.


These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.

I should rush to say that I don't blame the original author for not making these changes; java.util.logger was added in JDK 1.4, before the addition of java.util.concurrent and before a rigorous definition of volatile. Plus, it really isn't designed for throughput.

Why am I blogging about pretty simple improvements? There are a few simple morals here:

  1. Learn your libraries. Don't have a synchronized array that is almost never updated, for example.

  2. Know when you can use tricky language features. Knowing that volatile can be used for simple, single-variable updates is a very useful thing to know. I've written about volatile frequently, go read those posts.

  3. It is actually worth it to participate in OpenJDK. We (or I, at least) have a tendency to assume that JDK code is really high quality and well-optimized. This is probably true in a lot of areas (java.util.concurrent, or java.lang.MostStuff, or java.util.YourFavoriteDataStructure), but there is still plenty of work to be done. If you get involved, you not only help yourself, but you also help everyone.

  4. I don't know, low-hanging fruit == good?

There is, by the way, a very large section of the definitely-recommended book Java Concurrency in Practice (lgt amazon) devoted to clever things you can do to speed up your multithreaded logging.

19 comments:

Alter Ego said...

This is a good illustration indeed. I just wonder what would you use if you had to log a couple of megabytes a second?

Monis Iqbal said...

That's the thing I like about Open source. Even if you aren't contributing (like myself) then by just going through the code (which in the most cases is nicely written) you could figure out the problems you could run into.

Sometimes it is because of the then-followed-standards as you have mentioned in your post, sometimes it's the committer's oversight or it could even be the generalized version of the API that's hurting your cause.
I have run into the latter situation a couple of times and just included my version of the functionality in the same package.

Jeremy Manson said...

@alter - That's a great topic for a blog post for another occasion. I wouldn't use java.util.logging, because that is really designed for writing little strings of text, primarily to the console. I would also do something a little more sophisticated in terms of threading.

@monis - Yep!

Walter said...

The performance killer in our logging was the SimpleFormatter:

public synchronized String format(LogRecord record)

Duplicating the code, removing the synchronized and using:

static ThreadLocal<SimpleDateFormat> sdFormat = new ThreadLocal<SimpleDateFormat>() {
@Override
protected synchronized SimpleDateFormat initialValue() {
return new SimpleDateFormat("yyyy.MM.dd-HH:mm:ss.SSS");
}
};

public String format(LogRecord record) {
StringBuffer sb = new StringBuffer(200);

Date dat = new Date(record.getMillis());
String text = sdFormat.get().format(dat);
sb.append(text);
...

made a huge difference!

Arnd said...

@Walter
wouldn't it be better to use the StringBuilder instead of the StringBuffer?!

Stanimir Simeonoff said...

The real issue is not that log(LogRecord) but the creation of LogRecord, look at the source code (constructor) of the LogRecord which has static sync.

synchronized (LogRecord.class) {
sequenceNumber = globalSequenceNumber++;
Integer id = (Integer)threadIds.get();
if (id == null) {
id = new Integer(nextThreadId++);
threadIds.set(id);
}
threadID = id.intValue();
}

Fixing that by adding a bootstrap LogRecord is a quite big step.

Jeremy Manson said...

@walter - I was thinking about looking at that class, next. It is ridiculous that it isn't immutable. And don't use StringBuffer!. :)

@stanimir - That's *an* issue, but not *the only* issue. Interestingly, that code could be removed if LogRecord could use the Thread's actual ID (via Thread.getId()), but unfortunately, LogRecord thinks thread ids are ints. I may still do something about it (like use the low-order 32 bits) if that turns out to be a bottleneck in our code.

More news: It turns out that getting the caller's method name is also a big cost for us, so I've written a patch to make that faster, too. We'll see how that goes.

Stanimir Simeonoff said...

ThreadID stuff is an official bug
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6278014

The one generated by the LogRecord is close to useless, save for distinction of the threads.

(int) Thread.currentThread().getId() does a trick however most AppServers create threads at will and threadId is increased by every thread created (not even launched, the infamous synchronized(Thread.class){sleep(0)} bans any thread creation...)

However formatters are free to take the id as long the same way (NOT from the passed LogRecord), as long as LogRecord is not serialized (byte[] or passed through queue)and even then it's possible to extend it but that's not the case.

globalSequenceNumber shall use AtomicLong as well.
Now about the bottleneck. While having sync raises contention with every log call it's next to unavoidable since the handler code actually takes the most of the time, i.e. moving the sync to the handler(s) won't be so a great of improvement in case of a major handler that logs most of the stuff.

I use different files (handlers) for different loggers and that's reduces the contention greatly.
Still almost all the logger code can be written lock-free since context switch should be avoided if possible.

Syncs (monitorEnter/exit) in sun's hotspot are optimized for lack of contention and performance drops heavily under, even locks would have been better there... As far as I recall some VM will experience much better performance even w/ unmodified logging package but that's beyond the scope.

Inferring the caller of course is expensive but it's done on demand, i.e. not logging caller method/class (or providing them manually, huh) does improve the performance.

Generally I resort to a modified (optimized) logging package in the bootstrap path when deploying on a server.

(I'm sorry for the too long post...)

Stanimir Simeonoff said...

I've decided to give it a try and optimize Logger class to virtually avoid any locking during log(...) (incl. darned getParent()).

Hopefully, I can find some time and finish it shortly.

Stanimir Simeonoff said...

Here it is a virtually no lock (during logging) implementation of
Logger.java.

If I find time and inspiration I will write an article about. If anyone's interested on details, please, drop a message. Basically it involves a volatile read to replace handlers/filter locking part and Read/Write lock to replace treeLock.

@Jeremy
Can you, please, give it a try benchmark wise. I'd like to know if there is any improvement now.

@Anyone
If anyone decides to use it (moving to bootstrap or whatever) - the code comes w/ absolutely no warranty of any kind whatsoever. I just say: it works for me.

Stanimir

Jeremy Manson said...

@stanimir - I'm not sure why you think my code locks during getParent. It doesn't, and the relevant changes are a lot smaller than yours. Here's the patch again:

http://hg.openjdk.java.net/jdk7/tl/jdk/rev/6eac3829cb41

Note that getParent() doesn't actually need the lock - the difference over volatile isn't actually providing any useful semantics.

Stanimir Simeonoff said...
This comment has been removed by the author.
mutleythedog said...

Holy smoke!

mutleythedog said...

Holy smoke!

Anonymous said...

Jeremy: just wanted to say great work! Keep it up.

Andy Malakov said...

Great post!

Speaking about logging.

Wouldn't it be great to have

Logger.log (Level level, String msg, Object ... varags) in addition to
existing Logger.log (Level level, String msg, Object [] varags) ?

So that one can use

logger.log (Level.FINEST, "User {0} submitted {1}", user, request) ?

Sorry for offtopic, but where to start to facilitate such enhancement?

Jeremy Manson said...

@Andy - I recommend suggesting it to OpenJDK. The correct mailing list is probably here:

http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev

Google has a class called FormattingLogger internally; I suspect that there are a plethora of such classes.

josh said...

Can you offer an guidance to those who want to patch 1.6 with this change? Great post, thanks for writing this up.

Jeremy Manson said...

@josh - I think it is probably one of those things where if you don't know how to do it, you shouldn't. Having said that, -Xbootclasspath/p: is your friend.