Today, I'll discuss a little tweak I made to
So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using
When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.
Anyway, the profiler showed that we were spending a LOT of time in
It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:
These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.
I should rush to say that I don't blame the original author for not making these changes; java.util.logger was added in JDK 1.4, before the addition of
Why am I blogging about pretty simple improvements? There are a few simple morals here:
java.util.Logging
that made my logging throughput double. I want to use it as an illustration that it often isn't very difficult to improve the performance of concurrent code by doing things that are actually pretty easy to do.So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using
java.util.Logger
. When I say "I have", I actually mean "someone else has", because if "I" had to log a megabyte a second, there is no way I would use java.util.Logger
to do it. Still, we all make our choices.When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.
Anyway, the profiler showed that we were spending a LOT of time in
Logger.log()
, on lines that look fairly harmless, but aren't:
public void log(LogRecord record) {
if (record.getLevel().intValue() < levelValue || levelValue == offValue) {
return;
}
synchronized (this) {
if (filter != null && !filter.isLoggable(record)) {
return;
}
}
// Post the LogRecord to all our Handlers, and then to
// our parents' handlers, all the way up the tree.
Logger logger = this;
while (logger != null) {
Handler targets[] = logger.getHandlers();
if (targets != null) {
for (int i = 0; i < targets.length; i++) {
targets[i].publish(record);
}
}
if (!logger.getUseParentHandlers()) {
break;
}
logger = logger.getParent();
}
}
There are no fewer than four locks being acquired in this method. There is the obvious call to synchronized(this)
. logger.getHandlers()
is also a synchronized method. getUseParentHandlers()
is synchronized. getParent()
is also synchronized. Acquiring and releasing these locks was killing our throughput!It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:
- The
filter
field is protected by the lock onthis
. That lock is really only protecting one field; the lock is held while one line of code writes to it, and one line of code reads from it. If you want to make something that does the same thing (from the perspective of concurrency), you can make that field into a volatile. If you are worried about the filter variable changing in between when you read it and when you write it, you can read it to a local variable first:Filter theFilter = filter; // filter is volatile
if (theFilter != null && !theFilter.isLoggable(record)) {
return;
} - There is a global lock protecting the handlers. It turns out that the only thing this is protecting is a rarely-updated array of
Handlers
. If you have a rarely updated concurrent array, you should be using the non-blockingCopyOnWriteArrayList
instead. getUseParentHandlers()
is synchronized for the exact same reason as the filter, and can be replaced with a volatile in the same way.
These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.
I should rush to say that I don't blame the original author for not making these changes; java.util.logger was added in JDK 1.4, before the addition of
java.util.concurrent
and before a rigorous definition of volatile. Plus, it really isn't designed for throughput.Why am I blogging about pretty simple improvements? There are a few simple morals here:
- Learn your libraries. Don't have a synchronized array that is almost never updated, for example.
- Know when you can use tricky language features. Knowing that volatile can be used for simple, single-variable updates is a very useful thing to know. I've written about volatile frequently, go read those posts.
- It is actually worth it to participate in OpenJDK. We (or I, at least) have a tendency to assume that JDK code is really high quality and well-optimized. This is probably true in a lot of areas (
java.util.concurrent
, orjava.lang.MostStuff
, orjava.util.YourFavoriteDataStructure
), but there is still plenty of work to be done. If you get involved, you not only help yourself, but you also help everyone. - I don't know, low-hanging fruit == good?
Comments
Sometimes it is because of the then-followed-standards as you have mentioned in your post, sometimes it's the committer's oversight or it could even be the generalized version of the API that's hurting your cause.
I have run into the latter situation a couple of times and just included my version of the functionality in the same package.
@monis - Yep!
public synchronized String format(LogRecord record)
Duplicating the code, removing the synchronized and using:
static ThreadLocal<SimpleDateFormat> sdFormat = new ThreadLocal<SimpleDateFormat>() {
@Override
protected synchronized SimpleDateFormat initialValue() {
return new SimpleDateFormat("yyyy.MM.dd-HH:mm:ss.SSS");
}
};
public String format(LogRecord record) {
StringBuffer sb = new StringBuffer(200);
Date dat = new Date(record.getMillis());
String text = sdFormat.get().format(dat);
sb.append(text);
...
made a huge difference!
wouldn't it be better to use the StringBuilder instead of the StringBuffer?!
synchronized (LogRecord.class) {
sequenceNumber = globalSequenceNumber++;
Integer id = (Integer)threadIds.get();
if (id == null) {
id = new Integer(nextThreadId++);
threadIds.set(id);
}
threadID = id.intValue();
}
Fixing that by adding a bootstrap LogRecord is a quite big step.
@stanimir - That's *an* issue, but not *the only* issue. Interestingly, that code could be removed if LogRecord could use the Thread's actual ID (via Thread.getId()), but unfortunately, LogRecord thinks thread ids are ints. I may still do something about it (like use the low-order 32 bits) if that turns out to be a bottleneck in our code.
More news: It turns out that getting the caller's method name is also a big cost for us, so I've written a patch to make that faster, too. We'll see how that goes.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6278014
The one generated by the LogRecord is close to useless, save for distinction of the threads.
(int) Thread.currentThread().getId() does a trick however most AppServers create threads at will and threadId is increased by every thread created (not even launched, the infamous synchronized(Thread.class){sleep(0)} bans any thread creation...)
However formatters are free to take the id as long the same way (NOT from the passed LogRecord), as long as LogRecord is not serialized (byte[] or passed through queue)and even then it's possible to extend it but that's not the case.
globalSequenceNumber shall use AtomicLong as well.
Now about the bottleneck. While having sync raises contention with every log call it's next to unavoidable since the handler code actually takes the most of the time, i.e. moving the sync to the handler(s) won't be so a great of improvement in case of a major handler that logs most of the stuff.
I use different files (handlers) for different loggers and that's reduces the contention greatly.
Still almost all the logger code can be written lock-free since context switch should be avoided if possible.
Syncs (monitorEnter/exit) in sun's hotspot are optimized for lack of contention and performance drops heavily under, even locks would have been better there... As far as I recall some VM will experience much better performance even w/ unmodified logging package but that's beyond the scope.
Inferring the caller of course is expensive but it's done on demand, i.e. not logging caller method/class (or providing them manually, huh) does improve the performance.
Generally I resort to a modified (optimized) logging package in the bootstrap path when deploying on a server.
(I'm sorry for the too long post...)
Hopefully, I can find some time and finish it shortly.
Logger.java.
If I find time and inspiration I will write an article about. If anyone's interested on details, please, drop a message. Basically it involves a volatile read to replace handlers/filter locking part and Read/Write lock to replace treeLock.
@Jeremy
Can you, please, give it a try benchmark wise. I'd like to know if there is any improvement now.
@Anyone
If anyone decides to use it (moving to bootstrap or whatever) - the code comes w/ absolutely no warranty of any kind whatsoever. I just say: it works for me.
Stanimir
http://hg.openjdk.java.net/jdk7/tl/jdk/rev/6eac3829cb41
Note that getParent() doesn't actually need the lock - the difference over volatile isn't actually providing any useful semantics.
Speaking about logging.
Wouldn't it be great to have
Logger.log (Level level, String msg, Object ... varags) in addition to
existing Logger.log (Level level, String msg, Object [] varags) ?
So that one can use
logger.log (Level.FINEST, "User {0} submitted {1}", user, request) ?
Sorry for offtopic, but where to start to facilitate such enhancement?
http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev
Google has a class called FormattingLogger internally; I suspect that there are a plethora of such classes.