Sunday, August 24, 2008

Don't Use StringBuffer!

One thing I have noticed among Java API users is that some don't seem to know that there is a difference between StringBuffer and StringBuilder. There is: StringBuffer is synchronized, and StringBuilder isn't. (The same is true for Vector and ArrayList, as well as Hashtable and HashMap). StringBuilder avoids extra locking operations, and therefore you should use it where possible. That's not the meat of this post, though.

Among those who know that StringBuffer is synchronized, they sometimes think that it has magical thread-safety properties. Here's a simplified version of some code I wrote recently

final StringBuilder sb = new StringBuilder();
Runnable runner = new Runnable() {
@Override public void run() {
sb.append("1");
}
};
Thread t = new Thread(runner);
t.start();
try { t.join() } catch (InterruptedException e) {}
// use sb.
A fellow looked at this code, and said that he thought I should use StringBuffer, because it is safer with multiple threads. Well, no. Using StringBuffer here would actually be bad coding practice.

The issue here is that it isn't actually thread-safe to use StringBuffer. It is just pretend thread-safe. Yes, that's an official term. It is completely useless in practice to have multiple threads pounding on a StringBuffer, if at least one of them is mutating it, because you then can't make any guarantees about the StringBuffer's contents. Let's see what would happen if I added another thread to the example we were examining, and replaced the Builder with a Buffer:

Thread 1:
sb.append("a");

Thread 2:
sb.append("b");

Thread 3:
join 1,2
print(sb.toString());
Sure, it is "thread-safe", in the sense that there are no data races (which are basically concurrent accesses without adequate synchronization). But you have no idea what thread 3 will print: "ab" or "ba". I would have to introduce more synchronization to make this produce a sensible result. The locks that come with the StringBuffer haven't helped anything.

What I did in that original code was a perfectly legitimate handoff, and a widely used Java idiom. One thread hands off an object to another thread in a properly synchronized way, and then stops using it. The second thread uses it for a while, stops using it, and hands it back to the first.

Using Java memory model words: The initialization of the StringBuilder happens-before the spawn of the thread, which happens-before the append operation, which happens-before the join. Thread starts and joins induce happens-before edges, which is something that several people have managed to miss recently.

The additional locking provided by the StringBuffer class is completely superfluous: the guarantees are provided entirely by the Thread.start and Thread.join invocations. Using StringBuffer is even harmful, in the sense that you need to perform a whole bunch of (relatively) expensive synchronization, and prevent the compiler from aggressively optimizing the code.

To go back to the original point, I would even argue that using the StringBuffer class would be bad coding style: it would imply that that extra locking is providing something of benefit in the presence of multiple threads, which would be misleading.

As it turns out, I have yet to see a practical use case for the StringBuffer class over the StringBuilder class. A quick poll of Josh Bloch (who also works at Google), indicates that he has yet to see one, too. There are similar problems in some of the synchronized OutputStream classes in Java, too -- who writes to an OutputStream in multiple threads?

In short: a locked version of an API isn't not automatically better than an unlocked one just because you happen to be using it in multiple threads.

I've often thought that a short discussion of this should be added to the StringBuffer JavaDoc, since everyone who maintains that JavaDoc believes this, too.


A final note. Some people think that there is no performance impact to using StringBuffer instead of StringBuilder, because of all of the clever things JVMs do to eliminate synchronization (I can blog about this, too, if you want). This is a little unclear. Whether it can even perform these optimizing transformations definitely depends wildly on which JDK you use; I wrote a microbenchmark to test it, and the results differed on different JDKs -- but all that microbenchmarks really demonstrate is the performance of the microbenchmark.


ETA One of the comments points out that javac often replaces Strings with StringBuilders when constructing them. This is correct, and this is because javac realizes that when you are performing changes to the object, it makes more sense to using StringBuilders than Strings. If you have code like this:
while (guard) {
string += value;
}
You are almost always better off writing that like this:
while (guard) {
stringBuilder.append(value);
}
string = stringBuilder.toString();
The first way will allocate a new String object on every iteration, and copy the values of the previous String object. The second way will allocate a single array and append the characters to it. I have improved the performance of some pieces of code several orders of magnitude by switching from the first form to the second. Don't use String concatenation in a loop!

39 comments:

Buddy Casino said...

On the SUN Java 6 JVM at least, it hardly matters since it optimizes away the synchronization, thats true. But I'd be interested in up-to-date JVMs that don't. All I ever hear when people talk about various topics regarding Java VM implementations is "while some do this, others don't".

This is also a good opportunity to mourn the often missing documentation regarding thread-safety in the Java APIs and JSR specifications.

Btw., this StringBuffer problem is also explained in the "Sun Certified Programmer for Java 5" study guide, which I can recommend wholeheartedly.

Anonymous said...

Please blog about synchrnozation. It would be useful post.

Eric Wendelin said...

If you ever use the javap tool, you'll see that Strings are often converted to StringBuilders at compile-time as well.

It turns out that simplest solution, String, is sometimes the best. Of course if you add threading then of course your approach makes sense.

dacracot said...

Thanks very much for the advise. My application, http://tox.sourceforge.net/ is now approximately 5% faster.

- dacracot

Jeremy Manson said...

@buddy -- It is hard to say whether Sun's JDK6 actually optimizes away anything. Don't forget that you are at the mercy of whether the system thinks you are in a hotspot, and if you are using StringBuffer all over the place, it could place a burden on your system without actually being identified as a hotspot.

That being said, my microbenchmark had more expensive StringBuffers than StringBuilders in 6u1. In 6u6, they were about the same. You get different results if you enable different optimizations. So it varies wildly.

@anon -- I'll add synchronization elimination techniques to the list of things about which I should blog.

@eric -- Yep! StringBuilders make far more sense than doing something crazy like using String concatenation operations, and javac frequently knows this. Still, this code:

while (guard) {
  string += someVal;
}

Can be a *lot* more expensive than:

while (guard) {
  stringBuffer.append(someVal);
}

@dacracot Glad to hear it!

Anonymous said...

The initial size on StringBuffer/StringBuilder makes more difference. If the resulting string will be several thousands of characters, it's worth to allocate the initial buffer to the expected size, example: StringBuffer sb=new StringBuffer(10000);.

burtonator said...

I'm amazed more people don't know about this.

Back in the day I wrote a FastStringBuffer that was had no synchronization.

The idea was borrowed from xerces... they had benchmarked StringBuffer and found it to be performing useless syncrhonization.

BTW.... the setLength method in StringBuilder is good for performance too.

This way you can avoid creating new buffers each time and then resizing them.

Just call setLength(0) and then start writing again.

Server Performance said...

Offtopic, but also note that the inner implementation of StringBuilder/StringBuffer in the JDK isn't good.

Whenever you append or insert, it makes System.arraycopy(), and when you finally convert it toString() it makes another one!

It would be a much better approach mantaining internally a linked list or a tree, and constructing the actual String only once, when you call toString().

Maybe you know about Javolution (http://www.javolution.org). It has a Text class which claims to perform insertion/deletion in O[Log(n)] instead of O[n] for standard StringBuffer/StringBuilder. I thing internally is devolped somehow like what I say.

Can you imagine the "automatic boost" a change like this could represent in all our software?

P.S: Great blog! I have learned learn much from it.

Server Performance said...

By the way, your post Title is "Don't Use StringBuffer!". The title of my comment could be:

"Nor Use StringBuilder!"

:-)

Blaisorblade said...

What about buffers for log messages, shared among multiple threads? In that case, the thread-safety guarantee of StringBuffer can be useful.

@server performance: for little strings (the size must be determined by benchmarking, of course) copying will be probably more efficient than using a linked list, which suffers from much worse spatial locality and is thus less cache-friendly.
I think a linked list may get more efficient at around the size of the switchover may be around the cacheline size (say 32 or 64 bytes), but this may be completely wrong.
For the O(log n) solution, one has to see the constant factors - 10^6 log n is rarely faster than N.

Instead of using a linked list or a tree, a better solution would be a convertToString method which hands off the character array to the String and clears the StringBuilder by creating a new empty char array. This is the idea of the move constructors, which are being introduced into the C++ language.

Server Performance said...

@Blaisorblade: of course you are right, I didn't enter in details in my comment. A smart implementation should decide which append/insert technique use depending on sizes and other things. Even mixed solutions.

But I must say... I like your "move constructor" solutions!

What about putting it all together? Someday I will try to hack it and try...

Anonymous said...

you should have mentioned that the fellow in question had been spending way too much time doing down-in-the-trenches memory-barrier-wielding nanosecond-counting downright-dirty c++ programming, to trust the neat linear handoff embodied in the code ...

Fellow.

Jeremy Manson said...

@Fellow -- This is actually a blog post I wrote many months ago, and had been sitting, unfinished, since then. I was only reminded to post it by that experience. I wouldn't expect anyone who wasn't a day-to-day Java user with a fair bit of multithreading knowledge to know this fairly obscure bit of trivia.

Anonymous said...

The reason StringBuffer is still in wide use is that StringBuilder was only introduced in java 5. But your are right and ignorance is not a bless.

Anonymous said...

"who writes to an OutputStream in multiple threads?"

I do. I have multiple threads reading data, does some processing and writes it back to file.

Jeremy Manson said...

@Anonymous - I do. I have multiple threads reading data, does some processing and writes it back to file.

Do you use the built in, single-write-operation-at-a-time, synchronization, or do you wrap it and use your own synchronization?

Server Performance said...

Sorry my off-topic again, but I think it will become on-topic:

Investigating about the "move constructor" that Blaisorblade mentioned, I have realized that the old 1.4 version of StringBuffer has a very implementation of a "copy-on-write" (COW), as C++ String and as .NET StringBuilder I think. In few words: whe you call the toString() method, it doesn't copy the char array, but shares it with the String and marks it inside; and if there are future call to insert or remove or others in the "inmutable" section of the char array, then creates a copy of the array and goes on.

For some reason, this changed in JDK 1.5, becoming in a "always copy" strategy. Why? Haven't find it. I have only found the forum entry dated in 2005: http://forums.java.net/jive/thread.jspa?messageID=29032&#29032.

I don't understand what kind of concurrency problem that the COW approximation has. Some clue Jeremy?

For testing, I have hacked AbstractBuilder, StringBuilder and StringBuffer with the COW hack. And in a microbenchmark there is a significative boost in both cases, also NetBeans, Glassfish, my J2EE apps and so on worked great with this patch. Sorry but no deep multithreading testing...

I will post it in the "General performance Forum" (http://forums.java.net/jive/forum.jspa?forumID=60), but what do you think?

Regards!

Jeremy Manson said...

I don't understand what kind of concurrency problem that the COW approximation has. Some clue Jeremy?

I don't precisely remember this from back then (although it sounds familiar). It sounds as if, in order to make it work, that other implementation would require synchronization inside the String class beyond just marking the byte array as final. I imagine that this would slow down all accesses to Strings rather substantially, which is far worse than having a slow StringBuffer.toString() method.

Server Performance said...

would require synchronization inside the String class

But... is String inmutable? why synchronize its contents?

Jeremy Manson said...

But... is String inmutable? why synchronize its contents?

I actually misunderstood your question. I thought you meant that you would switch the String out when the StringBuffer changed. You meant you would copy the StringBuffer when the String changed.

My guess is that they did this so that they could share an implementation between StringBuffer and StringBuilder, since you would need synchronization for the copy on write strategy to work. They probably also decided that having a separate code base that they had to maintain in order to keep the StringBuffer code optimized was a waste of time, when you consider the overhead that using StringBuffer has just because of the synchronization.

Server Performance said...

Please sorry. My English is horrible.

What I tried to mean (for both Builder and Buffer):

1. toString() shares the buffer with the new String:

[@AbstractStringBuilder.java]
public /*abstract*/ String toString() {
shared = true;
// Constructor sharing the value!
return new String(0, count, value);
}

2. After it, if the insert() or remove() or whatever other method that is going to change the contents of the shared portion of the buffer... then first of all clones the SB buffer, leaving the String buffer as is: the String value keeps immutable and the Buffer value mutes. For instance:

[@AbstractStringBuilder.java]
public AbstractStringBuilder insert(int offset, char str[]) {

(...)
if (newCount > value.length)
expandCapacity(newCount);
else if (shared)
copy();

(...)

And the copy method:

[@AbstractStringBuilder.java]
private void copy() {
char[] newValue = new char[value.length];
System.arraycopy(value, 0, newValue, 0, count);
value = newValue;
shared = false;
}

There are of course other considerations about synchronization in the StringBuffer overrided toString() method, and the String(StringBuffer) constructor.

But I can't see the problem. But sure it is there, because it would be a boost for StringBuilder (doesn't mind StringBuffer opts, actually) and JDK designers are much better than me, of course!

Sorry again for mi English :-(

Jeremy Manson said...

@server -- If the String and the StringBuffer are using the same backing array, this may cause the String to appear to mutate if one thread calls toString() while another is between the read of shared and the insert operation.

You never want to have a situation where you can make a String appear to mutate.

Server Performance said...

Jeremy, sorry but I don't agree:

(*) StringBuffer wouldn't have the mentioned problem, correct?

(*) StringBuilder explicitly is thread-insafe (from its javadoc: This class provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case).

And, as far as I know System.arraycopy() isn't threadsafe, is it? So the current "always-copy" StringBuilder implementation can have exactly the same mutability problem you mentioned if inserting during an arraycopy operation, doesn't it?

And if you misuse StringBuilder's muting operators in a multithreaded environment... the problem is already there since System.arraycopy isn't safe.

Server Performance said...

Seems to be more a memory consumption problem than a synchronization one. I have suggested in using the COW as an option for -XX:+AggressiveOpts (as String cache is)..

Can see it on: http://forums.java.net/jive/thread.jspa?threadID=46620

Mattchew said...

I tell you what. I'm done with StringBuffer; not only does it talk about me behind my back, but I'm pretty sure it's anti-Semitic.

AJ said...

I have two threads appending to the same StringBuffer object.
say..
Thread 1 : sb.append("12345")
Thread 2 : sb.append("asdfg")
i would get either 12345asdfg or the other way around but never jumbled coz the method is synchronized.
But if this was done using StringBuilder this may produce a jumbled string say 12a3sd45..
both StringBuffer and StringBuilder use a char array to store the values.
I guess this is where lies difference between a Stringbuffer and builder. please correct me if i am wrong...

Jeremy Manson said...

@AJ: That's pretty close to being right. The StringBuilder might get way more corrupted than that -- possibly even unreadable.

Now, the question is -- is there a use for that? Someone suggested multithreaded logging, but I can think of way better ways of doing that.

Anonymous said...

I agree with AJ. Thread example: "But you have no idea what thread 3 will print: "ab" or "ba"." Yes, but you want "ab" or "ba" no matter which, not "a" or "b" that's why you need synchronization. And "OutputStream classes in Java, too -- who writes to an OutputStream in multiple threads?" - There are for example the log files of a web or a gui application would be a mess without synchronization.

Jeremy Manson said...

@Anonymous -- StringBuffer is a terrible logging class for all sorts of reasons. You'd be better off using a Collections.synchronizedList of Strings. For example, with a StringBuffer, you don't have any obvious way of separating the individual log messages from each other. The list approach would have individual elements for each log entry. If you use a StringBuffer, the logging will keep going until you fill up memory, so you really need some form of log rotation. How are you going to do that? Chop off the beginning of the buffer? That will result in the implementation's copying the remaining buffer to a new buffer. With a List, you can just chop off the front -- much easier, and again, you don't have to keep track of the boundary between the log messages.

Also, StringBuffer doesn't have levels, so good luck if you only want to look at the more important warnings.

As for OutputStreams, well, that's a complete non-starter. If they have synchronization, it is at the byte level. That's completely non-useful for logging, unless your messages are only one byte long. Also, not all of the implementations have locking. Guess which ones do and which ones don't -- fun for the whole family!

Finally, if you are using a StringBuffer or an OutputStream for your logging needs without encapsulating it in a logging class, may I suggest that there is a problem with the design of your code? If you ultimately change your mind and decide that a StringBuffer isn't going to meet your needs, then you are going to have a hell of a time extracting it. At the very least, you should be using a wrapper class, and you should probably be using java.util.logging or log4j.

Majo said...

You should stress that the code results in better performance in case of lot of work with long strings.
Performance when concatenating strings is one thing, best practice for a lot of applications may be using String just because of its immutability which is the best way to help any garbage collector. Using immutable objects in general is also the best intuitive solution when dealing with concurrency and sharing data among threads.
For the rests of the article,, nice one and thanks

Jeremy Manson said...

@Majo - You're welcome! And I go into the immutable properties of Strings in lots of other posts.

IIT-Global-Jobs said...

Hi Jeremy,

I have a very simple example for you. So it goes like this-

final StringBuilder sb = new StringBuilder();

Thread 1-
sb.append("123");
Thread 2-
sb.append("456");

It may happen that after execution of both the threads sb comes out to be either 123 or 456. However as per the article it should either be 123456 or 456123. The reason for this is that AbstractStringBuilder.append has a method call str.getChars(0, len, value, count); If one thread reaches there and then 2nd thread comes afterwards. Now the first thread proceeds writes 123 and then the second thread proceeds overwrites 123 by 456 which actually corrupts the StringBuilder.

Jeremy Manson said...

@jobs - No, I never claimed that StringBuilder was thread-safe. You should use StringBuffer if you want that kind of thread safety. The fact of the matter, however, is that you pretty much never want to write something like that.

Saravanan said...

Hi,

Can you please show me a complete sample program to understand the difference betweeen StringBuffer and StringBuilder? I am theoritically able to understand what you are mentioning, but not in practice.

I would highly appreciate your time and help.

Thanks,
TP

Jeremy Manson said...

Hi,

The difference is that all of the methods on StringBuffer are marked synchronized, and none of the methods on StringBuilder are.

Jeremy

Saravanan said...

Hi,

You have mentioned that: "But you have no idea what thread 3 will print: "ab" or "ba". I would have to introduce more synchronization to make this produce a sensible result."

Please find my sample program below. It produces 12 or 21 as the output and that, I think, is because of JVM prioritizing thread 1 or 2 according to its wish. Is there a way to always get a consistent output, using threads?

Also, I see 1 as the output sometimes. This is even more confusing as to what in the world will give a consistent result. In other words, what is that "more synchronization" that you have earlier mentioned should be done to this program?

I think I am reiterating a same question again and again.

class StringBuilderTP implements Runnable{

StringBuffer sb = new StringBuffer();
// StringBuilder sb = new StringBuilder();
public synchronized void run(){
//synchronized(this){
if(Thread.currentThread().getName().equals("T1")){
System.out.println("T1");
changeone();
}else if(Thread.currentThread().getName().equals("T2")){
System.out.println("T2");
changetwo();
}
//}
}
public static void main(String[] args) {
StringBuilderTP tps = new StringBuilderTP();
Thread t1 = new Thread(tps);
Thread t2 = new Thread(tps);

t1.setName("T1");
t2.setName("T2");
t1.start();
t2.start();
try{
//Thread.sleep(5000);
t1.join();
}catch(InterruptedException e){
}
tps.result();
}
public void changeone(){
sb.append("1");

}
public void changetwo(){
sb.append("2");

}
public void result(){

System.out.println("value of sb is"+Thread.currentThread().getName()+ sb);
}


}


Thanks,
TP

Jeremy Manson said...

@Saravanan - That program is very strange, in that you are using thread names to communicate between threads. The fact that run() is synchronized basically won't accomplish anything.

Yes, it is possible to get consistent results. You basically need to introduce code that makes one thread wait to do its update until the other thread is done. You can do this using any of a large variety of synchronization mechanisms, such as CountDownLatches or wait() / notify().

Stanimir Simeonoff said...

ON performance:

StringBuffer used NOT to copy the char[] when creating the String (in non shared variant); however that was a major source of issues, incl leaking huge char[] for small Strings. In 1.5 they decided that a copy of the char[] must occur every time and that practically made StringBuffer useless (the sync was there to ensure no thread games can trick out the String). That conserves memory, though and ultimately helps the GC (beside the obviously reduced footprint), usually the char[] is the top3 of the objects consuming memory.

That was basically the main idea behind the new model: copy each time w/ perfectly resized buffer and let the GC be a star.

Concatenation, however, should have been handled in a totally different manner, ala. String.concat, just w/ more parameters.

My main gripe is that both StringBuider/Buffer take no benefit from being in java.lang. Now they are virtually the same class I had in 1.1 (AsyncStringBuffer that was optimized for longer strings)

the post is part from some longer rant on stackoverflow

TechSY730 said...

I can think of ONE case where the synchronized nature of StringBuffer would be useful.

If you honestly don't care about the order of writes, just that all of them make it in.

For example, imagine a highly parallel Set or Map (both non sorted) implementation. You might try to implement toString in parallel as well. Due to the nature of (non-sorted) Sets and Maps, there are no ordering guarantees. This means toString is really free to spit out the entries in any order.

Thus, you might imagine that you split the work into mutiple threads, sharing a single StringBuffer. Each thread tacks in the entries it is responsible for "toStringing" one at a time. The synchronizedness of StringBuffer ensures that no thread "corrupts" the write of another thread, even though the ordering of each write is not defined. But that is fine, your contract does not assure any sort of ordering. (Then of course, at the end, you remove the trailing comma and append the "[" and "]", but that is really cheap)

Of course, it might be cheaper just to have one StringBuilder for each thread and then concatenate them all in the end.
Or maybe that would be more expensive.
You would have to profile both ways to be sure.


Now of course, these sort of there are a very limited in scope. There aren't many cases where you care everything making it into a string but not caring about the order it gets into a string.

So yea, StringBuffer is mostly worthless, just not completely.