One thing I have noticed among Java API users is that some don't seem to know that there is a difference between StringBuffer and StringBuilder. There is: StringBuffer is synchronized, and StringBuilder isn't. (The same is true for Vector and ArrayList, as well as Hashtable and HashMap). StringBuilder avoids extra locking operations, and therefore you should use it where possible. That's not the meat of this post, though.
Among those who know that StringBuffer is synchronized, they sometimes think that it has magical thread-safety properties. Here's a simplified version of some code I wrote recently
The issue here is that it isn't actually thread-safe to use StringBuffer. It is just pretend thread-safe. Yes, that's an official term. It is completely useless in practice to have multiple threads pounding on a StringBuffer, if at least one of them is mutating it, because you then can't make any guarantees about the StringBuffer's contents. Let's see what would happen if I added another thread to the example we were examining, and replaced the Builder with a Buffer:
What I did in that original code was a perfectly legitimate handoff, and a widely used Java idiom. One thread hands off an object to another thread in a properly synchronized way, and then stops using it. The second thread uses it for a while, stops using it, and hands it back to the first.
Using Java memory model words: The initialization of the StringBuilder happens-before the spawn of the thread, which happens-before the append operation, which happens-before the join. Thread starts and joins induce happens-before edges, which is something that several people have managed to miss recently.
The additional locking provided by the StringBuffer class is completely superfluous: the guarantees are provided entirely by the Thread.start and Thread.join invocations. Using StringBuffer is even harmful, in the sense that you need to perform a whole bunch of (relatively) expensive synchronization, and prevent the compiler from aggressively optimizing the code.
To go back to the original point, I would even argue that using the StringBuffer class would be bad coding style: it would imply that that extra locking is providing something of benefit in the presence of multiple threads, which would be misleading.
As it turns out, I have yet to see a practical use case for the StringBuffer class over the StringBuilder class. A quick poll of Josh Bloch (who also works at Google), indicates that he has yet to see one, too. There are similar problems in some of the synchronized OutputStream classes in Java, too -- who writes to an OutputStream in multiple threads?
In short: a locked version of an API isn't not automatically better than an unlocked one just because you happen to be using it in multiple threads.
I've often thought that a short discussion of this should be added to the StringBuffer JavaDoc, since everyone who maintains that JavaDoc believes this, too.
A final note. Some people think that there is no performance impact to using StringBuffer instead of StringBuilder, because of all of the clever things JVMs do to eliminate synchronization (I can blog about this, too, if you want). This is a little unclear. Whether it can even perform these optimizing transformations definitely depends wildly on which JDK you use; I wrote a microbenchmark to test it, and the results differed on different JDKs -- but all that microbenchmarks really demonstrate is the performance of the microbenchmark.
ETA One of the comments points out that javac often replaces Strings with StringBuilders when constructing them. This is correct, and this is because javac realizes that when you are performing changes to the object, it makes more sense to using StringBuilders than Strings. If you have code like this:
Among those who know that StringBuffer is synchronized, they sometimes think that it has magical thread-safety properties. Here's a simplified version of some code I wrote recently
A fellow looked at this code, and said that he thought I should use StringBuffer, because it is safer with multiple threads. Well, no. Using StringBuffer here would actually be bad coding practice.
final StringBuilder sb = new StringBuilder();
Runnable runner = new Runnable() {
@Override public void run() {
sb.append("1");
}
};
Thread t = new Thread(runner);
t.start();
try { t.join() } catch (InterruptedException e) {}
// use sb.
The issue here is that it isn't actually thread-safe to use StringBuffer. It is just pretend thread-safe. Yes, that's an official term. It is completely useless in practice to have multiple threads pounding on a StringBuffer, if at least one of them is mutating it, because you then can't make any guarantees about the StringBuffer's contents. Let's see what would happen if I added another thread to the example we were examining, and replaced the Builder with a Buffer:
Sure, it is "thread-safe", in the sense that there are no data races (which are basically concurrent accesses without adequate synchronization). But you have no idea what thread 3 will print: "ab" or "ba". I would have to introduce more synchronization to make this produce a sensible result. The locks that come with the StringBuffer haven't helped anything.
Thread 1:
sb.append("a");
Thread 2:
sb.append("b");
Thread 3:
join 1,2
print(sb.toString());
What I did in that original code was a perfectly legitimate handoff, and a widely used Java idiom. One thread hands off an object to another thread in a properly synchronized way, and then stops using it. The second thread uses it for a while, stops using it, and hands it back to the first.
Using Java memory model words: The initialization of the StringBuilder happens-before the spawn of the thread, which happens-before the append operation, which happens-before the join. Thread starts and joins induce happens-before edges, which is something that several people have managed to miss recently.
The additional locking provided by the StringBuffer class is completely superfluous: the guarantees are provided entirely by the Thread.start and Thread.join invocations. Using StringBuffer is even harmful, in the sense that you need to perform a whole bunch of (relatively) expensive synchronization, and prevent the compiler from aggressively optimizing the code.
To go back to the original point, I would even argue that using the StringBuffer class would be bad coding style: it would imply that that extra locking is providing something of benefit in the presence of multiple threads, which would be misleading.
As it turns out, I have yet to see a practical use case for the StringBuffer class over the StringBuilder class. A quick poll of Josh Bloch (who also works at Google), indicates that he has yet to see one, too. There are similar problems in some of the synchronized OutputStream classes in Java, too -- who writes to an OutputStream in multiple threads?
In short: a locked version of an API isn't not automatically better than an unlocked one just because you happen to be using it in multiple threads.
I've often thought that a short discussion of this should be added to the StringBuffer JavaDoc, since everyone who maintains that JavaDoc believes this, too.
A final note. Some people think that there is no performance impact to using StringBuffer instead of StringBuilder, because of all of the clever things JVMs do to eliminate synchronization (I can blog about this, too, if you want). This is a little unclear. Whether it can even perform these optimizing transformations definitely depends wildly on which JDK you use; I wrote a microbenchmark to test it, and the results differed on different JDKs -- but all that microbenchmarks really demonstrate is the performance of the microbenchmark.
ETA One of the comments points out that javac often replaces Strings with StringBuilders when constructing them. This is correct, and this is because javac realizes that when you are performing changes to the object, it makes more sense to using StringBuilders than Strings. If you have code like this:
while (guard) {You are almost always better off writing that like this:
string += value;
}
while (guard) {The first way will allocate a new String object on every iteration, and copy the values of the previous String object. The second way will allocate a single array and append the characters to it. I have improved the performance of some pieces of code several orders of magnitude by switching from the first form to the second. Don't use String concatenation in a loop!
stringBuilder.append(value);
}
string = stringBuilder.toString();
Comments
This is also a good opportunity to mourn the often missing documentation regarding thread-safety in the Java APIs and JSR specifications.
Btw., this StringBuffer problem is also explained in the "Sun Certified Programmer for Java 5" study guide, which I can recommend wholeheartedly.
It turns out that simplest solution, String, is sometimes the best. Of course if you add threading then of course your approach makes sense.
- dacracot
That being said, my microbenchmark had more expensive StringBuffers than StringBuilders in 6u1. In 6u6, they were about the same. You get different results if you enable different optimizations. So it varies wildly.
@anon -- I'll add synchronization elimination techniques to the list of things about which I should blog.
@eric -- Yep! StringBuilders make far more sense than doing something crazy like using String concatenation operations, and javac frequently knows this. Still, this code:
while (guard) {
string += someVal;
}
Can be a *lot* more expensive than:
while (guard) {
stringBuffer.append(someVal);
}
@dacracot Glad to hear it!
Back in the day I wrote a FastStringBuffer that was had no synchronization.
The idea was borrowed from xerces... they had benchmarked StringBuffer and found it to be performing useless syncrhonization.
BTW.... the setLength method in StringBuilder is good for performance too.
This way you can avoid creating new buffers each time and then resizing them.
Just call setLength(0) and then start writing again.
Whenever you append or insert, it makes System.arraycopy(), and when you finally convert it toString() it makes another one!
It would be a much better approach mantaining internally a linked list or a tree, and constructing the actual String only once, when you call toString().
Maybe you know about Javolution (http://www.javolution.org). It has a Text class which claims to perform insertion/deletion in O[Log(n)] instead of O[n] for standard StringBuffer/StringBuilder. I thing internally is devolped somehow like what I say.
Can you imagine the "automatic boost" a change like this could represent in all our software?
P.S: Great blog! I have learned learn much from it.
"Nor Use StringBuilder!"
:-)
@server performance: for little strings (the size must be determined by benchmarking, of course) copying will be probably more efficient than using a linked list, which suffers from much worse spatial locality and is thus less cache-friendly.
I think a linked list may get more efficient at around the size of the switchover may be around the cacheline size (say 32 or 64 bytes), but this may be completely wrong.
For the O(log n) solution, one has to see the constant factors - 10^6 log n is rarely faster than N.
Instead of using a linked list or a tree, a better solution would be a convertToString method which hands off the character array to the String and clears the StringBuilder by creating a new empty char array. This is the idea of the move constructors, which are being introduced into the C++ language.
But I must say... I like your "move constructor" solutions!
What about putting it all together? Someday I will try to hack it and try...
Fellow.
I do. I have multiple threads reading data, does some processing and writes it back to file.
Do you use the built in, single-write-operation-at-a-time, synchronization, or do you wrap it and use your own synchronization?
Investigating about the "move constructor" that Blaisorblade mentioned, I have realized that the old 1.4 version of StringBuffer has a very implementation of a "copy-on-write" (COW), as C++ String and as .NET StringBuilder I think. In few words: whe you call the toString() method, it doesn't copy the char array, but shares it with the String and marks it inside; and if there are future call to insert or remove or others in the "inmutable" section of the char array, then creates a copy of the array and goes on.
For some reason, this changed in JDK 1.5, becoming in a "always copy" strategy. Why? Haven't find it. I have only found the forum entry dated in 2005: http://forums.java.net/jive/thread.jspa?messageID=29032煨.
I don't understand what kind of concurrency problem that the COW approximation has. Some clue Jeremy?
For testing, I have hacked AbstractBuilder, StringBuilder and StringBuffer with the COW hack. And in a microbenchmark there is a significative boost in both cases, also NetBeans, Glassfish, my J2EE apps and so on worked great with this patch. Sorry but no deep multithreading testing...
I will post it in the "General performance Forum" (http://forums.java.net/jive/forum.jspa?forumID=60), but what do you think?
Regards!
I don't precisely remember this from back then (although it sounds familiar). It sounds as if, in order to make it work, that other implementation would require synchronization inside the String class beyond just marking the byte array as final. I imagine that this would slow down all accesses to Strings rather substantially, which is far worse than having a slow StringBuffer.toString() method.
But... is String inmutable? why synchronize its contents?
I actually misunderstood your question. I thought you meant that you would switch the String out when the StringBuffer changed. You meant you would copy the StringBuffer when the String changed.
My guess is that they did this so that they could share an implementation between StringBuffer and StringBuilder, since you would need synchronization for the copy on write strategy to work. They probably also decided that having a separate code base that they had to maintain in order to keep the StringBuffer code optimized was a waste of time, when you consider the overhead that using StringBuffer has just because of the synchronization.
What I tried to mean (for both Builder and Buffer):
1. toString() shares the buffer with the new String:
[@AbstractStringBuilder.java]
public /*abstract*/ String toString() {
shared = true;
// Constructor sharing the value!
return new String(0, count, value);
}
2. After it, if the insert() or remove() or whatever other method that is going to change the contents of the shared portion of the buffer... then first of all clones the SB buffer, leaving the String buffer as is: the String value keeps immutable and the Buffer value mutes. For instance:
[@AbstractStringBuilder.java]
public AbstractStringBuilder insert(int offset, char str[]) {
(...)
if (newCount > value.length)
expandCapacity(newCount);
else if (shared)
copy();
(...)
And the copy method:
[@AbstractStringBuilder.java]
private void copy() {
char[] newValue = new char[value.length];
System.arraycopy(value, 0, newValue, 0, count);
value = newValue;
shared = false;
}
There are of course other considerations about synchronization in the StringBuffer overrided toString() method, and the String(StringBuffer) constructor.
But I can't see the problem. But sure it is there, because it would be a boost for StringBuilder (doesn't mind StringBuffer opts, actually) and JDK designers are much better than me, of course!
Sorry again for mi English :-(
You never want to have a situation where you can make a String appear to mutate.
(*) StringBuffer wouldn't have the mentioned problem, correct?
(*) StringBuilder explicitly is thread-insafe (from its javadoc: This class provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case).
And, as far as I know System.arraycopy() isn't threadsafe, is it? So the current "always-copy" StringBuilder implementation can have exactly the same mutability problem you mentioned if inserting during an arraycopy operation, doesn't it?
And if you misuse StringBuilder's muting operators in a multithreaded environment... the problem is already there since System.arraycopy isn't safe.
Can see it on: http://forums.java.net/jive/thread.jspa?threadID=46620
say..
Thread 1 : sb.append("12345")
Thread 2 : sb.append("asdfg")
i would get either 12345asdfg or the other way around but never jumbled coz the method is synchronized.
But if this was done using StringBuilder this may produce a jumbled string say 12a3sd45..
both StringBuffer and StringBuilder use a char array to store the values.
I guess this is where lies difference between a Stringbuffer and builder. please correct me if i am wrong...
Now, the question is -- is there a use for that? Someone suggested multithreaded logging, but I can think of way better ways of doing that.
Also, StringBuffer doesn't have levels, so good luck if you only want to look at the more important warnings.
As for OutputStreams, well, that's a complete non-starter. If they have synchronization, it is at the byte level. That's completely non-useful for logging, unless your messages are only one byte long. Also, not all of the implementations have locking. Guess which ones do and which ones don't -- fun for the whole family!
Finally, if you are using a StringBuffer or an OutputStream for your logging needs without encapsulating it in a logging class, may I suggest that there is a problem with the design of your code? If you ultimately change your mind and decide that a StringBuffer isn't going to meet your needs, then you are going to have a hell of a time extracting it. At the very least, you should be using a wrapper class, and you should probably be using java.util.logging or log4j.
Performance when concatenating strings is one thing, best practice for a lot of applications may be using String just because of its immutability which is the best way to help any garbage collector. Using immutable objects in general is also the best intuitive solution when dealing with concurrency and sharing data among threads.
For the rests of the article,, nice one and thanks
I have a very simple example for you. So it goes like this-
final StringBuilder sb = new StringBuilder();
Thread 1-
sb.append("123");
Thread 2-
sb.append("456");
It may happen that after execution of both the threads sb comes out to be either 123 or 456. However as per the article it should either be 123456 or 456123. The reason for this is that AbstractStringBuilder.append has a method call str.getChars(0, len, value, count); If one thread reaches there and then 2nd thread comes afterwards. Now the first thread proceeds writes 123 and then the second thread proceeds overwrites 123 by 456 which actually corrupts the StringBuilder.
Can you please show me a complete sample program to understand the difference betweeen StringBuffer and StringBuilder? I am theoritically able to understand what you are mentioning, but not in practice.
I would highly appreciate your time and help.
Thanks,
TP
The difference is that all of the methods on StringBuffer are marked synchronized, and none of the methods on StringBuilder are.
Jeremy
You have mentioned that: "But you have no idea what thread 3 will print: "ab" or "ba". I would have to introduce more synchronization to make this produce a sensible result."
Please find my sample program below. It produces 12 or 21 as the output and that, I think, is because of JVM prioritizing thread 1 or 2 according to its wish. Is there a way to always get a consistent output, using threads?
Also, I see 1 as the output sometimes. This is even more confusing as to what in the world will give a consistent result. In other words, what is that "more synchronization" that you have earlier mentioned should be done to this program?
I think I am reiterating a same question again and again.
class StringBuilderTP implements Runnable{
StringBuffer sb = new StringBuffer();
// StringBuilder sb = new StringBuilder();
public synchronized void run(){
//synchronized(this){
if(Thread.currentThread().getName().equals("T1")){
System.out.println("T1");
changeone();
}else if(Thread.currentThread().getName().equals("T2")){
System.out.println("T2");
changetwo();
}
//}
}
public static void main(String[] args) {
StringBuilderTP tps = new StringBuilderTP();
Thread t1 = new Thread(tps);
Thread t2 = new Thread(tps);
t1.setName("T1");
t2.setName("T2");
t1.start();
t2.start();
try{
//Thread.sleep(5000);
t1.join();
}catch(InterruptedException e){
}
tps.result();
}
public void changeone(){
sb.append("1");
}
public void changetwo(){
sb.append("2");
}
public void result(){
System.out.println("value of sb is"+Thread.currentThread().getName()+ sb);
}
}
Thanks,
TP
Yes, it is possible to get consistent results. You basically need to introduce code that makes one thread wait to do its update until the other thread is done. You can do this using any of a large variety of synchronization mechanisms, such as CountDownLatches or wait() / notify().
StringBuffer used NOT to copy the char[] when creating the String (in non shared variant); however that was a major source of issues, incl leaking huge char[] for small Strings. In 1.5 they decided that a copy of the char[] must occur every time and that practically made StringBuffer useless (the sync was there to ensure no thread games can trick out the String). That conserves memory, though and ultimately helps the GC (beside the obviously reduced footprint), usually the char[] is the top3 of the objects consuming memory.
That was basically the main idea behind the new model: copy each time w/ perfectly resized buffer and let the GC be a star.
Concatenation, however, should have been handled in a totally different manner, ala. String.concat, just w/ more parameters.
My main gripe is that both StringBuider/Buffer take no benefit from being in java.lang. Now they are virtually the same class I had in 1.1 (AsyncStringBuffer that was optimized for longer strings)
the post is part from some longer rant on stackoverflow
If you honestly don't care about the order of writes, just that all of them make it in.
For example, imagine a highly parallel Set or Map (both non sorted) implementation. You might try to implement toString in parallel as well. Due to the nature of (non-sorted) Sets and Maps, there are no ordering guarantees. This means toString is really free to spit out the entries in any order.
Thus, you might imagine that you split the work into mutiple threads, sharing a single StringBuffer. Each thread tacks in the entries it is responsible for "toStringing" one at a time. The synchronizedness of StringBuffer ensures that no thread "corrupts" the write of another thread, even though the ordering of each write is not defined. But that is fine, your contract does not assure any sort of ordering. (Then of course, at the end, you remove the trailing comma and append the "[" and "]", but that is really cheap)
Of course, it might be cheaper just to have one StringBuilder for each thread and then concatenate them all in the end.
Or maybe that would be more expensive.
You would have to profile both ways to be sure.
Now of course, these sort of there are a very limited in scope. There aren't many cases where you care everything making it into a string but not caring about the order it gets into a string.
So yea, StringBuffer is mostly worthless, just not completely.
A related question:
http://stackoverflow.com/questions/30482503/is-stringbuilder-threadsafe-using-it-with-parallelstream
Thanks!