I got asked about this twice in one day, and I didn't know the answer, so I sat down and puzzled it out a bit.
A SoftReference is a reference that the garbage collector can decide to clear if it is the only reference left to an object, and the GC decides (through some undefined process) that there is enough memory pressure to warrant clearing it. SoftReferences are generally used to implement memory-sensitive caches. A mistake many people make is to confuse it with a WeakReference, which is a reference that, if it is the only reference left to the object, the garbage collector will aggressively clear.
I got asked how Hotspot's GC actually decides whether to clear SoftReferences. Twice. On the same day. (Hotspot is the Sun / OpenJDK JVM). The answer was that I had no idea, so I had to go look it up. Here is what I found out; I hope it is a useful reference for someone (pun intended).
First, there is a global clock variable that is set with the current time (in millis) every time a garbage collection occurs. Every SoftReference has a timestamp field that is set to the current value of clock when it is accessed (when it is constructed or the get()) method is called. This gives a very coarse ordering over the SoftReferences; the timestamp indicates the last GC before they were accessed.
When a garbage collection occurs, the decision to clear a SoftReference is based on two factors:
Now let's say there is more free space — say, 4MB — and so less reason to clear the SoftReferences. The calculation is now
One thing to notice about this is that it implies that SoftReferences will always be kept for at least one GC after their last access. Why is that? Well, for the interval, we are using the clock value of the last garbage collection, not the current one. As a result, if a SoftReference has been accessed since the last garbage collection, it will have the same timestamp as that garbage collection, and the interval will be 0. 0 <= free_heap * 1000 for any amount of free_heap, so any SoftReference accessed since the last garbage collection is guaranteed to be kept. This is actually how this question came up; some of my colleagues notices their SoftReferences weren't being cleared during a garbage collection, and they didn't know why.
ETA: The above paragraph originally said "one Full GC", not "one GC". In our case, it was one full GC, because the objects being allocated were too big to be allocated in the young generation. In most cases, it will be one GC of the generation in which the object was allocated, which will usually be the new generation / TLAB.
Another thing to notice is that the value of 1000 for ms_per_mb is fairly arbitrary. It can be adjusted with the JVM flag -XX:SoftRefLRUPolicyMSPerMB=n. If you adjust it down, then free_heap * ms_per_mb will be smaller, and so SoftReferences are more likely to be cleared. If you adjust it up, you get the opposite effect.
A SoftReference is a reference that the garbage collector can decide to clear if it is the only reference left to an object, and the GC decides (through some undefined process) that there is enough memory pressure to warrant clearing it. SoftReferences are generally used to implement memory-sensitive caches. A mistake many people make is to confuse it with a WeakReference, which is a reference that, if it is the only reference left to the object, the garbage collector will aggressively clear.
I got asked how Hotspot's GC actually decides whether to clear SoftReferences. Twice. On the same day. (Hotspot is the Sun / OpenJDK JVM). The answer was that I had no idea, so I had to go look it up. Here is what I found out; I hope it is a useful reference for someone (pun intended).
First, there is a global clock variable that is set with the current time (in millis) every time a garbage collection occurs. Every SoftReference has a timestamp field that is set to the current value of clock when it is accessed (when it is constructed or the get()) method is called. This gives a very coarse ordering over the SoftReferences; the timestamp indicates the last GC before they were accessed.
When a garbage collection occurs, the decision to clear a SoftReference is based on two factors:
- How old the reference's timestamp is, and
- How much free space there is in memory.
- free_heap is the amount of free heap space in MB,
- interval is the time between the last GC's clock and the timestamp of the ref we are currently examining, and
- ms_per_mb is a constant number of milliseconds to keep around a SoftReference for each free megabyte in the heap
interval <= free_heap * ms_per_mbTo take an example, let's say that we have a SoftReference with a timestamp of 2000ms, the last GC's clock time was 5000ms, the ms_per_mb is 1000 and the free space is 1MB. We then test whether:
5000 - 2000 <= 1 * 1000This is false (3000 > 1000), so we clear the reference.
Now let's say there is more free space — say, 4MB — and so less reason to clear the SoftReferences. The calculation is now
5000 - 2000 <= 4 * 1000This is true (3000 <= 4000), so we don't clear the reference.
One thing to notice about this is that it implies that SoftReferences will always be kept for at least one GC after their last access. Why is that? Well, for the interval, we are using the clock value of the last garbage collection, not the current one. As a result, if a SoftReference has been accessed since the last garbage collection, it will have the same timestamp as that garbage collection, and the interval will be 0. 0 <= free_heap * 1000 for any amount of free_heap, so any SoftReference accessed since the last garbage collection is guaranteed to be kept. This is actually how this question came up; some of my colleagues notices their SoftReferences weren't being cleared during a garbage collection, and they didn't know why.
ETA: The above paragraph originally said "one Full GC", not "one GC". In our case, it was one full GC, because the objects being allocated were too big to be allocated in the young generation. In most cases, it will be one GC of the generation in which the object was allocated, which will usually be the new generation / TLAB.
Another thing to notice is that the value of 1000 for ms_per_mb is fairly arbitrary. It can be adjusted with the JVM flag -XX:SoftRefLRUPolicyMSPerMB=n. If you adjust it down, then free_heap * ms_per_mb will be smaller, and so SoftReferences are more likely to be cleared. If you adjust it up, you get the opposite effect.
Comments
So I asked this question to Gil Tene (Azul Systems) at Devoxx last year. His answer was short - "you don't want your algorithm to be heavily dependent on GC behavior, do you?". This is actually very interesting point as GC can indeed change from one JVM to another.
Could you elaborate on this?
IMHO using Softreferences in an application server is evil, because of this very coarse grained way to configure their behaviour.
Note also that the mechanism implies that Softreferences will (in low memory situations) only be reclaimed after more than one full GC.
Sadly, there is no easy way to do what they do - clear memory in response to memory pressure. You could, for example, do it by checking the available memory every so often in a thread, and having that thread clear part of your cache, but then you have to worry about timing and concurrency issues.
Speaking of timing and concurrency issues, another problem with using them for caching is that it is not immediately obvious how to make a Map with them that is thread-safe. Fortunately, my fellow Googlers Bob Lee and Kevin Bourrillion have come up with a very handy class for writing all sorts of caches, including Weak/SoftReference ones. I think it should probably end up in the JDK in one form or another.
@Markus - agreed about them being evil. It is sadly true that there is no other really good way of doing what they do in Java.
http://java.sun.com/javase/6/docs/api/java/lang/management/MemoryMXBean.html
Using that you can watch the heap sizes post-GC and take actions based on the data you collect.
GC interacts with Reference and ReferenceQueue. When a SoftReference referent is cleared by GC, is there a guarantee that no thread can see the old reference, except for PhantomReference?
Does GC have any happens-before guarantees?
The memory model docs and Java Concurrency in Practice (Brian Goetz, et al) are silent on this. Bill Pugh once wrote that Reference.referent should be volatile to prevent a data race, but jdk1.6 source shows it is not.
When a SoftReference referent is cleared by GC, it is guaranteed that only SoftReferences refer to it (if you look around for the definition of "softly reachable", you will find this). The SoftReferences to an object are all cleared atomically. Therefore, no other thread will have a reference to the object when it is cleared.
Bill is correct about the fact that the referent should be volatile; this is mostly to share it between threads before it is cleared, though. There is no danger of seeing a reference to a cleared object (this would violate several other guarantees that the VM provides)
Looking again at the Bill Pugh statement (from java memory model list, 2004):
``
The only cases that we really need to worry about are:
thread 1 calls clear, thread 2 calls get
GC clears reference, thread 2 calls get
which are handled by making the referent field volatile.
''
The first case is about clearing the reference before GC does it. In the absence of a volatile referent, a subclass of SoftReference can wrap get() and clear() using synchronized(this) to prevent data races.
The second case caused my concern about lack of explicit GC happens-before guarantees. I think the atomicity requirements of Reference implies happens-before, but it is not part of the list of happens-before cases in the memory model doc.
Another concern about Reference.referent not being volatile: This appears to be an example of unsafe publication in "Java Concurrency in Practice" sec. 3.5.1
But in practice, if thread 1 constructs a SoftReference, and thread 2 uses that Reference, as long as there is a happens-before between threads 1 and 2, the memory caches should be synchronized w.r.t. the referent. For example, if the SoftReference is passed from thread 1 to thread 2 via a ConcurrentHashMap as a stored value, the referent should be intact.
Now, arguably, someone should take up the flag for this. I'll talk to some members of the core libraries cabal.
www.tagtraum.com/gcviewer.html
I used it a lot it 2006-2007. The zoomable diagrams it makes are very cool looking and worth studying.
(When I used it in 2006-2007; it would complain about some log lines produced by newer GC modes, but was still useful.)
When you say "The SoftReferences to an object are all cleared atomically. Therefore, no other thread will have a reference to the object when it is cleared."
You mean No-References-At-All or only No-References-and-SoftReferences?
Why i ask - because i read article = Jonathan Amsterdam, Adjunct Associate Professor of Computer Science at New York University and president of Astrel - "Working with the garbage collection"[http://www.ddj.com/showArticle.jhtml?articleID=184404000&queryText=finalise] and read so "This problem occurs because the clearing of a soft reference does not imply that the object is completely unreachable. When a weak reference is cleared, there is truly no way to reach the object (not even through another weak reference -- the specification requires that all weak references to an object are cleared atomically)."
This all happens atomically, though, so the atomicity property doesn't really enter into it.