Tuesday, July 7, 2009

How Hotspot Decides to Clear SoftReferences

I got asked about this twice in one day, and I didn't know the answer, so I sat down and puzzled it out a bit.

A SoftReference is a reference that the garbage collector may clear when it is the only reference left to an object and the GC decides (through some unspecified process) that memory pressure warrants clearing it. SoftReferences are generally used to implement memory-sensitive caches. A mistake many people make is to confuse them with WeakReferences: when a WeakReference is the only reference left to an object, the garbage collector clears it aggressively.
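For a concrete picture of the "memory-sensitive cache" use case, here is a minimal sketch of a cache that holds its values only through SoftReferences. The class name SoftCache and its methods are made up for illustration; the only real API used is java.lang.ref.SoftReference. A null from get() means the value was never cached or the GC chose to clear it, and the caller is expected to recompute it.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical memory-sensitive cache: values are reachable only through SoftReferences. */
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if it was never cached or the GC cleared it. */
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // the referent was cleared; drop the stale entry
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        cache.put("blob", new byte[1024]);
        byte[] blob = cache.get("blob");
        System.out.println(blob == null ? "cleared by GC" : "hit: " + blob.length + " bytes");
    }
}
```

Because the cache never holds a strong reference, a value survives only as long as the GC's clearing policy (described below) allows, unless the caller keeps its own strong reference.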

I got asked how Hotspot's GC actually decides whether to clear SoftReferences. Twice. On the same day. (Hotspot is the Sun / OpenJDK JVM.) The answer was that I had no idea, so I had to go look it up. Here is what I found out; I hope it is a useful reference for someone (pun intended).

First, there is a global clock variable that is set to the current time (in milliseconds) every time a garbage collection occurs. Every SoftReference has a timestamp field that is set to the current value of clock whenever the reference is accessed (that is, when it is constructed and whenever its get() method is called). This gives a very coarse ordering over the SoftReferences: a reference's timestamp records the time of the last GC before it was last accessed.
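The bookkeeping described above can be modelled in a few lines. This is a toy model, not the JDK source: ClockModel, ModelRef and gc() are hypothetical names, and the real clock and timestamp fields are private to java.lang.ref.SoftReference and maintained by the JVM itself.

```java
/** Toy model (hypothetical names) of the clock/timestamp bookkeeping, not the JDK source. */
public class ClockModel {
    // Stand-in for the global clock: set to the current time (ms) at every GC.
    static long clock = 0;

    /** Hypothetical stand-in for SoftReference that tracks only its timestamp. */
    static final class ModelRef<T> {
        private final T referent;
        long timestamp;

        ModelRef(T referent) {
            this.referent = referent;
            this.timestamp = clock; // construction records the clock of the last GC
        }

        T get() {
            timestamp = clock;      // every access refreshes the timestamp
            return referent;
        }
    }

    /** Simulates a garbage collection happening at time nowMs. */
    static void gc(long nowMs) {
        clock = nowMs;
    }

    public static void main(String[] args) {
        gc(1000);
        ModelRef<String> ref = new ModelRef<>("cached");
        gc(2000);
        gc(3000);
        System.out.println(ref.timestamp); // 1000: the last GC before construction
        ref.get();
        System.out.println(ref.timestamp); // 3000: the last GC before the get()
    }
}
```

Note how coarse the ordering is: any number of accesses between two GCs all produce the same timestamp.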

When a garbage collection occurs, the decision to clear a SoftReference is based on two factors:
  1. How old the reference's timestamp is, and
  2. How much free space there is in memory.
The calculation is pretty simple. If:
  • free_heap is the amount of free heap space in MB,
  • interval is the time between the last GC's clock and the timestamp of the ref we are currently examining, and
  • ms_per_mb is a constant number of milliseconds to keep around a SoftReference for each free megabyte in the heap
Then the decision is made by:
interval <= free_heap * ms_per_mb
To take an example, let's say that we have a SoftReference with a timestamp of 2000ms, the last GC's clock time was 5000ms, the ms_per_mb is 1000 and the free space is 1MB. We then test whether:
5000 - 2000 <= 1 * 1000
This is false (3000 > 1000), so we clear the reference.

Now let's say there is more free space — say, 4MB — and so less reason to clear the SoftReferences. The calculation is now
5000 - 2000 <= 4 * 1000
This is true (3000 <= 4000), so we don't clear the reference.
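The keep-or-clear test can be written as a small predicate using the same names as the definitions above. SoftRefPolicy and shouldKeep are hypothetical names; this is a sketch of the arithmetic as described in this post, not Hotspot's actual code. The main method replays the two examples.

```java
/** Sketch of the keep-or-clear arithmetic (hypothetical names, not Hotspot's code). */
public class SoftRefPolicy {
    /**
     * Returns true if the SoftReference should be kept.
     *
     * @param lastGcClock clock value of the last GC, in ms
     * @param timestamp   the reference's timestamp, in ms
     * @param freeHeapMb  free heap space, in MB
     * @param msPerMb     ms to keep a SoftReference per free MB
     */
    static boolean shouldKeep(long lastGcClock, long timestamp, long freeHeapMb, long msPerMb) {
        long interval = lastGcClock - timestamp;
        return interval <= freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        // Timestamp 2000 ms, last GC clock 5000 ms, ms_per_mb 1000.
        System.out.println(shouldKeep(5000, 2000, 1, 1000)); // false: 3000 > 1000, clear
        System.out.println(shouldKeep(5000, 2000, 4, 1000)); // true: 3000 <= 4000, keep
    }
}
```

With these numbers the break-even point is 3MB of free heap: at 3MB, 3000 <= 3000 holds and the reference is kept.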

One thing to notice is that this implies that SoftReferences will always be kept for at least one GC after their last access. Why is that? Well, for the interval, we are using the clock value of the last garbage collection, not the current one. As a result, if a SoftReference has been accessed since the last garbage collection, its timestamp will equal that garbage collection's clock value, and the interval will be 0. Since 0 <= free_heap * ms_per_mb for any amount of free_heap, any SoftReference accessed since the last garbage collection is guaranteed to be kept. This is actually how this question came up: some of my colleagues noticed that their SoftReferences weren't being cleared during a garbage collection, and they didn't know why.
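To see why a recently accessed reference always survives the next GC, here is a toy simulation in which each GC makes its decision using the previous GC's clock and only then advances the clock. OneGcGrace, Ref, access() and gcKeeps() are hypothetical names, and the ordering (decide first, then update the clock) is an assumption taken from the description above, not a transcription of Hotspot's code.

```java
/** Toy simulation (hypothetical names) of the "kept for at least one GC" behaviour. */
public class OneGcGrace {
    static long clock = 0;               // clock value recorded by the previous GC, in ms
    static final long MS_PER_MB = 1000;  // the default ms_per_mb

    /** Hypothetical stand-in for a SoftReference: only the timestamp matters here. */
    static final class Ref {
        long timestamp = clock;              // construction copies the current clock
        void access() { timestamp = clock; } // every get() copies it again
    }

    /** A GC at time nowMs: decide using the previous GC's clock, then advance the clock. */
    static boolean gcKeeps(Ref ref, long nowMs, long freeHeapMb) {
        long interval = clock - ref.timestamp;
        boolean keep = interval <= freeHeapMb * MS_PER_MB;
        clock = nowMs;
        return keep;
    }

    public static void main(String[] args) {
        clock = 5000;                               // a GC happened at 5000 ms
        Ref ref = new Ref();
        ref.access();                               // accessed after that GC: timestamp 5000
        System.out.println(gcKeeps(ref, 9000, 0));  // true: interval 0, kept with no free heap
        System.out.println(gcKeeps(ref, 12000, 0)); // false: interval 4000 > 0, cleared
    }
}
```

Even with zero free heap, the reference survives the first GC after its access and becomes eligible for clearing only at the one after that.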

ETA: The above paragraph originally said "one Full GC", not "one GC". In our case, it was one full GC, because the objects being allocated were too big to be allocated in the young generation. In most cases, it will be one GC of the generation in which the object was allocated, which will usually be the young generation (typically via a TLAB).

Another thing to notice is that the default value of 1000 for ms_per_mb is fairly arbitrary. It can be adjusted with the JVM flag -XX:SoftRefLRUPolicyMSPerMB=n. If you adjust it down, free_heap * ms_per_mb will be smaller, so SoftReferences are more likely to be cleared. If you adjust it up, you get the opposite effect.
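On a Hotspot JVM you can check which value is in effect from inside the program. This sketch assumes the com.sun.management.HotSpotDiagnosticMXBean interface, which is Hotspot-specific and may not be available on other JVMs; ShowSoftRefPolicy and msPerMb are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class ShowSoftRefPolicy {
    /** Reads the SoftRefLRUPolicyMSPerMB flag currently in effect (Hotspot only). */
    static long msPerMb() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption("SoftRefLRUPolicyMSPerMB");
        return Long.parseLong(option.getValue());
    }

    public static void main(String[] args) {
        System.out.println("SoftRefLRUPolicyMSPerMB = " + msPerMb());
    }
}
```

Running it as java -XX:SoftRefLRUPolicyMSPerMB=0 ShowSoftRefPolicy should report 0, meaning only references accessed since the last GC are protected; without the flag it reports the default.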