spark

[Feature] Add used memory after GC to memory reports

i0xHeX opened this issue · 3 comments

commented

Problem:
The used memory reported by, for example, /spark health includes garbage that will be collected soon. When looking for the source of lag (or for a memory leak), this figure can be misleading: at one moment it may read around 80%, and once the next GC collects everything it drops to 47%:
https://i.imgur.com/MIdEkcl.png

This forces you to run the command again and again to get an approximate memory usage that excludes garbage.

Solution:
We can estimate used memory excluding garbage ourselves. Let's assume we check it every tick:

  • Record current memory usage to current_mem_usage
  • If prev_mem_usage has been recorded, compare the two. If current_mem_usage < prev_mem_usage, assign no_garbage_mem_usage = current_mem_usage; otherwise do nothing.
  • Assign prev_mem_usage = current_mem_usage
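
The steps above can be sketched in Java roughly as follows. This is only an illustration of the heuristic: the class name PostGcUsageTracker and its methods are hypothetical and not part of spark, and -1 is used here as an assumed "no value yet" marker.

```java
/** Hypothetical helper (not part of spark): remembers heap usage at the last observed drop. */
public class PostGcUsageTracker {
    private long prevMemUsage = -1;       // -1 means no sample has been recorded yet
    private long noGarbageMemUsage = -1;  // -1 means no drop has been observed yet

    /** Feed one sample, e.g. once per tick. */
    public void sample(long currentMemUsage) {
        // A drop relative to the previous sample is taken as a sign that a GC ran.
        if (prevMemUsage >= 0 && currentMemUsage < prevMemUsage) {
            noGarbageMemUsage = currentMemUsage;
        }
        prevMemUsage = currentMemUsage;
    }

    /** Samples the live JVM heap using the standard Runtime API. */
    public void sampleRuntime() {
        Runtime rt = Runtime.getRuntime();
        sample(rt.totalMemory() - rt.freeMemory());
    }

    /** Usage recorded at the last observed drop, or -1 if none was seen yet. */
    public long getNoGarbageMemUsage() {
        return noGarbageMemUsage;
    }
}
```

Note that a drop between two ticks only suggests that some collection happened. With concurrent or partial collections, the value recorded this way may still include garbage.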

In short, whenever we see that the GC has collected garbage, we update the so-called no_garbage_mem_usage to the current memory usage. This gives us a "stable" memory usage figure. I think it would be much better to use GC listeners if possible. I have never worked with them, but a quick search turned up this:
https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/GarbageCollectionNotificationInfo.html
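
For illustration, a listener built on that class might look something like the sketch below. The class name GcUsageListener and its methods are hypothetical. The sum covers every memory pool the collector reports, which can include non-heap pools, and the com.sun.management API is only available on HotSpot-style JVMs.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Hypothetical sketch: records the memory used right after each reported collection. */
public class GcUsageListener {
    private static final AtomicLong usedAfterLastGc = new AtomicLong(-1);

    /** Sums the used bytes over all pools in a per-pool usage map. */
    public static long sumUsed(Map<String, MemoryUsage> pools) {
        long total = 0;
        for (MemoryUsage usage : pools.values()) {
            total += usage.getUsed();
        }
        return total;
    }

    /** Registers a listener on every collector bean that can emit notifications. */
    public static void install() {
        NotificationListener listener = (Notification n, Object handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info =
                    GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
            // Note: the map may contain non-heap pools as well as heap pools.
            usedAfterLastGc.set(sumUsed(info.getGcInfo().getMemoryUsageAfterGc()));
        };
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            }
        }
    }

    /** Bytes used after the most recent reported collection, or -1 if none was reported yet. */
    public static long usedAfterLastGc() {
        return usedAfterLastGc.get();
    }
}
```

Calling GcUsageListener.install() once at startup should be enough. What counts as a "collection" here depends on the collector in use, so the numbers may differ between, say, G1 and ZGC.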

commented

Running /spark health --memory will report the memory usage at the last GC.

Also, /spark gcmonitor will report how much memory is freed on each collection (it hooks into the GC listeners you mentioned).

commented

Something is wrong here, maybe because of ZGC:
image

It always shows about 2.6GB. Maybe that's the maximum memory reached before GC?
When I created this issue I meant usage after GC, not before.

commented

Not sure there's anything I can do here, since spark just reports what the Management bean says - sorry!
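
To check what the JVM itself reports under a given collector, something like the sketch below could be run on the server's JVM. It reads the standard MemoryPoolMXBean.getCollectionUsage() value for each heap pool. Whether spark reads exactly this value is an assumption, and the class name CollectionUsageDump is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch: prints each heap pool's usage as of its last collection. */
public class CollectionUsageDump {

    /** Formats one pool's post-collection usage, or notes that the JVM does not report it. */
    public static String describe(String poolName, MemoryUsage afterGc) {
        if (afterGc == null) {
            // getCollectionUsage() returns null when the pool does not support it.
            return poolName + ": collection usage not supported";
        }
        return poolName + ": " + (afterGc.getUsed() / (1024 * 1024)) + " MiB used after last GC";
    }

    /** Prints one line per heap memory pool. */
    public static void printHeapPools() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(describe(pool.getName(), pool.getCollectionUsage()));
            }
        }
    }
}
```

If the value printed here matches the roughly constant figure seen above, the number comes from the JVM's own reporting for that collector rather than from the tool displaying it.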