TPS slow down followed by OOM crash

Question

Rektroth opened this issue 3 months ago · 5 comments

Rektroth commented 3 months ago

Describe the bug
I run a dedicated server for my friends and me. For a few days, no one had been on the server, but the classic "Can't keep up! Is the server overloaded?" message started getting recorded in the logs every minute, then every few seconds, until finally C2ME crashed the server with an OOM error.

To Reproduce
Steps to reproduce the behavior:

1. Run the server:
   - java -Xmx12288M -Xms12288M -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:G1HeapRegionSize=8M -XX:G1HeapWastePercent=5 -XX:G1MaxNewSizePercent=40 -XX:G1MixedGCCountTarget=4 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1NewSizePercent=30 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:G1ReservePercent=20 -XX:InitiatingHeapOccupancyPercent=15 -XX:MaxGCPauseMillis=200 -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=32 -jar server.jar nogui
2. Wait for a few days with no one online

Expected behavior
No slow down or crash.

Runtime info (please complete the following information):

OS: Debian 10
Minecraft version: 1.20.6
Mod version: 0.2.0+alpha.11

Crash reports / logs

It should be noted that everything past [17:43:39] did not get recorded in the latest.log file. I had to manually copy it from the Debian console and paste it into the file.

For privacy, player names, UUIDs, IP addresses, and chats have been redacted.

Other mods

Checklist

I am using the official version of the mod.
I tried the latest development version but the issue persists.
I searched for similar open issues and could not find an existing bug report on this.

ishland · Answer 1 · 2024-06-21T16:32:19.000Z

Try to reproduce this without C2ME, and add -XX:+HeapDumpOnOutOfMemoryError as a JVM flag to your server.
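For anyone following along, here is a minimal sketch of what the adjusted launch command might look like. It assumes a POSIX shell; `DUMP_DIR` and `DRY_RUN` are hypothetical names introduced for this example, and most of the G1 tuning flags from the original report are left out for brevity. `-XX:HeapDumpPath` is optional: without it, the JVM writes `java_pid<pid>.hprof` into its working directory.

```shell
#!/bin/sh
# Sketch only: DUMP_DIR is a hypothetical location. Choose a disk with at
# least as much free space as the -Xmx value, since the dump can be roughly
# as large as the heap. Avoid spaces in the path (CMD is word-split below).
DUMP_DIR="${DUMP_DIR:-./heapdumps}"

# Heap size and GC flags copied from the report (other G1 flags omitted),
# plus the two heap-dump flags.
JVM_FLAGS="-Xmx12288M -Xms12288M -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${DUMP_DIR}"

CMD="java ${JVM_FLAGS} -jar server.jar nogui"

# DRY_RUN (hypothetical switch) defaults to 1: print the command instead of
# starting the server.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$CMD"
else
  mkdir -p "$DUMP_DIR"
  exec $CMD
fi
```

Running it with `DRY_RUN=0` starts the server; the default just prints the command so it can be checked first.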

Rektroth · Answer 2 · 2024-07-05T03:02:19.000Z

It may take time for the issue to reoccur (if it does at all), but I have added the flag. I've also reduced the server memory to 10GiB - I'm fairly certain the problem was not with the system running out of physical memory, but I'm trying it.

As for reproducing without C2ME, I'm not keen on running tests with the dedicated server, and I don't have the resources to run a server 24/7 elsewhere while waiting and watching for a crash. But honest question: does the UncaughtExceptionHandler in thread "C2ME Storage #8" and the like not conclusively indicate a C2ME issue?

Rektroth · Answer 3 · 2024-07-29T23:35:29.000Z

It re-occurred. Relevant part of the log. At a glance, I don't see an obvious indication that it's C2ME, but I'll let you confirm this.

I'm also now simply removing C2ME. If it re-occurs without C2ME before you can confirm, I'll update.

ishland · Answer 4 · 2024-07-30T00:34:06.000Z

You might want to inspect the file java_pid10474.hprof. Handle that file like a core dump.

https://wiki.archlinux.org/title/Core_dump
Warning: Core dumps should be shared only with trusted parties as they may contain sensitive data (such as passwords or cryptographic keys).

But honest question: does the UncaughtExceptionHandler in thread "C2ME Storage #8" and the like not conclusively indicate a C2ME issue?

Anything that leaks memory can cause such an issue.
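One way to inspect the .hprof file is Eclipse Memory Analyzer (MAT), which ships a headless `ParseHeapDump.sh` script. The sketch below assumes MAT is unpacked on a Linux machine; `MAT_HOME`, `HPROF`, and `DRY_RUN` are hypothetical names for this example. The analyzer runs in its own JVM, and its heap limit comes from the `-Xmx` line in `MemoryAnalyzer.ini`, so a dump from a 10 to 12 GiB server will probably need that value raised well above the default before parsing succeeds.

```shell
#!/bin/sh
# Sketch only: both paths are assumptions, adjust them to your setup.
# Avoid spaces in them (CMD is word-split below).
HPROF="${HPROF:-java_pid10474.hprof}"
MAT_HOME="${MAT_HOME:-$HOME/mat}"
INI="$MAT_HOME/MemoryAnalyzer.ini"

# Show the dump size, which gives a rough idea of how much heap MAT needs.
if [ -f "$HPROF" ]; then
  ls -lh "$HPROF"
fi

# Show the analyzer's current heap limit, if the ini file is present.
if [ -f "$INI" ]; then
  grep -n -- '-Xmx' "$INI" || echo "no -Xmx line found in $INI"
fi

# Headless parse that also generates the "Leak Suspects" report.
CMD="$MAT_HOME/ParseHeapDump.sh $HPROF org.eclipse.mat.api:suspects"

# DRY_RUN (hypothetical switch) defaults to 1: print the command only.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$CMD"
else
  exec $CMD
fi
```

The report is written as a zip file next to the dump. As with the dump itself, treat it as sensitive before sharing it.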

Rektroth · Answer 5 · 2024-08-18T23:23:12.000Z

Opening it results in this:

An internal error occurred during: "Parsing heap dump from 'C:\Users\Rektroth\Downloads\java_pid10474.hprof'".
Java heap space

Honestly, we're getting into areas I'm not very knowledgeable on.

It's been a few weeks and, as far as I can tell, removing C2ME from the server has resolved the problem. Spending my time learning how to further investigate this is not worth the (frankly) very minimal benefit the mod provides. I'm simply going to move forward without.

Best of luck to you, though.
