[Ruins] [1.15.2] 11 times slower chunk generation performance
gamax92 opened this issue · 10 comments
forge-1.15.2-31.2.33
ruins-1.15.2.1
Java 1.8.0_261
I was testing why chunk generation on my server was very slow and found that removing this mod increased chunk generation speed by about 11x.
I did a separate test with just Forge, Ruins, and a mod called PreGenForge, running the command pregen start 16 to generate everything within a 16 chunk radius. The world folder was deleted at the beginning of every test and the same seed was used each time.
With only the chunk pregeneration mod, chunk generation would complete in 0:15
With Ruins, chunk generation would complete in 2:42, nearly 11 times slower than without the mod.
I ran the same experiment on my server, with very different results.
Without Ruins:
[10:48:27] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Started
[10:48:27] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 29.69%, Chunks: 304/1024 (10133.33/sec), ETA: 0h:0m:0s
[10:48:30] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 40.04%, Chunks: 410/1024 (31.52/sec), ETA: 0h:0m:0s
[10:48:34] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 60.45%, Chunks: 619/1024 (53.30/sec), ETA: 0h:0m:0s
[10:48:37] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 80.08%, Chunks: 820/1024 (67.86/sec), ETA: 0h:0m:0s
[10:48:40] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Complete!
With Ruins:
[10:51:08] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Started
[10:51:08] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 29.69%, Chunks: 304/1024 (12666.67/sec), ETA: 0h:0m:0s
[10:51:12] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 40.04%, Chunks: 410/1024 (30.66/sec), ETA: 0h:0m:0s
[10:51:16] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 60.45%, Chunks: 619/1024 (51.25/sec), ETA: 0h:0m:0s
[10:51:19] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Progress: 80.27%, Chunks: 822/1024 (70.32/sec), ETA: 0h:0m:0s
[10:51:22] [Server thread/INFO] [minecraft/DedicatedServer]: [PreGen] (world:overworld) Complete!
That's 13 seconds without Ruins, 14 seconds with. Now, there's some timing variation with different seeds, but I tried a few and saw pretty much the same thing--Ruins doesn't add significantly to the generation time. It's hard to say exactly how much, due to the time resolution, but it seems to be around 5%. Certainly nothing like the 1000% you're seeing.
I'm running with the Ruins default config and template set. Did you change the config or add new templates? Even if you did, however, I wouldn't expect it to account for such a large discrepancy. More likely, I think, is you may need to increase the amount of memory allocated to your server; maybe you're losing time to paging (though I'm using the defaults there, too).
I don't doubt you're seeing what you're seeing, but I'm not able to duplicate it, and I've tried numerous times.
Another possibility that occurred to me just as I hit the "Comment" button (of course) is there may be seeds that create terrain around worldspawn Ruins may struggle with, like extreme mountains. It could be you just happen to be using one of those seeds. Or did you try different seeds, too? If not, let me know what seed you're using and I'll try it on my server.
I've asked a friend of mine to help test this and they get the same performance issues, except that the UseLargePages flag doesn't seem to fix it for them. CPU usage drops to 1 core with a lot of kernel CPU usage, and chunk generation becomes slow. Going to continue investigating the issue.
We're both using Windows 10 64bit, they're on 1903 and I'm on 2004
I really don't understand this. I've tested the same setup in Linux and world generation runs as smoothly as it does without the mod. When I try it in Windows, it runs awfully and appears to use only 1 core at times. While playing around with various JVM options, however, I've found that if I use -XX:+UseLargePages on the JVM, performance now matches Vanilla and what I get in Linux with the mod.
Without that flag:
With that flag:
(CPU Graphs from ProcessHacker, green is kernel cpu usage and red is total cpu usage)
I'm fine with enabling this flag on the JVM, but it's still rather odd that, out of 100 other mods, I have to add it specifically for Ruins.
Since I can't duplicate it on my system, I'm afraid I can't be of much help solving this mystery. I'm running Windows (Windows 10, 64 bit). Ruins isn't doing anything terribly unusual; it does use a moderate amount of memory...but I've seen other popular mods use several times more, so I don't think it's that (unless you've installed a crapton of custom templates).
I think it is possibly the timer spam that RuinsMod.decorateChunkHook() does, because after removing the usage of Timers or converting it to a ScheduledThreadPoolExecutor based approach, I no longer get the decreased chunk generation performance with high kernel CPU time. Even if this happens to be another fluke (because why did UseLargePages help my computer, of all things?), not having the Timer spam definitely looks a lot better than new threads constantly spawning and dying.
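For anyone following along, here is a minimal sketch of what a ScheduledThreadPoolExecutor based approach could look like. It is not the actual Ruins code or my patch: the names DelayedChunkScheduler and ChunkTask are made up for illustration, and in a real mod the task would still have to hand its work over to the main server thread rather than touch the world directly.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Ruins: one shared scheduler for all
// delayed chunk tasks instead of a new java.util.Timer (and thread) per chunk.
final class DelayedChunkScheduler {

    // Hypothetical callback type for the delayed work.
    interface ChunkTask {
        void run(int chunkX, int chunkZ);
    }

    // A single worker thread services every delayed task.
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final long delayMillis;

    DelayedChunkScheduler(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Queue the task to run once after the configured delay.
    // No thread is created here; the task just waits in the executor's queue.
    void schedule(int chunkX, int chunkZ, ChunkTask task) {
        executor.schedule(() -> task.run(chunkX, chunkZ), delayMillis, TimeUnit.MILLISECONDS);
    }

    // Stop accepting new tasks and wait for the already queued ones to finish.
    // Returns true if everything completed within the timeout.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the 15 second delay discussed in this thread, that would be something like new DelayedChunkScheduler(15000) created once at mod startup, with schedule(...) called from the chunk hook.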
Plus if I'm reading the code correctly ... this causes structures to appear 15 seconds after a chunk is created? Why such a long delay?
Also I did test trying Forge 31.2.0, didn't see a change in behavior on my computer or friend's computer.
I'm running 2004 as well. Can you try the latest recommended Forge build (31.2.0) without the large pages flag to see if you get the same behavior? I'm seeing different memory usage profiles between Forge 31.2.33 and 31.2.0, so maybe something's there.
There is a hardware-dependent point at which creating new threads with a large number of threads already waiting massively degrades performance.
The 15 seconds are a clumsy workaround to a basic issue with world generation: You cannot access anything outside the currently generating chunk. Neighbours might not exist yet.
Ruins wants to access neighbour chunks very often. So, we wait an arbitrary amount of time for generation to progress and finish and do the logic from the main server thread, which has full access to the generated world.
I do like gamax92's suggestion to draw from a pool instead of spinning up raw threads, though. That's going to save in terms of both heap and CPU load. Excessive demands on heap may very well have been the cause of the original paging issue.
Well, technically creating all those threads is ridiculous in the first place. Ruins should just save the chunk coordinates and a timestamp 15 seconds in the future into a concurrency-safe FIFO queue. Then, a hook into the main server tick simply peeks at the queue head to decide if Ruins generation should happen. Consume those markers and let Ruins generate until the queue is empty, no remaining marker has reached its timestamp, a certain number of markers has been processed, or, even better, a certain amount of time (100ms?) has passed in the server tick. Then break and do the same on the next server tick.
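Roughly, the queue idea could be sketched like this. This is only an illustration under assumptions: DeferredChunkQueue, enqueue and onServerTick are hypothetical names, nothing here is taken from the Ruins source, and the wiring into Forge's actual tick event is left out. It assumes a single consumer (the main server thread) and a constant delay, so the queue head is always the oldest marker.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BiConsumer;

// Hypothetical sketch of the tick-driven queue; none of these names come from Ruins.
final class DeferredChunkQueue {

    // Chunk coordinates plus the earliest time at which they may be processed.
    private static final class Marker {
        final int chunkX;
        final int chunkZ;
        final long dueAtMillis;

        Marker(int chunkX, int chunkZ, long dueAtMillis) {
            this.chunkX = chunkX;
            this.chunkZ = chunkZ;
            this.dueAtMillis = dueAtMillis;
        }
    }

    private final ConcurrentLinkedQueue<Marker> queue = new ConcurrentLinkedQueue<>();
    private final long delayMillis;   // e.g. 15000
    private final long budgetMillis;  // e.g. 100, the per-tick time budget

    DeferredChunkQueue(long delayMillis, long budgetMillis) {
        this.delayMillis = delayMillis;
        this.budgetMillis = budgetMillis;
    }

    // Safe to call from any thread, e.g. from the chunk decoration hook.
    void enqueue(int chunkX, int chunkZ, long nowMillis) {
        queue.add(new Marker(chunkX, chunkZ, nowMillis + delayMillis));
    }

    // Call once per server tick from the main thread only.
    // Processes due markers until none are left or the time budget is used up.
    // Returns how many markers were processed this tick.
    int onServerTick(long nowMillis, BiConsumer<Integer, Integer> generator) {
        long deadlineNanos = System.nanoTime() + budgetMillis * 1_000_000L;
        int processed = 0;
        Marker head;
        while ((head = queue.peek()) != null && head.dueAtMillis <= nowMillis) {
            queue.poll(); // single consumer, so this removes the marker we just peeked
            generator.accept(head.chunkX, head.chunkZ);
            processed++;
            if (System.nanoTime() - deadlineNanos >= 0) {
                break; // budget spent, continue on the next tick
            }
        }
        return processed;
    }

    int pending() {
        return queue.size();
    }
}
```

The budget is checked after each marker, so at least one due marker is handled per tick even if a single structure takes longer than the budget.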