Lithium (Fabric)

Heavy redstone/mechanisms can lead to broken chunk loading within the same dimension

LucilleTea opened this issue · 16 comments

commented

Originally reported by Andrews54757 on the Discord.

When the server is under heavy load (Andrews54757 reports that many setups will trigger it; I've only reproduced it while running this ice farm), chunk loading within the same dimension stops working properly.
Only chunks very close to players will load, and nearby chunks will also unload when they shouldn't. A player travelling to this dimension through a portal is likely to overload the server, kicking everyone but not crashing it.

Here's a video of the chunk loading issue in 20w14a.

I've reproduced this issue in both 1.15.2 and 20w14a:
In 1.15.2, running Lithium 0.4.6 and optionally Phosphor 0.5.2.
In 20w14a, running Lithium 0.5.0-alpha1 and Phosphor 0.5.2.

In 1.15.2 I've found that using phosphor, and disabling both use_fast_shape_comparison and extend_block_shape_cache, will fix the issue - chunks will load properly and players won't be kicked. Andrews54757 reports that this wasn't enough to fix the problem.

commented

Starlight may alleviate this issue. It's nearly 35x faster than the Vanilla lighting engine while generating chunks, and around 32x faster than Phosphor*.

CC @MissPotato.

* Unless I did my math wrong, that is what this graph seems to suggest.

commented

The first thing that comes to mind is that we should block the game thread whenever the light update queue exceeds a certain size, but that could lead to rare and excessively long ticks.

This seems like it'll be necessary as a fail-safe no matter what. Perhaps allow the upper bound to be adjusted in the config. The threaded lighting implementation has a lot of problems that result in a lot of glitchiness, with this issue mostly covered by MC-164281. I suspect that in a few years, when Mojang gets around to it, they'll probably "fix" it in the same way.

commented

Before blocking the game thread completely, we could throttle it a bit first. That might prevent huge sudden lag spikes.
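A minimal sketch of that "throttle first, block last" idea, in Java. Everything here is an assumption made for illustration: the class name, the soft and hard limits, and the linear delay ramp are hypothetical and are not taken from Lithium or Minecraft.

```java
/** Hypothetical policy object; none of these names come from Lithium or Minecraft. */
final class LightBackpressure {
    enum Action { NONE, THROTTLE, BLOCK }

    private final int softLimit;   // pending updates at which throttling starts
    private final int hardLimit;   // pending updates at which the game thread blocks
    private final long maxDelayMs; // longest per-tick delay applied while throttling

    LightBackpressure(int softLimit, int hardLimit, long maxDelayMs) {
        if (softLimit < 0 || hardLimit <= softLimit || maxDelayMs < 0) {
            throw new IllegalArgumentException("need 0 <= softLimit < hardLimit and maxDelayMs >= 0");
        }
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
        this.maxDelayMs = maxDelayMs;
    }

    /** Decides what the game thread should do for the given queue length. */
    Action actionFor(int pending) {
        if (pending >= hardLimit) return Action.BLOCK;
        if (pending > softLimit) return Action.THROTTLE;
        return Action.NONE;
    }

    /** Delay grows linearly from 0 at softLimit to maxDelayMs at hardLimit. */
    long delayMsFor(int pending) {
        if (pending <= softLimit) return 0L;
        if (pending >= hardLimit) return maxDelayMs;
        return maxDelayMs * (pending - softLimit) / (hardLimit - softLimit);
    }

    public static void main(String[] args) {
        // Illustrative limits only; real values would need tuning (or a config option).
        LightBackpressure policy = new LightBackpressure(1_000, 10_000, 50);
        for (int pending : new int[] {500, 1_000, 5_500, 10_000}) {
            System.out.println(pending + " pending -> " + policy.actionFor(pending)
                    + ", delay " + policy.delayMsFor(pending) + " ms");
        }
    }
}
```

Ramping the delay up gradually spreads the cost over many ticks, so the hard block should only be reached if the light engine still can't catch up.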

commented

What kind of help is wanted for this?

commented

A world that doesn't have much clutter but easily reproduces the crash would be helpful. At this point we just have to implement a workaround for the light queue becoming too long and test that using some test worlds.

commented

Dispensers are very slow; for example, a dropper holding 64 items only drops 32.

commented

Dispensers are very slow; for example, a dropper holding 64 items only drops 32.

Is that related to this issue? I'm following along the convo, but I think I'm misunderstanding something because I don't see how the light engine stalling is related to some dropper issue. Some more details would be super helpful, @zLauch :)

commented

I am unable to reproduce this issue regardless of the configuration used. I suspect this might be caused by an overload condition and is not necessarily the fault of Lithium. The issues with chunks loading/unloading incorrectly look to be client-side (noting that they can reappear and disappear in a single frame).

commented

I'm leaving this issue as confirmed since multiple players have been able to reproduce the issue here, but since resolutions vary from person to person it's not clear what's the actual underlying bug in Lithium, if any.

commented

I'm not convinced this issue is directly caused by Lithium. I've spent the last few days trying to narrow down what's going on, and it seems to be that the light engine is becoming so overburdened as to completely stall any tasks waiting for it to finish (chunk loading/saving, server shutdown, etc.)

In other words, Lithium provides such a significant speed up to the game server that it becomes impossible for the light engine to keep up, and as it has an unbounded queue it will eventually grow to ridiculous size and force a stall when anything queries the state of the light engine.
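To make the arithmetic concrete, here is a toy model in Java. It is only a sketch: the class name and all numbers (updates per tick, drain rate, tick rates) are made up for illustration and are not measurements of Minecraft or Lithium.

```java
/** Toy model of an unbounded light-update queue; all figures are hypothetical. */
final class LightBacklogModel {

    /**
     * Returns the queue length after the given number of seconds, starting empty.
     * The game thread enqueues updatesPerTick each tick; the light engine
     * drains at most drainedPerSecond each second.
     */
    static long backlogAfter(int seconds, double ticksPerSecond,
                             int updatesPerTick, int drainedPerSecond) {
        long backlog = 0;
        for (int s = 0; s < seconds; s++) {
            long produced = Math.round(ticksPerSecond * updatesPerTick);
            // The queue can never go below empty, but it has no upper bound.
            backlog = Math.max(0, backlog + produced - drainedPerSecond);
        }
        return backlog;
    }

    public static void main(String[] args) {
        // A lagging server (8 ticks/s) produces fewer updates than the engine can drain.
        System.out.println("slow server backlog: " + backlogAfter(60, 8.0, 1_000, 10_000));
        // The same contraption at a full 20 ticks/s outpaces the engine every second.
        System.out.println("fast server backlog: " + backlogAfter(60, 20.0, 1_000, 10_000));
    }
}
```

With these assumed numbers the slow server never accumulates a backlog, while the fast one gains 10,000 queued updates per second, so anything that has to wait for the queue to drain waits longer and longer.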

The initial user reports indicating that this was caused by the shape patches appear to have been the case only because those patches had such a large impact on server tick times. With the new optimizations landing in the 1.16.x branch, it seems disabling them no longer pushes tick times back up high enough to prevent the light engine from becoming overburdened.

I'm going to be adding diagnostic tracers into Lithium so users can be signaled when this overload condition occurs, but it's not clear what the best solution is for resolving it. The first thing that comes to mind is that we should block the game thread whenever the light update queue exceeds a certain size, but that could lead to rare and excessively long ticks.
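As a rough illustration of both ideas (a diagnostic signal plus blocking the producer once the queue is full), here is a small Java sketch built on the standard `ArrayBlockingQueue`. The class, its method names, and the log message are hypothetical; this is not how Lithium or the vanilla light engine is actually structured.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded task queue; not Lithium or Minecraft code. */
final class BoundedLightQueue {
    private final BlockingQueue<Runnable> tasks;
    private final AtomicLong stalls = new AtomicLong(); // diagnostic counter

    BoundedLightQueue(int capacity) {
        this.tasks = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by the producer (game thread). Returns true if it had to wait. */
    boolean submit(Runnable task) {
        if (tasks.offer(task)) {
            return false; // fast path: there was room in the queue
        }
        stalls.incrementAndGet(); // overload detected: record and report it
        System.err.println("[tracer] light queue full at " + tasks.size()
                + " tasks, stalling producer");
        try {
            tasks.put(task); // blocks until the consumer frees a slot
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the light queue", e);
        }
        return true;
    }

    /** Called by the consumer (light thread). Runs one task if any is pending. */
    boolean runOne() {
        Runnable task = tasks.poll();
        if (task == null) return false;
        task.run();
        return true;
    }

    long stallCount() { return stalls.get(); }

    int pending() { return tasks.size(); }

    public static void main(String[] args) {
        BoundedLightQueue queue = new BoundedLightQueue(2);
        queue.submit(() -> {});
        queue.submit(() -> {});
        // Demo only: the "light thread" waits until the producer stalls, then frees one slot.
        Thread lightThread = new Thread(() -> {
            while (queue.stallCount() == 0) Thread.yield();
            queue.runOne();
        });
        lightThread.start();
        boolean stalled = queue.submit(() -> {});
        System.out.println("stalled=" + stalled + ", stalls=" + queue.stallCount()
                + ", pending=" + queue.pending());
    }
}
```

The trade-off is the one described above: the queue can no longer grow without bound, but a full queue turns directly into a long tick on the producer side.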

commented

Marking as critical as this issue is appearing more with recent improvements and can result in the server being abruptly killed by the watchdog, possibly creating invalid state on disk.

commented

Bravo!!! Amazing work finding out the culprit of the problem! I am very impressed!

commented

Our server has been having this lockup as well. It has caused quite a few lighting glitches but hasn't appeared to do any severe damage otherwise. At the request of the AOF3 pack dev I had the server owner install the /suites/1732359139/artifacts/32814842 build. I have quite a lot of documentation available here: TeamAOF/All-of-Fabric-3#232, though most of the useful stuff is in the logs.

We're willing to use experimental versions of the mod; however, we haven't really found a reliable way to reproduce it. Logging into unloaded but already generated chunks seemed to cause it the most often. We're going to keep using the earlier mentioned build.

If you wanna get in contact with me I'm in the discord by the username "LadyPotaty#3238"

commented

This issue also appears in vanilla. The current information I have is this: vanilla does not wait for the light engine to finish working through its update queue except when loading chunks, so the server can keep scheduling light updates faster than the light engine can empty its queue. Chunk loading then waits forever for the light updates to finish, which is why you notice that chunk loading doesn't work.
Lithium optimizes the server but not the light engine, so a mechanism that schedules lots of light updates may more often cause the light engine to lag behind like this. Some users stop experiencing this problem by also speeding up the light engine, e.g. by installing Phosphor.

commented

It appears that the snapshot version didn't fix it, though it does seem to happen less often. I don't know whether it's really happening less often, given how unpredictable it is. It seems to happen most often when users join the world and during rocket-boosted elytra flight. It also seems unable to save light levels to disk for some chunks, as every time we restart the server the same two chunks go pitch black, even after replacing torches. We do have Phosphor as well.

Have you looked at what https://github.com/PaperMC/Paper does to handle this? I would imagine this is a bug they've encountered.

The owner of the server has stated that they're fine using experimental versions if you think playtesting could help expedite finding a solution.

commented

So is this mod so good it's breaking the game because it's TOO good/fast? That's impressive!