BUG - NPE during Sync
maldiablo opened this issue · 10 comments
As seen in: #129
NPE on client sync. Clients can't log in. Version 1.10.18.
I went the extra mile and figured out how to reproduce the issue linked in 129.
The answer is that AS doesn't play nicely with chunkloaders.
When an AS multiblock structure is only partially loaded due to it being spread across more than one chunk, NPE errors happen and players fail to sync.
To reproduce, build an AS structure across a chunk boundary (I used an attunement altar). Put a chunkloader to cover only one of the chunks the altar is on. Finally, move far enough away to cause chunks to unload. The loaded chunk will remain while half of the device is now offline. At that point, players will no longer be able to log in and get sync errors until the server is restarted.
AS needs to better check the integrity of objects before processing events tied to them.
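To make the idea concrete, here is a rough sketch of that kind of integrity check. It is not taken from the AS source: `StructureChunkGuard`, `chunksSpanned`, `isFullyLoaded` and the `isChunkLoaded` callback are all hypothetical names, and the real mod would have to ask the world for chunk state instead. It only assumes that chunks are 16x16 blocks.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.LongPredicate;

// Hypothetical helper for illustration; not Astral Sorcery's actual code.
final class StructureChunkGuard {

    // Pack a chunk coordinate pair into a single long key.
    static long chunkKey(int chunkX, int chunkZ) {
        return ((long) chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
    }

    // All chunks touched by a structure's bounding box, given in block
    // coordinates (inclusive). Chunks are 16 blocks wide, so block >> 4
    // is the chunk coordinate; the shift also handles negative coordinates.
    static Set<Long> chunksSpanned(int minX, int minZ, int maxX, int maxZ) {
        Set<Long> keys = new LinkedHashSet<>();
        for (int cx = (minX >> 4); cx <= (maxX >> 4); cx++) {
            for (int cz = (minZ >> 4); cz <= (maxZ >> 4); cz++) {
                keys.add(chunkKey(cx, cz));
            }
        }
        return keys;
    }

    // True only if every chunk the structure touches is currently loaded.
    // isChunkLoaded is an assumed callback supplied by the caller.
    static boolean isFullyLoaded(int minX, int minZ, int maxX, int maxZ,
                                 LongPredicate isChunkLoaded) {
        for (long key : chunksSpanned(minX, minZ, maxX, maxZ)) {
            if (!isChunkLoaded.test(key)) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, an altar covering blocks x=14..18 would report two chunks, and any processing tied to it could be skipped while `isFullyLoaded` returns false, rather than touching the half that isn't there.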
Just wanted to add - I've got no experience with Java but I've done dev work in the past. I'm hoping what I'm saying makes sense in spite of my inability to toss around Java-dev-specific terminology.
Also, this happens without chunk loaders too. Chunk loaders just make the bug easier to reproduce. I've had the issue caused by afk players who were just far enough away from AS objects to have them only partially unload. At that point, it's the same sync issue all over again and nobody can get in anymore.
My thought is this: Could you possibly fast track a fix by having components automatically check/force load neighboring chunks before processing anything? I'm sure it's not as elegant, but it's quick and should be relatively foolproof. Just have each entity check the 8 chunks around it and load anything that's not there.
Ensuring neighboring chunks are loaded before event processing should be more than enough to prevent the NPEs if I'm right about this bug.
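Roughly what I have in mind, as a sketch only. `NeighborChunks` and the `ChunkAccess` interface are made-up names standing in for whatever the game actually exposes for checking and loading chunks; I'm not claiming this matches any real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "check the 8 chunks around it" idea.
final class NeighborChunks {

    // Assumed stand-in for the real world/chunk provider.
    interface ChunkAccess {
        boolean isLoaded(int chunkX, int chunkZ);
        void forceLoad(int chunkX, int chunkZ);
    }

    // The 8 chunk coordinates surrounding (chunkX, chunkZ), as {x, z} pairs.
    static List<int[]> around(int chunkX, int chunkZ) {
        List<int[]> out = new ArrayList<>(8);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dz = -1; dz <= 1; dz++) {
                if (dx == 0 && dz == 0) {
                    continue; // skip the center chunk itself
                }
                out.add(new int[] {chunkX + dx, chunkZ + dz});
            }
        }
        return out;
    }

    // Loads every neighbor that is not loaded yet and returns how many
    // chunks had to be force loaded.
    static int ensureNeighborsLoaded(int chunkX, int chunkZ, ChunkAccess access) {
        int forced = 0;
        for (int[] c : around(chunkX, chunkZ)) {
            if (!access.isLoaded(c[0], c[1])) {
                access.forceLoad(c[0], c[1]);
                forced++;
            }
        }
        return forced;
    }
}
```

Force loading has a cost of its own, so skipping the event when a neighbor is missing might turn out to be the cheaper variant of the same check.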
I saw the 1.10.19 change notes about checking if chunks are loaded and I'm testing it out now. If I don't see any more sync errors with my players for a day or so, I'll follow up here and let you know.
Thanks for this. It looks promising so far.
Awesome mod.
Just an update - Without chunkloaders I've stopped getting NPE errors and sync errors. So I'm pretty sure the changes in 1.10.19 improved stability in that area. Unfortunately, once I added my server's chunkloader mod back (Weirding Gadget), I started getting the errors again.
Can somebody tell me by looking at the source of Weirding Gadget if it's a possible mod conflict? If so, I'll try a different chunkloader on my server and report back (if it's determined that it'll help). I'm wondering if Weirding Gadget hooks something AS depends on and reports bad info back to AS when it's doing some checks, but I'm out of my depth when looking at Java code.
Weirding gadget source is here: https://github.com/AtomicBlom/WeirdingGadget
Looks like I still get the sync error with or without chunk loaders on my server. Only thing this new version seems to do is make the errors take longer to manifest. Chunkloaders still make the error happen sooner though.
https://pastebin.com/raw/QBFNdAKX
The server cluster I run has come into this issue as well. Has anyone found a workaround or a solution?
https://pastebin.com/0BERq3Xv
This seems to be a server killer.
Having this problem on my server.
Players keep being kicked out with a fatal exception. And, it's the NPE during sync.
On AS .19
In my opinion, this is quite a critical issue.
Tried updating Sponge (we can't really remove it) and removing some extra server-side mods until we had the bare minimum Enigmatica 2 Expert setup, but players still get NPE'd out.
This issue is intermittent: sometimes it'll let a player through after a lot of attempts to join, but even that isn't guaranteed.
I feel compelled to mention as well that starlight networks randomly stop working as the server's uptime rises. Crystals will stop connecting even if you break and place them back, and eventually this fate extends to the entire network. A server restart temporarily restores the Starlight network, even if the Integrity Check option is disabled. The place this was happening was not chunkloaded by any means. The crystals that stopped working had their structures fully contained within a chunk.
Just throwing this in in case it's related to the NPEs.
We also tried the Integrity Check before and a full deletion of Astral's world data to no avail.
World data has no impact on the multiblock caching involved here. It destroys the crystal data though. This may impact sextant operations, I really don't recall.
Delete the multiblock client side cache if you want to try deleting something related to multiblocks though.
SLN dying is likely more related to Sponge than this, since the systems aren't particularly interlinked.
Would also recommend spinning up the server instance as a secondary world without Sponge as a test operation.