[1.17.1] Silent Gear hangs Forge loading
noncom opened this issue · 10 comments
Versions
- Silent Gear: 1.17.1-2.7.4
- Silent's Gems: N/A
- Silent Lib: 1.17.1-5.0.0
- Forge: 37.0.84
- Modpack: Limitless 4 version 0.10.0 as well as 2-3 versions before that
- Optifine Installed: No
Expected Behavior
Minecraft instance should load fully, be it server or client. Silent Gear should load correctly as well and bring us joy :)
Actual Behavior
Server and client do not load fully, they stop at absolutely the same point during initial loading. The loading stops after the following lines have appeared in the log, both on client and server:
... previous log ...
[20:04:26] [modloading-worker-13/INFO]: Found 8828 unique defined recipes
[20:04:26] [modloading-worker-13/INFO]: Found 4814 unique defined loot tables
[20:04:26] [modloading-worker-13/INFO]: Found 4545 unique defined advancements
[20:04:26] [modloading-worker-13/INFO]: Added 31 materials
The behavior was observed both on server and client. At this point the client is still at the "white screen" stage, before any loading screen appears. The server just stops dead in its tracks as well.
The exact numbers in these log lines likely depend on the particular modpack and don't mean anything; what matters is the stage of loading at which the instance stops. The CPU load for the process stays around 10% with "Not responding" status.
The process never ends, the instance does not crash, and there's never an error in the logs. It was only possible to retrieve the current threads' stack traces via the jps/jstack combo, so that's what's in the linked log. Those traces are what allowed tracing the issue back to Silent Gear, and it looks like a deadlock caused by some kind of race condition. The stack traces are always the same.
Removing Silent Gear from the pack lets the client load correctly any number of times.
Links/Images
- https://gist.github.com/noncom/56e026f3aacbfd73a844054c1d93396c -- pay attention to "Render thread" #1 and "modloading-worker-11" #42
Steps to Reproduce the Problem
- The issue was observed by several people in the Limitless 4 modpack, but there is nothing specific you can do to make it appear; it might or might not occur. It might happen all day, then not a single time the next day. Random actions like removing all configs or removing a random mod might clear it up or not. The one consistent difference is that removing Silent Gear prevents it completely, and adding it back brings the problem back. Running Limitless 4 from GDLauncher might be the surest known way to reproduce this.
I see both
waiting on the Class initialization monitor for net.minecraftforge.fmllegacy.network.FMLNetworkConstants
and
waiting on the Class initialization monitor for net.minecraftforge.fmllegacy.network.FMLHandshakeHandler
on two separate threads. I wonder if that might be related? Those classes seem to be stuck waiting for the other to load for some reason.
Could be, I noticed that -- these are also always there. Every time I've collected the stack dumps, it was exactly the same picture across all the threads, including the lines you cited. I don't know the Forge APIs or inner workings well, so I can't say much about the facts we're seeing here :) I can't even be absolutely sure that it's Silent Gear that's really making it all happen.
Just one more detail to add to the report: today it also haunts me when I play through the CurseForge launcher, with a 100% repro rate. I know that launchers work differently with the JVM, but apparently that doesn't matter here, and the earlier observation about GDLauncher being an important factor doesn't hold.
Hey, I've taken things apart and run through things.
The issue is this line in your Network code:
I think it's a concurrency issue with your static initializer in your Network.java
You're requesting a BiConsumer of FMLHandshakeHandler before the networking code has initialized ...
Technically you should only init your networking code after the initialize step.
I know it works on some computers, but not on others (my friend's instance is 100% fine, but on my system it locks up "reliably")
I might clone things and see if I can mitigate the issue, might even work with a delay .. (I mean .. ugh :P )
Oh wow, an awesome find! I didn't even look thoroughly at the code before, but now that you're saying this, I looked and.. boom! Looks like initializing stuff in static initializers is the biggest trend in fashion these days :D Forge does this, Silent Gear does this... well yeah, guess what happens :D
So the Network class statically requires the FMLHandshakeHandler class, which in turn statically requires FMLNetworkConstants, which in turn statically requires NetworkInitialization, which in turn, again, statically requires FMLHandshakeHandler. But FMLHandshakeHandler initialization is already in progress.. so it never ends and keeps waiting forever.
If your JVM was lucky enough to initialize the Forge network classes earlier, or if you're on a JVM that is somehow capable of handling this deadlock (are there any JVMs with such capabilities?), then you pass through.
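To make the mechanism concrete: per the Java Language Specification's class initialization procedure, a thread that re-enters a class it is already initializing itself returns immediately, while a thread that needs a class being initialized by a different thread has to wait. So a static cycle only hangs when two threads enter it from different ends, which matches the two "waiting on the Class initialization monitor" entries in the thread dump. Below is a minimal sketch of that situation. The classes A and B are hypothetical stand-ins, not the Forge or Silent Gear classes, and the latch is only there to force the unlucky timing on every run.

```java
import java.util.concurrent.CountDownLatch;

class ClassInitDeadlockDemo {
    // Opens once both threads are inside their static initializers,
    // which makes the otherwise timing-dependent hang reproducible.
    static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_INSIDE.countDown();
        try {
            BOTH_INSIDE.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical stand-in for one class in the static cycle.
    static class A {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = B.VALUE + 1; // blocks: another thread is initializing B
        }
    }

    // Hypothetical stand-in for the class at the other end of the cycle.
    static class B {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = A.VALUE + 1; // blocks: another thread is initializing A
        }
    }

    /** Starts one thread per class and reports whether both are still stuck. */
    static boolean bothThreadsStuck(long waitMillis) {
        Thread first = new Thread(() -> System.out.println(A.VALUE), "init-A");
        Thread second = new Thread(() -> System.out.println(B.VALUE), "init-B");
        // Daemon threads so the JVM can still exit despite the deadlock.
        first.setDaemon(true);
        second.setDaemon(true);
        first.start();
        second.start();
        try {
            first.join(waitMillis);
            second.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return first.isAlive() && second.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads stuck: " + bothThreadsStuck(2000));
    }
}
```

Neither thread throws or logs anything; they simply never finish, which is consistent with the "no crash, no error in the logs" symptom described in the report.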
As far as I can see, the minimal fix would be to move the Network.channel initialization block into a lazy variable. According to the stack trace, this issue happens not when that channel is actually needed, but simply during the mod initialization phase. I might be wrong here because I don't know how it all unfolds, but it looks like that should be enough.
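The "lazy variable" idea above could be sketched roughly like this. It is only an illustration under assumptions: Lazy, FakeChannel and the "example:main" name are hypothetical, and the real Network class would build its Forge channel inside the factory instead of the placeholder used here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyChannelSketch {
    /** Thread-safe memoizing supplier: runs the factory on the first get() only. */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private volatile T value;

        Lazy(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public T get() {
            T result = value;
            if (result == null) {
                synchronized (this) {
                    result = value;
                    if (result == null) {
                        // The factory must not return null, or it would run again.
                        result = factory.get();
                        value = result;
                    }
                }
            }
            return result;
        }
    }

    /** Hypothetical placeholder for the real network channel object. */
    static final class FakeChannel {
        final String name;

        FakeChannel(String name) {
            this.name = name;
        }
    }

    // Counts how often the channel is built, purely for demonstration.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Loading this class only creates the Lazy wrapper; nothing networking-related
    // is touched until some code calls CHANNEL.get().
    static final Lazy<FakeChannel> CHANNEL = new Lazy<>(() -> {
        BUILD_COUNT.incrementAndGet();
        return new FakeChannel("example:main");
    });

    public static void main(String[] args) {
        System.out.println("built before first use: " + BUILD_COUNT.get());
        FakeChannel channel = CHANNEL.get();
        CHANNEL.get();
        System.out.println("channel " + channel.name + " built " + BUILD_COUNT.get() + " time(s)");
    }
}
```

The point of the design is that the static initializer no longer reaches into other classes' static state; the first caller of get() pays for the construction, at a time the mod chooses.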
A good rule is to avoid statically initializing anything beyond primitive types. Do not mix your program and language domains as long as there are means to avoid doing so, especially when things might end up creating static loops.
So, I found a "workaround" ...
FML has the ability to delay loading until other mods are loaded, so if you change this:
to ordering="AFTER"
Minecraft starts up, no issue (because Forge has already loaded and inited)
Oddly enough, TownCraft with almost 200 mods doesn't have this problem on client or server. /shrug
Well, it's a JVM-internals-based race condition, so it might as well be whatever-dependent, from the number of classes to load to hardware CPU specifics. And iirc the order of static initialization in Java is not defined/guaranteed, so everyone is at the mercy of random chance. But likely in most cases the deadlock is avoided because Forge just loads first. It just happens to work out that way. Not in 100% of the cases, apparently, but most people can play the Limitless 4 or ATM7 modpacks just fine.
The lockup seems to be more repeatable on a Ryzen CPU with more than 12 threads. Forge spins up 20 "workers" to create everything, and this just happens to be in the wrong place at the wrong time.
As mentioned a couple times even in the ATM repo, if you close and reopen a couple times, sometimes you get lucky and it goes through, and for most of the players it seems to work, it's just frustrating and the fix is relatively simple (just tell FML that Forge needs to be loaded first), instead of relying on random chance that Forge will be loaded in time.
Thanks to some random circumstances today and being armed with the discoveries of Anoyomouse here, I went to look at config/fml.toml in the modpack and... found this:
# max threads for parallel loading : -1 uses Runtime#availableProcessors
maxThreads = -1
So I changed this to
maxThreads = 1
aaaaand the modpack started loading every time! :D Awesome!
I don't know if that was common knowledge for everyone here, I'm just too ignorant about Forge and all that, but this solves the problem instantly now! It might affect the loading time a bit, but loading a bit longer versus not loading at all is a good trade. I'm still looking forward to a proper fix for this, but hey, it's possible to unblock this issue locally!
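A possible explanation for why fewer loading threads helps, sketched under assumptions: the JVM lets a single thread walk through a static initialization cycle without blocking, because re-entering a class that the same thread is already initializing returns immediately. The classes First and Second below are hypothetical, and this says nothing definite about how Forge schedules its workers with maxThreads = 1 (the thread dump in this issue shows the render thread involved as well); it only illustrates the JVM rule.

```java
final class SingleThreadInitDemo {
    // Hypothetical class at one end of a static cycle.
    static class First {
        static final int VALUE;
        static {
            VALUE = Second.VALUE + 1;
        }
    }

    // Hypothetical class at the other end of the cycle.
    static class Second {
        static final int VALUE;
        static {
            // First is already being initialized by this same thread, so this
            // read does not block; it sees the default value 0.
            VALUE = First.VALUE + 1;
        }
    }

    /** Triggers the whole cycle from the calling thread, entering via First. */
    static int[] initializeOnOneThread() {
        return new int[] { First.VALUE, Second.VALUE };
    }

    public static void main(String[] args) {
        int[] values = initializeOnOneThread();
        // prints "First=2 Second=1"
        System.out.println("First=" + values[0] + " Second=" + values[1]);
    }
}
```

The cycle completes, but note the side effect: Second observes a half-initialized First. So avoiding the hang this way still leaves the static cycle itself as something worth removing.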