Sodium 0.5 reveals bad handling of GL calls with ImageRenderer, crashes
cocona20xx opened this issue · 13 comments
Not sure exactly what's going on here, since looking at it under a debugger hasn't offered much insight, but attempting to render a YACL screen under QSL+QFAPI and Quilt Loader with certain mods causes extremely strange rendering behavior in the main game menu—the game does not freeze, but instead becomes unable to properly update the window, and said window becomes impossible to close.
The issue has been replicated with the latest versions of Zoomify (2.11.0
) and Chat Patches 201.5.5
but not Whereisit (on latest version 2.0.16
), which based on patch notes, appears to suggest the issue is related to YACL preview image/screen renderers, despite said renderers not being present yet when the main screen of the ModMenu GUI is rendered.
Other dependency info:
QSL + QFAPI: 7.1.1+0.86.1-1.20.1
(latest release)
QKL (only when testing bugged behavior in Zoomify): 2.1.2
(latest release)
Mod Menu: 7.2.1
(latest release)
Note: Tested on an Apple Silicon Mac—if the bug cannot be replicated on other machines, Apple-related jank might be why!
Logs from the bugtest for Chat Patches for posterity, even though there's nothing interesting in them: https://paste.ee/p/e2HVR
We've had a few people in our Discord server come across this issue (on both the Iris and AMDGPU driver under Linux) and it seems that 39bc5b5 does indeed fix the crash.
May be related to the issues mentioned with Sodium, since those are all also connected to Zoomify seemingly?
This may correlate to #96.
To be clear, you are saying after opening a YACL GUI successfully, then closing to the main screen, THEN it becomes an issue? I haven't been able to repeat this on Windows with the BlanketCon pack (using Controlify as the YACL mod with images), so this could be Metal completely shitting itself after the GL errors.
Try the CI build from b3d5164 and see if it mitigates this issue.
Not exactly: the screen drawing breaks when opening the ModMenu menu at all with a bugged mod installed, not just when a YACL screen is drawn successfully. I'll test the build you posted later to see if it works, though.
Unfortunately, no, build b3d5164 does not fix the issue. Is there anything else I can do on my end to help with debugging this, considering it's both a macOS/Apple Silicon-specific and Quilt-specific issue?
I don't have any reason to believe the problem is with Sodium. I think whatever YACL is doing to handle texture uploads is probably not correct. This code in particular is really suspect, because if it's touching OpenGL state from that thread, everything is going to explode.
In general, while the vanilla code has some manner for handling async texture uploads, the code is quite honestly insane, and I don't think it's at all sound or should be used by anyone... (who on earth decided it would be reasonable to just shove OpenGL API calls into a queue and execute them randomly on the next frame?!)
You should really consider an architecture where the main render thread polls the asynchronous tasks for completion. Then, you can create the texture objects & upload the texture data from the main render thread.
If the problem only occurs with Sodium, that does not mean it is Sodium's fault. We added the option to use a "No Error Context" by default in Sodium 0.5, and that will cause any broken code (which may have silently failed or produced errors before) to instead probably crash, because the driver isn't checking if what you are doing is valid any more.
This happens on Fabric too, as seen in the image above.
Keep in mind that the above issue was also found on my Steam Deck, so it may lie with Linux and/or certain AMD GPUs' drivers. (My main PC is running Windows 10 and it has a GTX 1650.)
Okay, the problem really seems like it's to do with YACL, and for probably the reasons stated above. The asynchronous task which YACL creates for decoding textures ends up calling Minecraft texture code from off-thread, particularly here to allocate the texture object...
Minecraft tries to make this work, but like I said, their implementation is catastrophically broken, and it causes a nightmare of issues all over the codebase which it doesn't handle correctly either. You can't reliably expect it to work unfortunately and refactoring your code to avoid it looks like a non-trivial amount of work.
Thanks for vetting my code! Looks like I'll rewrite this class later today with some kind of queue system.
As Intel drivers don't seem to care, I cannot repeat this issue on my machine. However, I believe I have successfully fixed it by implementing @jellysquid3's suggestions.
For anyone having this issue, please run the CI build found here.
And to @jellysquid3, if you have the time to look over my changes, that'd be great!
I think your solution will solve the problem. 👍 I would still caution against using Java AWT code (i.e. BufferedImage and Graphics2D) since most of it doesn't work correctly with LWJGL... however, it seems like you aren't having any issues, so I'm not sure what's going on there.
I think your solution will solve the problem. 👍 I would still caution against using Java AWT code (i.e. BufferedImage and Graphics2D) since most of it doesn't work correctly with LWJGL... however, it seems like you aren't having any issues, so I'm not sure what's going on there.
I'm yet to find a better pure-java WebP reader. If you know of any, let me know and I'll try to switch over. I'm not very happy about using AWT either.