JVM crashes with EXCEPTION_ACCESS_VIOLATION During world gen.
HarroldSaxon opened this issue ยท 54 comments
I'm getting this crash while generating chunks in a server and it seems to be OSV doing it. I've tried different versions of forge, java, different pc, still get the same crash while generating new chunks at random. I only get it with several other mods installed, but can't find a specific incompatible mod. Any help with this issue would be greatly appreciated.
hs_err_pid8176.log
debug-1.log
2022-01-12-1.log
I did some thinking about this and realized that a maximum of one interceptor per type is not guaranteed. I see this as potentially a massive issue, which I will want to fix right away. Thanks for bringing that to my attention.
Let's analyze the architecture of OSV's InterceptorDispatcher
. OSV maintains a separate pool of interceptors per-thread. This is so that multiple blocks can be processed between threads and each interceptor will be responsible for its own transformations to some independent Level
.
Ordinarily, this design has a relatively small memory footprint. This is because, in vanilla (or with no other mods), only the render thread and a single worker thread are ever responsible for block processing. In my testing, this means a total of 5 interceptors should exist under normal circumstances (or just 2 if the world has already generated).
However, when other mods are introduced, this environment changes. Forge spawns a massive number of threads (imo too many) and it makes sense that with some mods being installed, more and more of those threads might trigger a new interceptor to spawn. I would like to see this number never exceed 10 interceptors in total.
My plan to achieve this is to generate a single interceptor per type (across threads), with multiple handles (containing block references, etc) per thread. To be extra safe, these handles should probably be evicted immediately once they've been used. But that's a really significant change and I'll need a couple days to figure it out.
In the meantime, I can really use your help taking some diagnostics, since it's very difficult for me to test in heavily-modded environments.
TL;DR: Please try this build out and share any info at all. It probably won't fix anything, but it will throw some useful log spam in there. Thanks in advance for your help.
Edit: since we're here, I will introduce a compatibility mode with JRE 9+ in the next version, but it will be off by default. Essentially, the problem with adding compatibility for this is that it completely rips out all of the dynamic block behaviors provided by the mod. This would essentially defeat the purpose of using the mod in the first place. In any case, since we're already here, now's a great time.
Forge and OSV (1.16) are not compatible with Java 11, only Java 8. This will change again in 1.18, where both mods will only be compatible with Java 18+. This is a limitation of the mod and not necessarily a bug. As a result, I will not fix it. For a workaround, you'll have to downgrade to Java 8.
For reference, this is because OSV's ore blocks dynamically generate their behavior from other blocks. This is accomplished by placing other blocks into a fake world and transforming their outputs into the normal world.
In the future, I will consider providing a compatibility mode, but that is very low priority for the time being. Leaving this open for visibility.
I tried java 8 first. as I previously stated, I did try multiple configurations over the last few days.
here's the logs from java 8
hs_err_pid1724.log
OSV uses a library that was renamed in Java 9. Give it another try with Java 8 and share the crash report when using that version.
Interesting. I've not hit any issues using AdoptOpenJDK, but I would hope it works with Oracle / Sun official, as well. Unfortunately, I don't see any OSV code at all in these reports. I also don't see enough of my world gen-related log statements at the end of the log to suggest that OSV is doing anything here. Without any more clues, this may take quite some time to diagnose. I won't be able to test with more than a few other mods due to the workload required, but l will update here in the next few days with any details I discover.
do you use openjdk8-hotspot, or openjdk8-openj9? I'll try running on the same jdk and see if it works.
the error is thrown by the garbage collection thread shortly after creating a region interceptor in every case, with higher odds of a crash the more world gen heavy mods I add in. with my current pack, it's about 1 crash in every 10~15 region interceptors created. If you have a more in depth debug output option, i can try to get more info too.
There should only be one interceptor for each of three types: ClientLevel
, ServerLevel
, and WorldGenRegion
at any given time. Additional types do get tracked, but that's supposed to be a failsafe to provide details about unsupported types. I'll setup a build with some extra debug statements and share that here in a few hours, if needed.
I've been trying to reproduce the crash for the last hour with no luck so far. it was crashing every 20 min or so before, but I have had it go a few hours before crashing from time to time when I was trying to solve this issue myself. I'll set a chunk pregenerator running overnight and get back to you with more information tomorrow.
Hey, can you share the log when you do? I set it up to keep a tally of how many interceptors there are. How I fix the problem will depend on that number.
Looks like the most recent interceptor was alive for a good 15 seconds before the crash. It would have logged before it started to create a new one, so it does kinda sound like this might not necessarily be OSV. I took a look at TerraForged earlier and it looks like they're also wrapping the world gen regions.
No idea. I'll give this a better look in the morning when I'm not on mobile.
So, it's been a long day, and i made a foolish mistake. I dropped the .zip file into my mods instead of the .jar. Now with the mod actually loading, i am getting crashes. sorry for the mix up.
2022-01-13-1.log
debug-1.log
hs_err_pid2800.log
this crash was during spawn area generation, but I'll set it to loop for a while like I planned to and see if I can get some better logs showing those multiple instances of interceptors for the overworld generator.
Hi there. I will run those. I also have a more interesting log here from running overnight where it got up to a whopping 77 interceptors before crashing
debug.log
latest.log
hs_err_pid6788.log
g
Hey, I really don't know what to do here since this log looks fine and everything is still working for me. The log you sent me indicated that no potentially unsafe code is failing, since the interceptor allocation is the only unsafe code that could ever cause a JVM crash, as far as I know (other than general memory issues).
I'm gonna attach 2 builds here. Would help a lot if you could try them both and share the logs.
OSV-Forge-7.4.4-copyFields.zip
OSV-Forge-7.4.4-noCopyFields.zip
Cool, thanks. This is great info. Sounds like one of these mods has changed the world generation code to generate not only in parallel, but in entirely new threads. OSV was not designed for this kind of environment and will need to be updated. I'll do that later today or tomorrow morning and update here when that's implemented. This is a pretty significant change, so I'll need some time and help testing that.
I'm running a few more attempts to be sure, but it looks like noCopyFields has actually stopped the JVM from crashing! Now minecraft is crashing and generating crash reports as it should, though it does hang without exiting, so I can't automate relaunching this version. I'll give you the logs and crash reports shortly.
copyField
latest.log
debug.log
hs_err_pid8796.log
s
Sounds like the crashes are from the missing data in the interceptor. So, we definitely need it. I bet this didn't fix the problem at all, but instead just introduced a new issue that occurs before the old one.
That's good information, though. I'll give this done careful analysis in a bit and start working on reducing the number of interceptors. I suspect the actual problem comes from basically the JVM running out of memory when doing manual allocations.
I'm just saying, though, it is a little frustrating. Ordinarily, this game would not continue continue spawning new threads for every new region like that. But anyway, I can accommodate and hopefully that will fix it.
I wonder if it might be structure generation triggering this, there are a few mods that split off generating there own structures into a separate thread if I recall, and I believe terraforged does this as well.
Hmm, that's a good thought. It might be. I guess it depends on whether those structures would ever interact with the ore variants. Hard to say. In any case, super hard to say at this point. Hopefully cutting the number down will eliminate the problem.
something interesting, I've managed to prevent the crash I was getting with noCopyFields by preventing block ticks. I normally do this for pregenerating worlds as it offers a large speed boost, but it also acts as a 'safe mode' for most ticking block/entity crashes. I'm attempting to reproduce the original JVM crash under these conditions. This setting did not prevent the JVM crash with OSV-Forge-7.4
So far, out of 18 instances running through spawn generation, there have been 0 crashes.
Cool, thanks.
This first exception, a NPE in TFChunkGenerator
, does not appear to be caused by OSV. This call chain is impossible:
at c.t.m.c.TFChunkGenerator.func_235957_b_(TFChunkGenerator.java:150)
at p.o.w.i.WorldGenRegionInterceptor.prime(WorldGenRegionInterceptor.java:115)
Because prime
just sets a value in the interceptor.
The second exception is the same, but somehow calls a different method:
at com.terraforged.mod.chunk.TFChunkGenerator.func_230354_a_(TFChunkGenerator.java:261)
at p.o.w.i.WorldGenRegionInterceptor.prime(WorldGenRegionInterceptor.java:115)
I have no idea how either of these methods is getting called. Again, all this method is doing is updating a value.
The third error, We are asking for a chunk out of bounds
, is also not possible.
at n.m.w.g.WorldGenRegion.func_217353_a(WorldGenRegion.java:131)
at j.l.ThreadLocal.get(ThreadLocal.java:161)
at p.o.w.i.InterceptorAccessor.getHandle(InterceptorAccessor.java:21)}
Once again, I have no idea how this code is getting executed. InterceptorAccessor#getHandle
is just reading a value in thread local storage.
Based on that alone, I'm guessing the memory is somehow getting corrupted. But, I have no idea where that's happening, because again, everything is working just fine for me and the only thing I haven't tried at this point is TerraForged.
Unfortunately, without these issues occurring for me, it's entirely possible that I'm just making things worse by changing well-tested code that was previously working fine. I'll make a few test changes in a few and upload them here for testing. But after this, I'm currently out of ideas. It may not be worth redesigning this entire mod just to provide compatibility with TerraForged.
I agree. I wish I could take a look under the hood of terraforged, as 99% of the crashes are with that mod. the same crash without it is extremely rare. so whatever is causing the original issue, terraforged is doing it a lot more than any other mod. So far, 7.4.4-noCopyFields is the only version that doesn't crash at all with the right settings. The only thing i can think of is either some mods are doing something very strange with the feilds, making them corrupt when copied, or the intercepter is hooking into a class that isn't actually what it thinks it is, and getting bad, even impossible data back.
Yes, definitely memory corruption. Looking at the fourth error about ServerChunkProvider cannot be cast to InterceptorHandle
, this data cannot possibly be a ServerChunkProvider
. I've reimplemented this "copy fields" method again. Just a shot in the dark. It has to be done and maybe this way will be a little safer.
Let's hope this one does it!
In any case, 7.4.4-noCopyFields provides a work around. If you could add a config setting to turn copying on or off, it'd be more than enough for compatibility.
7.4.7 didn't crash, but hangs on copying fields into RegionFix. I'll see if I can get past this point, but it'll be a few hours before I can try more on this.
debug.log
latest.log
Thanks! I think that just told me what the problem has been this whole time.
It's actually copying out of RegionFix
, but some of the fields in RegionFix
aren't in WorldGenRegionInterceptor
, so it can't copy all of them. I have no idea why it's hanging, but it should fail and I understand now why OSV never did fail in the past, while still corrupting the memory as a result of this unexpected type.
Here's an updated build which correctly verifies that each field is available in the destination and the source. I have it logging each field that gets copied, because it should be unable to print the data if it's corrupted, thus confirming that it isn't.
I've also updated the field copying to use reflection, which should prevent the behavior from changing on different platforms. I hate this solution, but I also hate editing all of the bytecode so that fields can be copied normally, like I did in 6.2. For anyone else reading, we need to copy these fields because the Level
objects cannot be instantiated normally. This is because, for some reason, Mojang throws a whole bunch of init code in their levels which loads a bunch of entities, creates new save files, and does a whole bunch of other work. For an interceptor to be viable, I need a way to skip this work. And that can only be done--at least in 1.16--by dodging the constructor invocation. It's quite an extreme solution, but this is the only way to dynamically copy behavior from other blocks.
@HarroldSaxon Don't worry if you aren't able to test this, but if you are--as always--, that would be a huge help. I'll try it out as well in several hours with TerraForged to make sure we have it worked out.
Edit: Forgot to attach the file.
So, I've been testing and thinking, and I believe I may have a work around if not a solution. not getting the fields causes redstone ore variants to misbehave, but I've found a few solutions for that issue that I can do myself, and after generating enough of the world I can simply swap back to a version that does copy the data and redstone ores work perfectly again.
so, I think what you're doing is allowing different generators to make changes and fetch fields for each one? but what about copying these fields to the config just once, or if you want to keep this flexability, once for each dim? would this allow different generation rules in each dim while not requiring fetching fields for every generator thread? then regeneration of these fields could be set the same way as with resources so they aren't fetched again unless mods are changed? I've still yet to get a crash with these settings, so it is working. and I crash pretty quickly still when generating with any other version of osv. I don't know what these extra threads are doing to junk up the data you're copying, or if it's just the volume of data, perhaps even just coincidentally this change has averted the real cause, but this change has solved my issue.
Edit: as a side note, not copying fields has provided a dramatic performance increase, with average chunks per tick generated being doubled.
OSV does not intentionally copy fields more than once, ever. In fact, I carefully designed it to only copy fields when the world first loads because it's so expensive to create even one of them. This behavior is a side effect of other mods spawning additional threads to do their world gen work after the world has already been created, which I had not accounted for.
In a vacuum, there's nothing wrong with other mods upping the parallel threads to do world gen. The only problem with it is that it changes the game in critical and unpredictable ways. OSV is complex enough that sometimes I depend on that behavior.
Essentially, the dispatcher was setup to allow one set of interceptors per thread, but this was only designed to accommodate the render thread and main thread each doing their own, independent block processing.
Don't get me wrong, it's not ideal for other mods to continue spawning hundreds of new threads and never reuse them, but OSV should still be able to accommodate that.
Still no idea what the problem is. All I know is that there should never, ever be more than 5 of these things. So, I'll start by fixing that and we'll see where that gets us for now.
Okay, sounds good, and thank you for communicating with me so much. I really appreciate it.
Hey, I got the new model all setup. This was quite the change. Everything is working here, but before I can release this, I have to test it on other platforms and with several mods. This will definitely take awhile.
Do you mind helping me test this build as well? I'm not really sure if it's gonna fix the issue or not. All I know is that it will reduce the number of interceptors like this:
- One interceptor per original world (e.g. overworld / nether)
- Only one region interceptor total.
- Interceptor handles are generated for every single use, only in the current thread, and are immediately disposed. I'm hoping for negligible / no impact on performance.
Anything you can tell me about this build will be extremely helpful.
Thanks again for your time!
There is still a chance of crashing during server startup, when the interceptors are first created. 2 intercepters are being created on startup now.
type: class com.terraforged.mod.chunk.fix.RegionFix in thread: Thread[Worker-Main-11,5,SERVER]
and
type: class net.minecraft.world.server.ServerWorld in thread: Thread[Server thread,5,SERVER]
I've yet to see a crash once these are up, but would like to generate a good 1,000,000 chunks before calling it good. Unfortunately, although chunk generation starts off faster at about 0.7chunks a tick, it slows over time till generation stalls out completely at around 5000 chunks.
Don't count too much on performance with these builds. There's a lot of info logging as the world generates, which will definitely slow things down.
I also wonder what's going on with this RegionFix
type. I'm actually not getting any region interceptors, at all.
As for the slowdown over time, that sounds like a memory leak, but seeing as no additional interceptors are getting created over time, that should be pretty much impossible (at least for OSV). I'll set it up to track the number of interceptors to make sure they're getting all getting disposed over time.
If possible, are you able to test this without Terraforged? Maybe this is just a compatibility issue with Terraforge and I need to put more time into that mod specifically.
Edit: Confirmed that the handles are getting cleared whenever the GC runs, but it's still creating thousands of them. I'll see if somehow I can have it reuse them and know when to clear them out if there are too many
sure, I can test without Terraforged, but it'll make looking for crashes take much much longer. With only your mod and Terraforged i've yet to see a crash at all, and I had thought this was an issue with terraforged before they refuted it and I did eventually manage to recreate the crash without that mod. but it certainly makes world gen far more intensive and is often I think the last straw.
Updated to reuse old handles. These handles are never cleared manually. They should still get removed as the old threads die off. And, even if they don't, we were only up to 77 interceptors before. Assuming we create the same amount of handles overnight, that's an extremely small memory footprint. These objects just track the block being replaced, its target block, and some coordinates.
Thanks again for your spectacular help. If you ever join the Discord server, just let me know it's you and I'll give you a fancy green name.
Edit: this build also drastically reduces the amount of debug logs. You should still be able to search for total interceptor
and find the number of interceptors / handles.
I'm running a loop of generating new worlds till one crashes without Terraforged on one of my pcs. on another, I'm trying to figure out what's going on with terraforged.mod.chunk.fix.RegionFix it looks to be keeping chunks loaded after they report as fully generated? This also appears to be where the slowdown is happening as well. And then despite not generating any more chunks, features generating keep trickling in, so it may be that that's what this is handling. I suppose you could try having the interceptor ignore them completely, and I could see what breaks if anything.
Edit: just saw your last post! I'll give it a try
Yeah, it will just create some minor generation bugs with some ore variants, depending on the block used. I'll consider that, but I'd rather the Terraforged devs look into the issue instead.
Just curious, how are you tracking which chunks are loaded?
chunk pregenerat
has some nice features hidden away for that
I'm going to bed after this update, but it seems 7.4.6 does run better when it runs, but also breaks a bunch of moded structures, and crashes much more often than previous versions. It always crashes with terraforged now, even if just osv and terraforged are present, which didn't happen before, and I've gotten a few crashes without it now too. I have a lot of debug logs, so just tell me what situations you'd like to look at if any. I'll try to find the right ones in the morning if you do.
Just share any crash reports. How are modded structures breaking? What kind of breaking?
Critical error detected whilst generating Features
crash-2022-01-16_00.39.30-server.txt
crash-2022-01-16_00.19.03-server.txt
crash-2022-01-15_23.38.38-server.txt
2022-01-16-8.log
I can't find the other two associated logs of this error, but i'll keep looking. i have something like 80 log files, but will try to just post a few of each type of crash
This is what i meant by breaking the
debug-3.log
2022-01-15-3.log
m
killed by watchdog after hanging for 10 minuets during startup
debug.log
latest.log
ServerChunkProvider cannot be cast to personthecat.osv
crash-2022-01-15_22.50.28-server.txt
2022-01-15-3.log
debug-3.log
I have a lot more, but they're really just these same things over and over, and honestly I should have organized them better in the first place, but I was busy and neglected to do so.
jvm crash Without terraforged
debug.log
latest.log
some with terraforged
2022-01-15-1.log
[debug-1.log
debug-1.log
2022-01-16-2.log
](https://github.com/PersonTheCat/OreStoneVariants/files/7877316/debug-1.log)
2022-01-15-2.log
[debug-2.log](https:
debug-3.log
2022-01-16-3.log
//github.com/PersonTheCat/OreStoneVariants/files/7877318/debug-2.log)
I've generated 5 new worlds, everything is running smooth as silk so far. I'm generating a large area of chunks overnight as a stress test. so far it looks to be running really well, a roughly 20% increase in chunks/tick processed over v7.4, which was unexpected but welcome to see.
This is really great news. And I most likely could not have done it without your help, so thanks again for that. I think the memory corruption and number of interceptors were two separate issues, so this is a really great place to be in. We also introduced a compatibility mode where OSV will simply not intercept anything if it it can't generate an interceptor. I'll clean the code up and continue testing on other Java platforms. ๐
fantastic! I'm glad I could be of help, and please let me know if there's anything else you need!
Also, do you have a donation method? I really appreciate everything you've done, and would like to buy you a good beer or bottle of wine or something if you do.