Huge geolosys.dat file taking ages to write
Naxdar opened this issue · 12 comments
Versions:
- Minecraft Forge: 1.12.2-14.23.4.2705
- Geolosys: 1.12.2-2.1.2
- JourneyMap (optional): true
- ImmersiveEngineering (optional): false
What happens:
With the map on my server getting big, I noticed the geolosys.dat file was getting pretty large (>50 MB, compared to the ~400 MB world map). The server gets major freezes when people generate new chunks, and meanwhile I see the geolosys.dat file sitting at 0 KB (meaning it must be being overwritten) before returning to ~50 MB once the freeze passes. I suspect the rewriting of that oversized file is causing these freezes.
Deleting that file doesn't seem to make the mod unhappy, and it makes the freezes go away for a while.
The file seems to contain the position of every single ore generated by the mod. Is it possible to get a config option to disable it?
What should happen:
No ~20-second freezes.
Logs (if necessary):
No error/warning in the console.
Additional Comments:
Great mod btw.
A. I can definitely optimize it, I think... I’ll see if I can write out only the changes
B. A config option is a completely doable fallback plan!
I actually couldn't come up with any more efficient algorithms than what was already in place, so I just went ahead and added a config option for it.
Fortunately you can delete it (while the server is stopped). I would advise you to do it, since at that size you'll be getting >1 minute server freezes.
It's nice to have an (incoming) option, but there isn't really a choice: it doesn't take that much exploration to get a >50 MB file, and once you get to that size, playing with 20-second freezes whenever someone generates new chunks (or mines ores?) kills the game.
Yeah.
A quick look at it suggests the problem is that the data structure is a single file containing everything.
Observation: You don't actually need to change the data that much, or store all of the data. You don't need a record of every block of ore; you need a record of the coordinates within the chunk that had ore blocks, because when the chunk is loaded, scanning that hunk of space for the ore in question will be nearly instantaneous. So you could just record the min/max x/y/z for each ore in each chunk, and scan them occasionally, possibly tying that to chunk saving or active use of the prospector's pick. Then if the scan comes up empty, you can drop that record entirely. So if you have 100+ ore blocks, instead of 100+ data items, you have six, and because they're coords within a chunk, they're all single-byte values.
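To make that concrete, a record like this would be enough. This is a sketch only; the class name and layout are made up here, not Geolosys's actual code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch only -- not Geolosys's real storage class. One record per ore
// type per chunk: six chunk-local coordinates instead of one entry per
// ore block. x/z fit in 0-15; y (0-255) is stored as an unsigned byte.
public class OreBounds {
    byte minX, minY, minZ, maxX, maxY, maxZ;

    void write(DataOutput out) throws IOException {
        out.writeByte(minX); out.writeByte(minY); out.writeByte(minZ);
        out.writeByte(maxX); out.writeByte(maxY); out.writeByte(maxZ);
    }

    static OreBounds read(DataInput in) throws IOException {
        OreBounds b = new OreBounds();
        b.minX = in.readByte(); b.minY = in.readByte(); b.minZ = in.readByte();
        b.maxX = in.readByte(); b.maxY = in.readByte(); b.maxZ = in.readByte();
        return b;
    }

    int yMin() { return minY & 0xFF; } // recover the 0-255 y range from a byte
}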
That, and store them with some kind of locality, similar to regions, so you don't have to rewrite the entire thing every time even one block changes. As long as the actual check uses live data, when it needs to happen at all, you don't have to worry about the possibility of the data being old; worst case, it's old, you think there might be ore, you scan a few hundred blocks, there's no ore, you update your table. And then, at some future point, write an updated region table.
It looks like that config option isn't in any released versions, and I can confirm the same behavior. I think the efficiency question may be the wrong question; the problem is that, as the amount of world generated increases, that file gets huge, but the entire file is rewritten every time anything happens. So if we have several players, and they have been out exploring a bit:
$ ls -l geolosys.dat
-rw-r--r-- 1 seebs seebs 175137001 Sep 14 20:56 geolosys.dat
Now any time anyone mines even one geolosys ore, the server gets stalled out on rewriting a 175MB file. And I'm not sure, but it looks like if someone mines two consecutive geolosys ores, the file gets rewritten twice.
It might make sense to use a format other than NBT for that, because NBT is an incredibly bad format for a potentially large database where individual entries might be changed independently of each other. The performance problem is coming from the fact that, if you mine a single ore in a single chunk, several thousand chunks' worth of data have to get rewritten.
(I don't know what the ideal format would be, but I'd guess something like a directory containing multiple data files, each covering a region or so, would probably be significantly faster.) But it's also worth noting that, in our case, the geolosys.dat file is nearly 50% of the size of the entire overworld's region data...
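For illustration, the lookup for a region-per-file layout could be as simple as the following. The path scheme is invented here, loosely mirroring vanilla's r.<x>.<z>.mca naming:

import java.io.File;

// Sketch of the directory-of-region-files idea: one file per 32x32-chunk
// region, so mining one ore only rewrites the small file for that region.
public class RegionFiles {
    static File fileFor(File geolosysDir, int chunkX, int chunkZ) {
        int regionX = chunkX >> 5; // 32 chunks per region, like vanilla regions
        int regionZ = chunkZ >> 5; // arithmetic shift floors negatives correctly
        return new File(geolosysDir, "r." + regionX + "." + regionZ + ".dat");
    }
}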
I still have this issue regularly on my server, and LagGoggles tells me that Geolosys is definitely responsible. It appears that expanding the size of ore deposits (while decreasing their frequency), as I did, DRASTICALLY exacerbates the issue.
Is the best fix for now to delete that file and/or change that config option?
Love this mod btw.
In practice, change the config option.
The source fix would honestly be "don't store that data at all", because there's simply no need to store that data; you can check a given chunk for ore samples when the pick is used, which will take possibly almost an entire millisecond, and then cache that result until the chunk is unloaded. There's no need for storage at all, really, because the actual test ("check this chunk for ores") is trivial and easy to perform extremely quickly.
The problem is that, instead, we've got a file which can be >100MB compressed, which is being rewritten from scratch every time, because NBT doesn't really allow dynamic access and selective updates.
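As a sketch of the no-storage approach (scanChunkForOres() here is a stand-in for the actual block scan, not an existing Geolosys method):

import java.util.HashMap;
import java.util.Map;

// Sketch only: compute the answer on first use of the pick and cache it
// until the chunk unloads, so nothing ever touches disk.
public class ChunkOreCache {
    private final Map<Long, Boolean> hasOres = new HashMap<>();

    private static long key(int chunkX, int chunkZ) {
        return ((long) chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
    }

    public boolean chunkHasOres(int chunkX, int chunkZ) {
        return hasOres.computeIfAbsent(key(chunkX, chunkZ),
                k -> scanChunkForOres(chunkX, chunkZ));
    }

    // Called from a chunk-unload event handler so stale entries don't pile up
    public void onChunkUnload(int chunkX, int chunkZ) {
        hasOres.remove(key(chunkX, chunkZ));
    }

    private boolean scanChunkForOres(int chunkX, int chunkZ) {
        // placeholder: iterate the chunk's blocks, as in the loop sketch below
        return false;
    }
}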
@hendersont2 deleting the file will do nothing in the long run; you'll still be saving the file (it'll just make a new one that starts with new chunks from that point on)
I guess I could go about it the scan-per-chunk way @seebs suggested instead. I always figured scanning a 16x16x255 (65,280 blocks) area would be bad, but I don't suppose it is, as long as I optimize it like:
if (block == Geolosys.ORE || block == Geolosys.ORE_VANILLA) {
    // <tell the prospecting client>
    break; // drop out of the loop
}
This would mean that most iterations would just be x++/y++/z++.
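Spelled out, that might look like the following; a sketch only, assuming the Geolosys.ORE fields from the snippet above and Chunk#getBlockState taking chunk-local coordinates as it does in 1.12:

import net.minecraft.block.Block;
import net.minecraft.world.chunk.Chunk;

// Sketch of the scan above; returns at the first hit, so most iterations
// really are just an increment plus one comparison.
static boolean chunkHasOre(Chunk chunk) {
    for (int x = 0; x < 16; x++) {
        for (int y = 0; y < 256; y++) {
            for (int z = 0; z < 16; z++) {
                Block block = chunk.getBlockState(x, y, z).getBlock();
                if (block == Geolosys.ORE || block == Geolosys.ORE_VANILLA) {
                    return true; // <tell client> would happen here
                }
            }
        }
    }
    return false;
}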
So now, opinion time: can I get your thoughts on this poll?
You might be able to create a file with that name that the user running the server doesn't have permission to open or interact with. It might just suck the errors up, it might fail badly, who knows!
You don't have to scan the x255 chunk. Stash the height range for the chunk, or just use a cap based on the highest height ore can generate at. But seriously, scanning 65k blocks in a chunk for whether they're geolosys blocks? That should take almost no time; it would be significantly faster than writing even just one chunk's save data. Also, you can probably just do the scan client-side, since it's not a big problem if the data's slightly stale. The client should always have the chunk the player is currently in loaded, and a few milliseconds is nothing for local client activity that's in response to user actions.
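For instance, capping each column at the chunk's heightmap skips all the sky. If I remember right, Chunk#getHeightValue is the 1.12-era accessor for that, but treat it as an assumption:

import net.minecraft.block.Block;
import net.minecraft.world.chunk.Chunk;

// Variation on the scan sketch: use the heightmap so air columns above
// the surface are never visited. getHeightValue(x, z) takes chunk-local
// coords and returns the y just above the topmost block (from memory).
static boolean chunkHasOreCapped(Chunk chunk) {
    for (int x = 0; x < 16; x++) {
        for (int z = 0; z < 16; z++) {
            int top = chunk.getHeightValue(x, z);
            for (int y = 0; y < top; y++) {
                Block block = chunk.getBlockState(x, y, z).getBlock();
                if (block == Geolosys.ORE || block == Geolosys.ORE_VANILLA) {
                    return true;
                }
            }
        }
    }
    return false;
}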
@seebs Client-side-only would fail when using plugins that hide ores unless one of the sides is visible, etc.