
Backup purging hangs when using uncompressed differentials / incrementals
Jadan1213 opened this issue · 98 comments
I've noticed a few times that i have to manually re-enable chunk saving after a backup. i have no other mods that manipulate chunk saving installed.
I'm on forge 47.2.17, MC 1.20.1, mod ver. 3.4. I have Minecraft running as a local server on another machine, and my wife and I connect to it from our client machines.
I'm not 100% sure if it only happens when a save is done out of game (disconnected), but i feel like it is. I've noticed usually after logging back in after being off the server a while (no players) that if a backup was made, i have to manually re-enable saving with /save-on. This doesn't seem to happen when actually connected to the server, but i'm going to try and keep an eye out and see.
Added a secondary VM disk to my docker VM, and bind mounted the local storage directory into the docker container. I set the permissions for read/write and copied over all the world backup data and changed the backup directory to the local vm storage.
so far i ran the first test with purge size 100GB. the backup did not complete properly. I'm starting to wonder if this is an issue with having a backup folder that is outside of the game's data folder. anything other than ./backups. (I mean it could still be because it's in a docker container...)
I'm going to copy the game backups into the ./backups folder and see what happens a little later.
copied the world backups to the ./backups folder, set the size limit to 50G (limited space in the VM) and ran a backup.
it ran the backup, i haven't gotten any completion message, nor a re-enabling of saving. It is pegging a single cpu core at 100%, but spark profiler isn't showing any activity from advancedbackups. i'm going to let it sit for a while and see what happens.
i did discover if you move/delete the folder location and a backup tries to run, it will output a null error, but even after changing the directory location and doing a reload-config it still thinks a backup is running. (i didn't shut down the server when i did it lol.)
well i let it sit, and nothin. so now i'm more confused. i'm wondering if the only reason i didn't notice this at first is because it takes time for space to get used and hit the 100G limit i had set.
Hmm. I know that external backup locations are no problem outside of docker containers, but it's still possible that docker interferes somehow..
As for the spark profiler, it won't show anything with default arguments.
Adding --thread *
will tell it to profile all threads, as the backup process is threaded to reduce impact on the game loop.
Hmm, I've been trying to recreate this to no avail.
Tested steps so far:
- Run a debian container, install Forge, Advanced Backups and Spark
- Set the
count
purge strategy to2
- Make several backups, such that purging happens
- Change the backup location to be absolute (in this case, I used
/backups
rather than./backups
) - Make several more backups
With both backup locations, I observed proper count-based purging.
As such, do you think you could get me a download link for your backups folder, provided it isn't too big?
Ah! Managed to reproduce it, only after disabling compression like you did. Interesting...
aaand reproduced outside of docker.
With that, I think I now have enough information to test a fix, we shall see.
Full reproduction steps:
- Run a server, with Advanced Backups installed
- Ensure you are using differential or incremental backups. The default is differential.
- Disable chain compression.
- Make enough backups to hit one of your purge strategies. Count is easiest for obvious reasons
- Observe the backup deadlocking when it tries to purge.
I'll run some tests on my end, and come up with a fix for you to try out too if that's okay?
AdvancedBackups-forge-1.20-3.4-hangfix.zip
And here's a jar to test. This should solve it.
i'll download it and give it a try! That's crazy the rabbit hole we went down and how it was related to not using compression!
it's going to take me a few, gotta undo a lot of changes i made to reset everything to how i originally had it lol.
i use no compression because it's faster for backup (nvme storage), and my storage location runs on zfs, so it has compression already lol. I bet mostly everyone keeps it default to using compression and that's why this never came up!
Also, i did use the default with compression on at first!, I only disabled it after setting the storage to my smb share!
I do indeed want to implement purging notifications, though it's lower priority than some other things.
I think pushing that as a separate feature request should be a good idea though, rather than this issue.
I'll mark this as fixed in next release, and I'd suggest you continue using the test jar I gave you until v3.5 is out so your purging can work as it should.
I set the limit to 75G and ran a backup with the new jar. It deleted the old backup chain and brought it under limit.
would it be possible to get log messages on purge? like, backup size has exceeded limit of size
, deleting old backups.
the following backups were deleted;
backup1 backup2 backup3
etc? if not, no problem, purging seems to work now =)
Thank you for your help. Took lots of testing, but i'm glad you figured it out! I'll keep using the jar you sent me until a proper 3.5 release is posted. Thank you again for your time!
i have backups set to run with player activity.. They seem to run fine after logging in and playing for a bit, i haven't noticed anything else. but i'll keep an eye out. the logs don't show anything except
[22:32:22] [Server thread/WARN]:
SAVING DISABLED - PREPARING FOR BACKUP!
[22:32:23] [Server thread/WARN]:
SAVE COMPLETE - PREPARING FOR BACKUP!
[22:32:23] [AB Active Backup Thread/INFO]: Preparing differential backup name: backup_2024-03-11_22-32-23
Are you able to provide the entire log please? There should be info after that, whether its an error or the complete message..?
Only making one backup afterward logging off is indeed correct if you have player activity required, so that's not really a potential investigation route.
i have backups set to run with player activity.. They seem to run fine after logging in and playing for a bit, i haven't noticed anything else. but i'll keep an eye out. the logs don't show anything except
[22:32:22] [Server thread/WARN]:
SAVING DISABLED - PREPARING FOR BACKUP!
[22:32:23] [Server thread/WARN]:
SAVE COMPLETE - PREPARING FOR BACKUP!
[22:32:23] [AB Active Backup Thread/INFO]: Preparing differential backup name: backup_2024-03-11_22-32-23
it does seem to do a backup still after we log out, but i don't notice it doing further backups until after we log back in again
Ah, also - immediately after you next encounter this, could you please retrieve a latest.log?
Thanks for the report!
First up, do you have backups set to require activity?
Secondly, are you able to tell if your backup cycle continues as normal after this?
i'm wondering if there may be an issue with the space checking for rotating backups. my server is running in a docker container with a folder mounted from my NAS to hold the backups, and i don't think that it's able to detect storage properly. Could this be causing an issue if it's attempting to enumerate the storage and delete old backups? maybe causing it to error out or cause it to not be able to complete its tasks prior to re-enabling saving?
when i get chance i'll disable the deletion of old backups and see if it changes anything.
just completed a backup while in game. it doesn't show in the server log output that the backup was completed, and it did not re-enable saving.
i just rebooted my server after removing a mod, so as soon as it does it again i'll post the latest.log
here's the one before that. (i'm not sure how long saving was off... still looking...
https://gist.github.com/Jadan1213/5290103cf77db582ee45e5b5041815dd
this is the latest.log from a couple reboots ago.
https://gist.github.com/Jadan1213/2b582c51f40ba46e8f4055e94f2092c5
it's a mod that pauses season progression when players aren't logged in. it's called ready player fun
Pausing server onPlayerLogout at 40827476, 551547
This, from my knowledge, isn't a vanilla feature.. I think I might know the cause now, but we shall see.
Are you playing a premade pack, or did you install all those mods manually? I might need to create an identical environment if I can't reproduce this manually.
i'm playing on my own modpack that i put together. If you have modrinth i can send you an exported pack file that you could import into modrinth.
i'm not sure if the issue would happen in a single player world or not. since i have it running on a dedicated server. I could zip up the server folder without the region files (so it's not huge lol) if you want
Hmm. After some testing, I've been unable to reproduce it indev with readyplayerfun.
As such, yes I'd like to use a modrinth export, and a copy of your server (world folder not needed), thank you in advance!
ok i ran around for a bit. logged out, and waited. it did the save with players logged out. and when i logged back in, i had to re-enable saving. Here's the latest.log
https://gist.github.com/Jadan1213/c0ad8de3c93d5d309e95c6249780dad3
ok give me a little bit to get those files together. my wife is currently playing on the server lol.
well if you know how to enable the debug log on server, please tell me lol. the server doesn't output one and i'm not sure how to turn it on!
No major need for a logs folder, no. I think the logs you've provided will give all the info the logs folder can, aside from perhaps a debug log.
there is not. i've been trying to figure out how to enable it to try and figure out a few other things with some mods. =(
There should be a debug.log file in the logs folder - but if there isn't, just don't worry about it.
Here's the modrinth export(152.7MB): https://drive.google.com/file/d/1-0w49fkJlyoqjOQFIYSL4ld_rKgu0pam/view?usp=drive_link
Here's the server directory sans world folder(945.2MB): https://drive.google.com/file/d/1BwIt_0IalhHwwM91mzTQuzQsjiGVKfE-/view?usp=drive_link
The server is set to use xms & xmx of 12GB, i don't know how much ram you have available, you might need to alter that.
The client should use whatever your modrinth default ram limit is. I use 6GB and it runs okay.
You might have to disable the StayTrue texture pack, i know it gives my wife's pc some issues loading sometimes. so if you run into problems loading the client, that could help.
The files are fairly large as I have a "few" mods installed lol...
Interesting.. That's a good thought. I've been unable to reproduce it with the server you sent so far, though I've not even managed to boot the client yet.
Those checks aren't ran if you run a snapshot with /backup snapshot
, so that could be a good point to investigate.
Yeah, both automatic and manual backups will run the checks, so if you're right in thinking it could be getting stuck on those, a manual snapshot would be a way to test that without disabling the checks.
Of course, the other option is setting all three config values to 0, which disables the checks outright.
i did a /backup snapshot and it gave me an error. looks like the bakcup is hanging.
"Cannot start a snapshot whilst a backup is already running"
Ah, that's not a surprise I guess. I'm not 100% sure you can break out of that without a server restart, but you can try /backup cancel
, waiting a short while then running the snapshot again.
and indeed, I can't reproduce it still.. Server is running on my pc using my NAS as storage, which is the slowest storage I have available - that's made no difference, and nothing I do on my client makes any difference.
Hmm. I'd need to get a debugger attached to a server running inside a docker container to actually figure out a fix.
However, I can potentially implement a detection system for the time being...?
It has to be something related to the fact that my NAS storage is being mounted into a docker container. I feel like it's not able to enumerate space usage and it's hanging. Since you can't reproduce it, i feel like it's an infrastructure thing and not a server configuration/mod thing.
unless you happen to have any ideas on how to prevent the hang lol. You're the coding master, not me! lol
That has to be exactly it. After setting space purging to 0, the standard /backup start completed and re-enabled saving. You said that a /backup start would do space checks so i think that's it. Given that my use case is pretty niche (but possibly an issue others could run into) maybe adding a comment in the config that says something like "If you are mounting your backup storage into a docker container, space based purging should be set to 0 to prevent a possible issue with the backup hanging and not re-enabling chunk saving"
well i happen to have a server running in a docker container and am willing to help if you want me to install/run anything on my server.
AdvancedBackups-forge-1.20-3.4-dockercheck.zip
Give this one a test please - you'll have to disable the old jar.
It should be able to check if a server is running inside docker, and if so, skip the purge checks.
Thanks github...
AdvancedBackups-forge-1.20-3.4-dockercheck.zip
I guess I'll try reuploading it? If not then I'll have to use something else to upload the file, strange. I've not had issues files to github like that before.
Oh, that's a thought. Rather than dming you, I can just send a link to a file uploaded to discord haha
AdvancedBackups-forge-1.20-3.4-dockercheck.zip
That link will work for 24 hours.
yeah my server did a backup with no players and it completed perfectly. this is definitely the problem. downloaded the file.
ok i reloaded the config after setting purge size to 100, and did a /backup reload-config
running /backup start now
I don't see a need - with how similar they are, I'd be very surprised if the hang doesn't occur on all three.
The check would also work on all three, so no need to do any extra checks there.
This will ideally be a temporary fix, I'm not happy with just relying on this check - but it's better than the backup hanging.
i'm 99% sure that it works properly in a container when the world backup storage location is located on the local vm storage and not through a cifs/smb mount. (like i said, i'm probably a very niche use case)
When i was running with the backup storage inside the vm/docker container, i never ran into the issue. I only started noticing old backups not being purged AFTER i moved the world backup storage to a cifs bind mount.
There's not really a way for me to check whether the backup location is over the network or not.
And come to think of it, it's possible more scenarios use docker without the smb / cifs setup. I'm not too sure where to proceed now, actually...
i'm willing to bet if i moved the storage back inside the VM (my docker host is a VM on TrueNAS scale), that the backups would complete properly.
It's not possible to query the system via mount and grep the output to look for the world backup string and see if it's on a cifs mount?
mount | grep <$backupfolder> | grep cifs
something like that?
i do get output within the docker container when doing that command. not sure if java has any access to that. it's with user permissions, no root.
`minecraft@dc80e5520122:~$ mount | grep /minecraft | grep cifs
//192.168.69.1/Minecraft on /minecraft type cifs (rw,relatime,vers=3.0,cache=strict,username=minecraft,uid=1000,noforceuid,gid=1000,noforcegid,addr=192.168.69.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
minecraft@dc80e5520122:~$`
this is assuming that the AB config path is set to the same mount name.
for example, my directory is //192.168.69.1/Minecraft
mounted to /minecraft in the container.
maybe have it check if it's in a container, then if it is, check if the backup directory is local storage or not? if it's local, do the checks, if it's not listed locally then skip checks? (i'm just throwin spaghetti at the wall here lol.)
Hmm. I can have a play tomorrow, I'm just not sure if I can come up with anything reliable enough.
If you enter the full backup path (whatever so exactly what you specified in the config) with that command, does the output change much?
typing the full path from the config: config.advancedbackups.path=/minecraft/world-backups/aubincraft-1-20-1
wouldn't get a result since the mount command doesn't show file structure, only what server share is mounted and to what directory.
Or maybe, add a config option for docker containers specifically.
config.advancedbackups.isdockerstoragelocal=true/false
where if someone has their backup storage local, they can set it to true manually, and it will do the checks. or if it's not local but mounted via cifs bind, it can be set to false and skip the checks.
mmm, figures. not really sure what to do here then - whether or not I should allow backup purging when in docker containers, how i could reliably check for a network drive, and ideally how I can fix this issue outright.
A config option that needs to be enabled to allow purging within a docker container definitely seems to be a good temporary solution though.
and have a log output saying
ERROR: Server is running in docker, storage checks will be skipped unless isdockerstoragelocal=true. do NOT set this to true if your backup storage is via a network bind mount as the backup will hang and the world chunk saving could fail to be re-enabled
don't need people blaming you for their worlds getting messed up if they toggle stuff they don't know on and off lol
that way it's up to us as the user to know what we're doing, and that liability doesn't fall on you in the event a permanent solution to this is impossible.
i'm going to enable the other checks with the original jar and see if things hang or not. that way you can know for sure if this issue only affects space based purging or not.
Alright, thank you. I'm sure it'd affect all, but you're welcome to check.
so i have this right...
config.advancedbackups.purge.days=0
is the number of days old before the backup is purged
config.advancedbackups.purge.count=0
is the number of total backups to keep
where does this fit in
config.advancedbackups.chains.length=50
The chain length controls the maximum amount of backups that can be in a single differential or incremental "chain" - that is, one full backup and 49 partial backups.
In a lot of cases you might not ever see this number hit, due to the "smart reset" option anyway.
It won't matter in the context of this bug.
ok i'm going to copy off my world backup directory and start playing with the settings. i'll let you know if i figure anything out.
so far i'm testing the original jar with the purge after # of days setting. The backup completes properly, but i'm noticing it's not purging the older backups, even with days = 1.
moving on to the next setting
testing with the purge after # of backups does not complete and re-enable saving, nor does it remove old backups.
wanted to also let you know, by fixing this issue you have corrected a perplexing issue i was having where the server was not saving/shutting down in a timely manner and was being force stopped, causing loss of some changes. It looks like this issue with the thread hanging was preventing it from working right! so double thanks!
Yeah, those shutdown hangs make sense too - the server can't stop whilst a backup is running.
I'm reopening this issue until 3.5 is pushed out, at which point it'll automatically be closed.
This issue has been closed - fixed in 3.5 which has now been pushed to Curseforge and Modrinth.