Advanced Backups

Advanced Backups

1M Downloads

Backup purging hangs when using uncompressed differentials / incrementals

Jadan1213 opened this issue · 98 comments

commented

I've noticed a few times that i have to manually re-enable chunk saving after a backup. i have no other mods that manipulate chunk saving installed.

I'm on forge 47.2.17, MC 1.20.1, mod ver. 3.4. I have Minecraft running as a local server on another machine, and my wife and I connect to it from our client machines.

I'm not 100% sure if it only happens when a save is done out of game (disconnected), but i feel like it is. I've noticed usually after logging back in after being off the server a while (no players) that if a backup was made, i have to manually re-enable saving with /save-on. This doesn't seem to happen when actually connected to the server, but i'm going to try and keep an eye out and see.

commented

Added a secondary VM disk to my docker VM, and bind mounted the local storage directory into the docker container. I set the permissions for read/write and copied over all the world backup data and changed the backup directory to the local vm storage.

so far i ran the first test with purge size 100GB. the backup did not complete properly. I'm starting to wonder if this is an issue with having a backup folder that is outside of the game's data folder. anything other than ./backups. (I mean it could still be because it's in a docker container...)

I'm going to copy the game backups into the ./backups folder and see what happens a little later.

commented

copied the world backups to the ./backups folder, set the size limit to 50G (limited space in the VM) and ran a backup.

it ran the backup, i haven't gotten any completion message, nor a re-enabling of saving. It is pegging a single cpu core at 100%, but spark profiler isn't showing any activity from advancedbackups. i'm going to let it sit for a while and see what happens.

commented

i did discover if you move/delete the folder location and a backup tries to run, it will output a null error, but even after changing the directory location and doing a reload-config it still thinks a backup is running. (i didn't shut down the server when i did it lol.)

commented

well i let it sit, and nothin. so now i'm more confused. i'm wondering if the only reason i didn't notice this at first is because it takes time for space to get used and hit the 100G limit i had set.

commented

Hmm. I know that external backup locations are no problem outside of docker containers, but it's still possible that docker interferes somehow..

As for the spark profiler, it won't show anything with default arguments.
Adding --thread * will tell it to profile all threads, as the backup process is threaded to reduce impact on the game loop.

commented

Hmm, I've been trying to recreate this to no avail.

Tested steps so far:

  • Run a debian container, install Forge, Advanced Backups and Spark
  • Set the count purge strategy to 2
  • Make several backups, such that purging happens
  • Change the backup location to be absolute (in this case, I used /backups rather than ./backups)
  • Make several more backups

With both backup locations, I observed proper count-based purging.
As such, do you think you could get me a download link for your backups folder, provided it isn't too big?

commented

Ah! Managed to reproduce it, only after disabling compression like you did. Interesting...

commented

aaand reproduced outside of docker.
With that, I think I now have enough information to test a fix, we shall see.

Full reproduction steps:

  • Run a server, with Advanced Backups installed
  • Ensure you are using differential or incremental backups. The default is differential.
  • Disable chain compression.
  • Make enough backups to hit one of your purge strategies. Count is easiest for obvious reasons
  • Observe the backup deadlocking when it tries to purge.

I'll run some tests on my end, and come up with a fix for you to try out too if that's okay?

commented

AdvancedBackups-forge-1.20-3.4-hangfix.zip
And here's a jar to test. This should solve it.

commented

i'll download it and give it a try! That's crazy the rabbit hole we went down and how it was related to not using compression!

commented

it's going to take me a few, gotta undo a lot of changes i made to reset everything to how i originally had it lol.

commented

i use no compression because it's faster for backup (nvme storage), and my storage location runs on zfs, so it has compression already lol. I bet mostly everyone keeps it default to using compression and that's why this never came up!

Also, i did use the default with compression on at first!, I only disabled it after setting the storage to my smb share!

commented

I do indeed want to implement purging notifications, though it's lower priority than some other things.
I think pushing that as a separate feature request should be a good idea though, rather than this issue.

I'll mark this as fixed in next release, and I'd suggest you continue using the test jar I gave you until v3.5 is out so your purging can work as it should.

commented

I set the limit to 75G and ran a backup with the new jar. It deleted the old backup chain and brought it under limit.

would it be possible to get log messages on purge? like, backup size has exceeded limit of size, deleting old backups.
the following backups were deleted;
backup1 backup2 backup3

etc? if not, no problem, purging seems to work now =)

commented

Thank you for your help. Took lots of testing, but i'm glad you figured it out! I'll keep using the jar you sent me until a proper 3.5 release is posted. Thank you again for your time!

commented

i have backups set to run with player activity.. They seem to run fine after logging in and playing for a bit, i haven't noticed anything else. but i'll keep an eye out. the logs don't show anything except

[22:32:22] [Server thread/WARN]:

SAVING DISABLED - PREPARING FOR BACKUP!

[22:32:23] [Server thread/WARN]:

SAVE COMPLETE - PREPARING FOR BACKUP!

[22:32:23] [AB Active Backup Thread/INFO]: Preparing differential backup name: backup_2024-03-11_22-32-23

Are you able to provide the entire log please? There should be info after that, whether its an error or the complete message..?

commented

Only making one backup afterward logging off is indeed correct if you have player activity required, so that's not really a potential investigation route.

commented

i have backups set to run with player activity.. They seem to run fine after logging in and playing for a bit, i haven't noticed anything else. but i'll keep an eye out. the logs don't show anything except

[22:32:22] [Server thread/WARN]:


SAVING DISABLED - PREPARING FOR BACKUP!


[22:32:23] [Server thread/WARN]:


SAVE COMPLETE - PREPARING FOR BACKUP!


[22:32:23] [AB Active Backup Thread/INFO]: Preparing differential backup name: backup_2024-03-11_22-32-23

commented

it does seem to do a backup still after we log out, but i don't notice it doing further backups until after we log back in again

commented

Ah, also - immediately after you next encounter this, could you please retrieve a latest.log?

commented

Thanks for the report!

First up, do you have backups set to require activity?

Secondly, are you able to tell if your backup cycle continues as normal after this?

commented

i'm wondering if there may be an issue with the space checking for rotating backups. my server is running in a docker container with a folder mounted from my NAS to hold the backups, and i don't think that it's able to detect storage properly. Could this be causing an issue if it's attempting to enumerate the storage and delete old backups? maybe causing it to error out or cause it to not be able to complete its tasks prior to re-enabling saving?

commented

when i get chance i'll disable the deletion of old backups and see if it changes anything.

commented

just completed a backup while in game. it doesn't show in the server log output that the backup was completed, and it did not re-enable saving.

commented

i just rebooted my server after removing a mod, so as soon as it does it again i'll post the latest.log

commented

here's the one before that. (i'm not sure how long saving was off... still looking...

https://gist.github.com/Jadan1213/5290103cf77db582ee45e5b5041815dd

commented

this is the latest.log from a couple reboots ago.

https://gist.github.com/Jadan1213/2b582c51f40ba46e8f4055e94f2092c5

commented

it's a mod that pauses season progression when players aren't logged in. it's called ready player fun

commented

the server still runs, it doesn't actually stop

commented

Pausing server onPlayerLogout at 40827476, 551547
This, from my knowledge, isn't a vanilla feature.. I think I might know the cause now, but we shall see.

Are you playing a premade pack, or did you install all those mods manually? I might need to create an identical environment if I can't reproduce this manually.

commented

i have advanced backups set to use system time and not game time

commented

i'm playing on my own modpack that i put together. If you have modrinth i can send you an exported pack file that you could import into modrinth.

commented

i'm not sure if the issue would happen in a single player world or not. since i have it running on a dedicated server. I could zip up the server folder without the region files (so it's not huge lol) if you want

commented

Hmm. After some testing, I've been unable to reproduce it indev with readyplayerfun.

As such, yes I'd like to use a modrinth export, and a copy of your server (world folder not needed), thank you in advance!

commented

ok i ran around for a bit. logged out, and waited. it did the save with players logged out. and when i logged back in, i had to re-enable saving. Here's the latest.log

https://gist.github.com/Jadan1213/c0ad8de3c93d5d309e95c6249780dad3

commented

ok give me a little bit to get those files together. my wife is currently playing on the server lol.

commented

do you want me to include all the logs folders?

commented

well if you know how to enable the debug log on server, please tell me lol. the server doesn't output one and i'm not sure how to turn it on!

commented

No major need for a logs folder, no. I think the logs you've provided will give all the info the logs folder can, aside from perhaps a debug log.

commented

there is not. i've been trying to figure out how to enable it to try and figure out a few other things with some mods. =(

commented

There should be a debug.log file in the logs folder - but if there isn't, just don't worry about it.

commented

Here's the modrinth export(152.7MB): https://drive.google.com/file/d/1-0w49fkJlyoqjOQFIYSL4ld_rKgu0pam/view?usp=drive_link
Here's the server directory sans world folder(945.2MB): https://drive.google.com/file/d/1BwIt_0IalhHwwM91mzTQuzQsjiGVKfE-/view?usp=drive_link

The server is set to use xms & xmx of 12GB, i don't know how much ram you have available, you might need to alter that.
The client should use whatever your modrinth default ram limit is. I use 6GB and it runs okay.

You might have to disable the StayTrue texture pack, i know it gives my wife's pc some issues loading sometimes. so if you run into problems loading the client, that could help.

The files are fairly large as I have a "few" mods installed lol...

commented

Interesting.. That's a good thought. I've been unable to reproduce it with the server you sent so far, though I've not even managed to boot the client yet.

Those checks aren't ran if you run a snapshot with /backup snapshot, so that could be a good point to investigate.

commented

i let it run on its own, i didn't do a manual backup

commented

Yeah, both automatic and manual backups will run the checks, so if you're right in thinking it could be getting stuck on those, a manual snapshot would be a way to test that without disabling the checks.

Of course, the other option is setting all three config values to 0, which disables the checks outright.

commented

i did a /backup snapshot and it gave me an error. looks like the bakcup is hanging.

"Cannot start a snapshot whilst a backup is already running"

commented

it did backup the files though, so that's good at least
image

commented

Ah, that's not a surprise I guess. I'm not 100% sure you can break out of that without a server restart, but you can try /backup cancel, waiting a short while then running the snapshot again.

commented

and indeed, I can't reproduce it still.. Server is running on my pc using my NAS as storage, which is the slowest storage I have available - that's made no difference, and nothing I do on my client makes any difference.

commented

Hmm. I'd need to get a debugger attached to a server running inside a docker container to actually figure out a fix.

However, I can potentially implement a detection system for the time being...?

commented

It has to be something related to the fact that my NAS storage is being mounted into a docker container. I feel like it's not able to enumerate space usage and it's hanging. Since you can't reproduce it, i feel like it's an infrastructure thing and not a server configuration/mod thing.

commented

so i just did a /backup snapshot and it re-enabled saving. I've set the space check to 0, so i'll try a normal backup and see what happens
image

commented

unless you happen to have any ideas on how to prevent the hang lol. You're the coding master, not me! lol

commented

That has to be exactly it. After setting space purging to 0, the standard /backup start completed and re-enabled saving. You said that a /backup start would do space checks so i think that's it. Given that my use case is pretty niche (but possibly an issue others could run into) maybe adding a comment in the config that says something like "If you are mounting your backup storage into a docker container, space based purging should be set to 0 to prevent a possible issue with the backup hanging and not re-enabling chunk saving"

image

commented

well i happen to have a server running in a docker container and am willing to help if you want me to install/run anything on my server.

commented

I'll get you a jar to test with, though keep the old one handy.

commented

no problem. just let me know what to do =D

commented

AdvancedBackups-forge-1.20-3.4-dockercheck.zip

Give this one a test please - you'll have to disable the old jar.
It should be able to check if a server is running inside docker, and if so, skip the purge checks.

commented

it gave me a 404 error

commented

if you want to send it to me on discord or use a google drive link, those work

commented

Thanks github...

AdvancedBackups-forge-1.20-3.4-dockercheck.zip

I guess I'll try reuploading it? If not then I'll have to use something else to upload the file, strange. I've not had issues files to github like that before.

commented

yeah, that's still a 404

commented

Oh, that's a thought. Rather than dming you, I can just send a link to a file uploaded to discord haha

AdvancedBackups-forge-1.20-3.4-dockercheck.zip

That link will work for 24 hours.

commented

yeah my server did a backup with no players and it completed perfectly. this is definitely the problem. downloaded the file.

commented

ok i reloaded the config after setting purge size to 100, and did a /backup reload-config
running /backup start now

commented

the backup completed successfully.
image

commented

would you like me to test if this issue occurs with the other purge options?

commented

on the original jars?

commented

I don't see a need - with how similar they are, I'd be very surprised if the hang doesn't occur on all three.

The check would also work on all three, so no need to do any extra checks there.

commented

This will ideally be a temporary fix, I'm not happy with just relying on this check - but it's better than the backup hanging.

commented

i'm 99% sure that it works properly in a container when the world backup storage location is located on the local vm storage and not through a cifs/smb mount. (like i said, i'm probably a very niche use case)

commented

When i was running with the backup storage inside the vm/docker container, i never ran into the issue. I only started noticing old backups not being purged AFTER i moved the world backup storage to a cifs bind mount.

commented

Huh. That is interesting...

commented

There's not really a way for me to check whether the backup location is over the network or not.

And come to think of it, it's possible more scenarios use docker without the smb / cifs setup. I'm not too sure where to proceed now, actually...

commented

i'm willing to bet if i moved the storage back inside the VM (my docker host is a VM on TrueNAS scale), that the backups would complete properly.

commented

It's not possible to query the system via mount and grep the output to look for the world backup string and see if it's on a cifs mount?

mount | grep <$backupfolder> | grep cifs

something like that?

commented

i do get output within the docker container when doing that command. not sure if java has any access to that. it's with user permissions, no root.

`minecraft@dc80e5520122:~$ mount | grep /minecraft | grep cifs

//192.168.69.1/Minecraft on /minecraft type cifs (rw,relatime,vers=3.0,cache=strict,username=minecraft,uid=1000,noforceuid,gid=1000,noforcegid,addr=192.168.69.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

minecraft@dc80e5520122:~$`

commented

this is assuming that the AB config path is set to the same mount name.

for example, my directory is //192.168.69.1/Minecraft

mounted to /minecraft in the container.

commented

maybe have it check if it's in a container, then if it is, check if the backup directory is local storage or not? if it's local, do the checks, if it's not listed locally then skip checks? (i'm just throwin spaghetti at the wall here lol.)

commented

Hmm. I can have a play tomorrow, I'm just not sure if I can come up with anything reliable enough.

If you enter the full backup path (whatever so exactly what you specified in the config) with that command, does the output change much?

commented

typing the full path from the config: config.advancedbackups.path=/minecraft/world-backups/aubincraft-1-20-1 wouldn't get a result since the mount command doesn't show file structure, only what server share is mounted and to what directory.

commented

Or maybe, add a config option for docker containers specifically.

config.advancedbackups.isdockerstoragelocal=true/false

where if someone has their backup storage local, they can set it to true manually, and it will do the checks. or if it's not local but mounted via cifs bind, it can be set to false and skip the checks.

commented

mmm, figures. not really sure what to do here then - whether or not I should allow backup purging when in docker containers, how i could reliably check for a network drive, and ideally how I can fix this issue outright.

A config option that needs to be enabled to allow purging within a docker container definitely seems to be a good temporary solution though.

commented

and have a log output saying
ERROR: Server is running in docker, storage checks will be skipped unless isdockerstoragelocal=true. do NOT set this to true if your backup storage is via a network bind mount as the backup will hang and the world chunk saving could fail to be re-enabled

commented

don't need people blaming you for their worlds getting messed up if they toggle stuff they don't know on and off lol

commented

that way it's up to us as the user to know what we're doing, and that liability doesn't fall on you in the event a permanent solution to this is impossible.

commented

i'm going to enable the other checks with the original jar and see if things hang or not. that way you can know for sure if this issue only affects space based purging or not.

commented

Alright, thank you. I'm sure it'd affect all, but you're welcome to check.

commented

It's worth noting that size is checked, then total count, then dates.

commented

so i have this right...
config.advancedbackups.purge.days=0 is the number of days old before the backup is purged
config.advancedbackups.purge.count=0 is the number of total backups to keep

where does this fit in
config.advancedbackups.chains.length=50

commented

The chain length controls the maximum amount of backups that can be in a single differential or incremental "chain" - that is, one full backup and 49 partial backups.

In a lot of cases you might not ever see this number hit, due to the "smart reset" option anyway.

It won't matter in the context of this bug.

commented

ok i'm going to copy off my world backup directory and start playing with the settings. i'll let you know if i figure anything out.

commented

so far i'm testing the original jar with the purge after # of days setting. The backup completes properly, but i'm noticing it's not purging the older backups, even with days = 1.

moving on to the next setting

commented

testing with the purge after # of backups does not complete and re-enable saving, nor does it remove old backups.

commented

wanted to also let you know, by fixing this issue you have corrected a perplexing issue i was having where the server was not saving/shutting down in a timely manner and was being force stopped, causing loss of some changes. It looks like this issue with the thread hanging was preventing it from working right! so double thanks!

commented

Yeah, those shutdown hangs make sense too - the server can't stop whilst a backup is running.

I'm reopening this issue until 3.5 is pushed out, at which point it'll automatically be closed.

commented

This issue has been closed - fixed in 3.5 which has now been pushed to Curseforge and Modrinth.

commented

reopening, CF workflow is erroring..