Advanced Backups

Issues with backup preparation times

HeatherComputer opened this issue · 4 comments

commented

Okay, so here's the deal:

  • To work out which files to back up in an incremental or differential partial backup, we need to hash each file.

  • This is... slow, because it effectively means reading the entire world from disk as if we were making a full backup, before we even start making one. We then have to re-read the files we want to back up from disk when the backup actually starts.

  • On larger worlds, this can leave the backup sitting on the "Backup Starting" message for a long time before any progress updates are sent, which can make a backup look stalled even when it isn't.

  • It also extends the backup time significantly, of course, because you're reading the entire world just to figure out what to back up.
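To make the cost above concrete, here is a minimal Python sketch of hash-based change detection, assuming a SHA-256 digest and a hypothetical manifest that maps relative paths to the digests recorded by the previous backup (the mod's real hash algorithm and manifest format may differ). Note that `changed_files` reads every byte of the world before a single file is copied.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the whole file, read in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(world: Path, previous: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (files whose hash differs from the previous manifest, new manifest).

    Every file in the world is read in full here, before any copying starts.
    """
    current: Dict[str, str] = {}
    changed: List[str] = []
    for p in sorted(world.rglob("*")):
        if p.is_file():
            rel = p.relative_to(world).as_posix()
            current[rel] = file_hash(p)
            if previous.get(rel) != current[rel]:
                changed.append(rel)
    return changed, current
```

The files in `changed` then have to be read a second time when they are actually copied, which is the double read described above.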

Problem is... how can one solve this?

  • We kinda have to use hashes, because Minecraft does not properly update file modification dates. We cannot use dates to tell whether a file has changed. See #33 for more info.
  • We could remove the apparent stall by backing up each file as soon as we know whether it needs backing up. However, this introduces another issue: the "smart chain reset" feature wouldn't work, because we'd have finished the backup before knowing whether the chain should be reset.
    • This wouldn't actually speed up backups at all, but it'd remove the apparent stall.
  • We could, in theory, only hash parts of each file. The problem is that we risk skipping over the only part of a file that has changed, and thus not backing up a file that we should back up.
    • This wouldn't outright remove the apparent stall. However, it would significantly speed this stage up, and thus speed up backups as a whole.
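A minimal Python sketch of the "only hash parts of the file" idea, and of its risk. The sampling scheme here (file size plus the first and last 4 KiB) is a hypothetical choice for illustration only; any scheme that skips bytes has the same blind spot.

```python
import hashlib
import os

def sampled_hash(path: str, sample: int = 4096) -> str:
    """Hash only the file size plus the first and last `sample` bytes (hypothetical scheme).

    Much cheaper than a full hash on large files, but a change that falls
    entirely in the unsampled middle produces the same digest.
    """
    size = os.path.getsize(path)
    h = hashlib.sha256(size.to_bytes(8, "big"))  # size changes are always caught
    with open(path, "rb") as f:
        h.update(f.read(sample))  # head
        if size > sample:
            # Tail; never seek back into the head so no byte is hashed twice.
            f.seek(max(size - sample, sample))
            h.update(f.read(sample))
    return h.hexdigest()
```

On a 12 KiB file, flipping a byte at offset 6000 leaves the digest unchanged, so that file would silently be left out of the backup.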
commented

Hide the stall.

Keep track of how long hashing took the last few times, take an average, and use that to display progress while hashing. 😎
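A minimal Python sketch of that suggestion, assuming the mod can persist the duration of each hashing pass; the class name and window size are hypothetical. Progress is capped below 100% so the display never claims completion before hashing actually ends.

```python
from collections import deque
from typing import Optional

class HashTimeEstimator:
    """Estimate hashing progress from a rolling average of previous hashing durations."""

    def __init__(self, window: int = 5) -> None:
        # Only the last `window` durations (in seconds) are kept; older runs drop off.
        self.durations = deque(maxlen=window)

    def record(self, seconds: float) -> None:
        """Call once per completed hashing pass with how long it took."""
        self.durations.append(seconds)

    def progress(self, elapsed: float) -> Optional[float]:
        """Estimated fraction done, or None when there is no usable history yet."""
        if not self.durations:
            return None
        expected = sum(self.durations) / len(self.durations)
        if expected <= 0:
            return None
        # Cap below 1.0 so a slower-than-usual run stalls at 99% instead of overshooting.
        return min(elapsed / expected, 0.99)
```

The first backup on a server has no history, so the caller still needs a fallback display for the `None` case.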

commented

We kinda have to use hashes, because Minecraft does not properly update file modification dates.

How about we combine the two methods! If the file modification date didn't change, check the file hash; if it did change, the file was definitely modified, so we can skip calculating the hash and just back it up. 😄
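A minimal Python sketch of the combined check, assuming a hypothetical per-file manifest entry of (mtime in nanoseconds, SHA-256 digest). When the mtime moved, the file is copied and hashed in the same pass, so it is read only once; when the mtime is unchanged, the sketch falls back to a full hash, as suggested. The combination only errs toward backing up too much: a spurious mtime change causes an unneeded copy, never a missed one.

```python
import hashlib
import os
from typing import Optional, Tuple

# Hypothetical manifest entry from the previous backup: (mtime_ns, sha256 hex digest).
Entry = Tuple[int, str]

def _sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_hash(src: str, dst: str) -> str:
    """Copy src to dst and hash it in the same pass, so a backed-up file is read only once."""
    h = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(1 << 20), b""):
            h.update(chunk)
            fout.write(chunk)
    return h.hexdigest()

def backup_file(src: str, dst: str, previous: Optional[Entry]) -> Tuple[bool, Entry]:
    """Return (copied, new_entry), using the mtime as a fast 'definitely changed' signal."""
    mtime = os.stat(src).st_mtime_ns
    if previous is None or mtime != previous[0]:
        # New file, or the mtime moved: it changed for sure, so skip the separate pre-hash.
        return True, (mtime, copy_and_hash(src, dst))
    # mtime unchanged: Minecraft may still have modified the file, so compare hashes.
    digest = _sha256(src)
    if digest != previous[1]:
        return True, (mtime, copy_and_hash(src, dst))
    return False, (mtime, digest)
```

This only removes the pre-hash for files whose mtime moved; files with an unchanged mtime are still hashed in full.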

commented

We kinda have to use hashes, because Minecraft does not properly update file modification dates.

For MCA files, which I'd assume are the bulk of the processing load, we could probably use the chunk update timestamps in the header.
If that for some reason doesn't work, we might be able to use the LastUpdate values of individual chunks. NBT can only be parsed linearly, but as soon as one LastUpdate has changed we can skip the rest of the file, and thanks to the MCA header, each chunk could be parsed in parallel.

However, I'd strongly recommend adding a config option to disable 'smartness' like this, just in case a server admin is aware of anything that might screw with these indicators...
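A minimal Python sketch of the header idea. In the Anvil region format, the first 4 KiB of a `.mca` file hold chunk locations and the next 4 KiB hold 1024 big-endian 32-bit per-chunk timestamps. Whether those timestamps track every change a backup cares about is exactly what the comment above leaves open, so this is an untested assumption. `trust_timestamps` stands in for the hypothetical config option: when it is off, the function returns `None` and the caller is expected to fall back to hashing.

```python
import struct
from typing import List, Optional

SECTOR = 4096
CHUNKS_PER_REGION = 1024

def read_chunk_timestamps(path: str) -> List[int]:
    """Read the 1024 per-chunk timestamps from bytes 4096..8191 of a region file header."""
    with open(path, "rb") as f:
        header = f.read(2 * SECTOR)
    if len(header) < 2 * SECTOR:
        # Empty or truncated region file: treat every chunk as having no timestamp.
        return [0] * CHUNKS_PER_REGION
    return list(struct.unpack(">1024I", header[SECTOR:2 * SECTOR]))

def region_changed(path: str, previous: List[int], trust_timestamps: bool = True) -> Optional[bool]:
    """True/False from the header alone, or None meaning 'not trusted, hash the file instead'."""
    if not trust_timestamps:
        return None
    return read_chunk_timestamps(path) != previous
```

Only 8 KiB per region file is read, regardless of how large the file is.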

Keep track of how long hashing took the last few times, take an average, and use that to display progress while hashing. 😎

As far as I can tell, we wouldn't even need to estimate, since we know (1) how many files need to be hashed, (2) how big they are, and (3) how many bytes of the current file have been processed so far.
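A minimal Python sketch of that exact, estimate-free progress: total bytes come from the file sizes up front, and the hashing loop reports bytes processed after every chunk. The function name, the callback shape and SHA-256 are assumptions for illustration.

```python
import hashlib
import os
from typing import Callable, Dict, Iterable

def hash_with_progress(paths: Iterable[str], report: Callable[[float], None],
                       chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hash every file, calling report(fraction_done) after each chunk."""
    paths = list(paths)
    total = sum(os.path.getsize(p) for p in paths)  # (1) and (2): file count and sizes
    done = 0
    digests: Dict[str, str] = {}
    for p in paths:
        h = hashlib.sha256()
        with open(p, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
                done += len(chunk)  # (3): bytes processed so far
                # Clamp in case a file grew after its size was read.
                report(min(done / total, 1.0))
        digests[p] = h.hexdigest()
    if total == 0:
        report(1.0)  # nothing to read, but still signal completion
    return digests
```

For a 30-byte and a 10-byte file hashed in 10-byte chunks, `report` receives 0.25, 0.5, 0.75 and 1.0.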

commented

!target 4.0

The planned reworks in the spi system and filesystem watchers should deal with these.