Re: Curve fitting for rsync snapshot deletion

Using rsync for backups is kinda outdated. You should probably use a version control system (VCS) instead. If you’ve never used one, learning it won’t be a waste of time; it’s an increasingly important skill for programmers.

I think that git is the best fit for your particular use case: each commit is a full snapshot of your working directory (nearly every other VCS stores commits as a linked list, making it impossible to delete a particular commit). It also uses a compression scheme optimized for storing successive snapshots, making it more space-efficient than rsync.

With that said, if you’re dead set on using rsync, the parabolic function you suggest can be implemented like this (I’m only demonstrating the algorithm here, not the rsync stuff):

#!perl -w
use v5.16;
use List::MoreUtils qw(uniq);

my @snapshots;
my $capacity = 100.5;
my $ratio = .70;
my @keep_me = reverse uniq map { $capacity - int $_**2 / $ratio**2 / $capacity } 1..$ratio * $capacity;

for (1..1000) {
    push @snapshots, $_;
    if (@snapshots > $capacity) {
        @snapshots = @snapshots[@keep_me];
    }
}

say "@snapshots";

This example assumes that you take a total of 1000 snapshots but only have enough disk space to store 100. It lists the snapshots that are kept: 760 803 825 834 (…) 998 999 1000. Rather than blindly keeping the 100 latest snapshots (901..1000), it keeps snapshots from much further back, getting increasingly sparse the further back in time you go. You could also try functions other than y=x^2; I’d suggest an exponential.
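For what it’s worth, here is a rough sketch of what an exponential variant might look like. The helper names (exponential_keep_list, thin_snapshots) and the $steps parameter are made up for illustration, and the curve (age = capacity^(k/steps) - 1) is just one possible choice, not something taken from the script above. It is only meant to show the idea, not to be a tuned retention policy:

```perl
#!perl
use v5.16;
use warnings;

# Hypothetical helper: array indices (oldest first) to keep when the list
# holds $capacity + 1 snapshots. The age of a kept slot grows exponentially.
sub exponential_keep_list {
    my ($capacity, $steps) = @_;
    my (%seen, @keep);
    for my $k (0 .. $steps) {
        # Age runs from 0 (newest) to $capacity - 1, so index 0 (the
        # oldest snapshot) is always dropped and old data can age out.
        my $age   = int($capacity ** ($k / $steps)) - 1;
        my $index = $capacity - $age;
        push @keep, $index unless $seen{$index}++;   # skip duplicate slots
    }
    return reverse @keep;   # ascending indices, i.e. oldest first
}

# Hypothetical driver: simulate $total snapshots with room for $capacity.
sub thin_snapshots {
    my ($total, $capacity, $steps) = @_;
    my @keep_me = exponential_keep_list($capacity, $steps);
    my @snapshots;
    for my $n (1 .. $total) {
        push @snapshots, $n;
        # Prune as soon as we go over capacity, as in the parabolic version.
        @snapshots = @snapshots[@keep_me] if @snapshots > $capacity;
    }
    return @snapshots;
}

say join ' ', thin_snapshots(1000, 100, 70);
```

Because many of the small ages round to the same slot, this keeps fewer snapshots per pruning pass than the parabolic version, so you would probably want to play with $steps (or the base of the exponent) before relying on it.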


Replies are listed 'Best First'.
Re^2: Curve fitting for rsync snapshot deletion by Corion (Patriarch) on Nov 22, 2013 at 12:09 UTC
Note that `git` really wants to use memory-mapped files when committing/restoring files. This means `git` has problems with large files, at least on 32-bit systems. For example, I could not store video files with a size of 200MB or so in a `git` repository. Also, `git` cannot purge older backups or create "holes" in the history. You cannot age out old or intermediate backups. `git` wants to keep the full history. Other than that, `git` has at least the user interface part of storing and restoring things done.
Re^3: Curve fitting for rsync snapshot deletion by mhearse (Chaplain) on Nov 25, 2013 at 15:38 UTC
Thanks for your post. This is definitely applicable to me, as most of my machines are 32-bit clunkers... barring my sparc64 boxes... but they have only 256MB of memory!
Re^2: Curve fitting for rsync snapshot deletion by mhearse (Chaplain) on Nov 21, 2013 at 21:15 UTC
Thanks for your post. I agree about git. I guess I could check in new files... and revise existing ones as modifications are made. And best of all I could search them insanely fast via: `git grep`

