Re: Curve fitting for rsync snapshot deletion

by Grimy (Pilgrim)
on Nov 21, 2013 at 19:21 UTC ( #1063788=note )

in reply to Curve fitting for rsync snapshot deletion

Using rsync for backups is kinda outdated. You should probably use a version control tool. If you have never used a VCS, learning one won't be a waste of time; it's an increasingly important skill for programmers.

I think that git is the best fit for your particular use case: each commit is a full snapshot of your working directory (about every other VCS stores commits as a chained list, making it impossible to delete a particular commit). It also uses a compression scheme optimized for storing successive snapshots, making it more space-efficient than rsync.

With that said, if you're dead set on using rsync, the parabolic function you suggest can be implemented like this (I'm only demonstrating the algorithm here, not the rsync stuff):

#!perl -w
use v5.16;
use List::MoreUtils qw(uniq);

my @snapshots;
my $capacity = 100.5;
my $ratio = .70;

# Array indices to keep whenever the list overflows, spaced along a parabola
my @keep_me = reverse uniq map {
    $capacity - int $_**2 / $ratio**2 / $capacity
} 1..$ratio * $capacity;

for (1..1000) {
    push @snapshots, $_;
    if (@snapshots > $capacity) {
        # Thin the list down to the precomputed indices
        @snapshots = @snapshots[@keep_me];
    }
}
say "@snapshots";

This example assumes that you take a total of 1000 snapshots, but only have enough disk space to store 100. It lists the snapshots that are kept: 760 803 825 834 (…) 998 999 1000. Rather than blindly keeping the 100 latest snapshots (901..1000), it keeps snapshots from much further back, getting increasingly sparse the further back in time you go. You could also try functions other than y=x^2; I'd suggest an exponential.
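
To give a rough idea of what the exponential variant might look like, here is a minimal sketch. It is only one possible reading of the suggestion: the growth factor 1.07, the whole-number capacity, and the names $growth and @keep_ages are assumptions of mine for illustration, and I dropped List::MoreUtils in favour of a plain hash so it runs on a bare perl. Each time the list overflows, it keeps only the snapshots whose age (0 = newest) is of the form int($growth**$k) - 1.

```perl
#!perl -w
use v5.16;

my $capacity = 100;    # how many snapshots fit on disk (assumed)
my $growth   = 1.07;   # hypothetical growth factor; larger means sparser history

# Ages (0 = newest) that survive a thinning pass: int($growth**$k) - 1.
# Small values of $k collapse onto the same age, so skip duplicates.
my (%seen, @keep_ages);
for (my $k = 0; ; $k++) {
    my $age = int($growth**$k) - 1;
    last if $age > $capacity;
    push @keep_ages, $age unless $seen{$age}++;
}

my @snapshots;
for my $n (1..1000) {
    push @snapshots, $n;
    if (@snapshots > $capacity) {
        # Turn ages into array indices, oldest first, and keep only those
        @snapshots = @snapshots[ map { $#snapshots - $_ } reverse @keep_ages ];
    }
}
say "@snapshots";
```

Which snapshots survive depends heavily on $growth, so it is worth running this with a few different values before wiring anything like it into a real deletion script.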

Replies are listed 'Best First'.
Re^2: Curve fitting for rsync snapshot deletion
by Corion (Patriarch) on Nov 22, 2013 at 12:09 UTC

    Note that git really wants to use memory mapped files when committing/restoring files. This means git has problems with large files, at least on 32-bit systems. For example, I could not store video files with a size of 200MB or so in a git repository.

    Also, git cannot purge older backups or create "holes" in the history. You cannot age out old or intermediate backups. git wants to keep the full history.

    Other than that, git has at least the user interface part of storing and restoring things done.

      Thanks for your post. This is definitely applicable to me, as most of my machines are 32-bit clunkers... barring my sparc64 boxes... but they have only 256MB of memory!
Re^2: Curve fitting for rsync snapshot deletion
by mhearse (Chaplain) on Nov 21, 2013 at 21:15 UTC
    Thanks for your post. I agree about git. I guess I could check in new files... and revise existing ones as modifications are made. And best of all, I could search them insanely fast via git grep.
