For my backup project I'm looking for code and suggestions for an rsync workalike that is properly modular and pure Perl.

I don't want File::Rsync or anything else that just wraps an rsync binary. librsync is simply dead code and of no use to me. fsync's code is too ugly to start from. I won't do 1:1 copies of directory trees; I need different mappings from content to file name (and back) on both sides. The idea is to treat files with identical SHA1 checksums as equal, so they need not be transferred. I know that's wrong in theory, but it's "good enough".
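To make the content-to-file-name mapping concrete, here is a minimal sketch using only core modules. The sub names `content_id` and `pool_path`, and the two-level fan-out layout, are my own assumptions for illustration, not an existing module's API.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hex SHA-1 of a file's content, read in binary mode.
sub content_id {
    my ($path) = @_;
    my $sha = Digest::SHA->new(1);    # 1 selects SHA-1
    $sha->addfile($path, 'b');        # croaks if the file cannot be read
    return $sha->hexdigest;
}

# Hypothetical pool layout: two fan-out levels keep directories small.
sub pool_path {
    my ($pool_root, $id) = @_;
    return join '/', $pool_root, substr($id, 0, 2), substr($id, 2, 2), $id;
}
```

The reverse mapping (content id back to the client's file names) would live in the per-client state, not in the pool itself.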

What I want has at least some of rsync's main parts:
1) Directory traversal (File::Find::Rule or my own code)
2) Algorithm similar to the rsync tech report
3) Transport protocol (rsync compatibility not required)
4) State storage on the server
5) Scalable server
6) Untrusted clients

Backups in the GB range with 1e5 files on each client should be doable without too much latency. Part 1 can be considered done. I have untested code for 2 that uses sum of digits as rolling checksum and md5/sha1 for the strong checksum.
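For comparison with my untested code, this is a rough sketch of the weak rolling checksum as I read it in the rsync tech report: two 16-bit sums over the bytes of a block, updated in constant time when the window slides by one byte. The sub names `weak_sum` and `roll` are made up for this example, and the strong checksum (md5/sha1) is left out.

```perl
use strict;
use warnings;

use constant M => 1 << 16;    # both sums are kept modulo 2**16

# Weak checksum of a whole block: returns the two 16-bit halves.
sub weak_sum {
    my ($block) = @_;
    my ($s1, $s2) = (0, 0);
    my $weight = length $block;          # first byte gets the largest weight
    for my $byte (unpack 'C*', $block) {
        $s1 = ($s1 + $byte) % M;
        $s2 = ($s2 + $weight * $byte) % M;
        $weight--;
    }
    return ($s1, $s2);
}

# Slide a window of $len bytes forward by one: $out leaves, $in enters.
sub roll {
    my ($s1, $s2, $len, $out, $in) = @_;
    $s1 = ($s1 - $out + $in) % M;        # Perl's % stays non-negative here
    $s2 = ($s2 - $len * $out + $s1) % M;
    return ($s1, $s2);
}
```

The two halves can be combined as `$s1 | ($s2 << 16)` to get a single 32-bit key for the block lookup table.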

I need suggestions for the transport protocol. An abandoned implementation used RPC::PlClient, but I have no clue whether XML-RPC, SOAP or anything else is worth the additional effort. A compressed || encrypted channel, e.g. over SSL or SSH, would be a plus.
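One option that avoids any RPC framework is a simple length-prefixed binary framing that works over any byte stream, whether that is an SSH pipe or an SSL socket. The sketch below is only an assumption of what such a protocol could look like; the frame layout (1-byte type tag plus 32-bit length), the `MAX_FRAME` limit and the sub names are invented for illustration.

```perl
use strict;
use warnings;

use constant MAX_FRAME => 16 * 1024 * 1024;   # assumed cap against hostile peers

# One frame: 1-byte type tag, 32-bit big-endian length, then the payload.
sub encode_frame {
    my ($type, $payload) = @_;
    return pack 'C N/a*', $type, $payload;
}

# Decode one frame from the front of $$bufref and remove it from the buffer.
# Returns an empty list while the frame is still incomplete.
sub decode_frame {
    my ($bufref) = @_;
    return if length($$bufref) < 5;
    my ($type, $len) = unpack 'C N', $$bufref;
    die "frame too large\n" if $len > MAX_FRAME;
    return if length($$bufref) < 5 + $len;
    my $payload = substr $$bufref, 5, $len;
    substr($$bufref, 0, 5 + $len, '');          # consume the decoded frame
    return ($type, $payload);
}
```

The reader appends whatever it gets from `sysread` to the buffer and calls `decode_frame` in a loop until it returns nothing, so the same code works for blocking and non-blocking servers.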

To me, state storage means stat() data, possibly ACL info, and how to recover a client file. rsync uses the file system with stat(). The abandoned implementation of my backup application used an SQL database, but it was infeasible due to the load from first-time clients. Imagine 1e5 inserts hitting the database with some latency. A new implementation would use one SQLite database per client that is "r"synced with the server. Bloom::Filter and some tricks on top of it will give fast and _accurate_ knowledge of files already on the server.
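To show the kind of lookup I mean, here is a hand-rolled Bloom filter sketch in pure Perl. It is not the Bloom::Filter API; the sub names and the choice to derive the bit positions from the 32-bit words of a SHA-1 digest are assumptions for this example. A plain Bloom filter can report false positives but never false negatives, so a "maybe" answer still has to be confirmed with the server.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);

# $bits: filter size in bits; $hashes: positions per key (at most 5,
# because a SHA-1 digest only holds five 32-bit words).
sub bloom_new {
    my ($bits, $hashes) = @_;
    die "at most 5 hashes supported\n" if $hashes > 5;
    my $filter = '';
    vec($filter, $bits - 1, 1) = 0;     # pre-extend the bit string
    return { bits => $bits, hashes => $hashes, filter => $filter };
}

# Bit positions for a key, taken from the words of its SHA-1 digest.
sub _positions {
    my ($bf, $key) = @_;
    my @words = unpack 'N5', sha1($key);
    return map { $_ % $bf->{bits} } @words[0 .. $bf->{hashes} - 1];
}

sub bloom_add {
    my ($bf, $key) = @_;
    vec($bf->{filter}, $_, 1) = 1 for _positions($bf, $key);
    return;
}

# 0 means "definitely not on the server", 1 means "maybe, go and check".
sub bloom_maybe_has {
    my ($bf, $key) = @_;
    for my $pos (_positions($bf, $key)) {
        return 0 unless vec($bf->{filter}, $pos, 1);
    }
    return 1;
}
```

The filter string itself is small enough to ship to the client, so most "is this file already there?" questions never cost a round trip.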

The server should be simple to set up and low on resource usage. Standalone servers are greatly preferred.

Since anybody can hack a client, the server needs to protect other clients from harm and privacy leaks, e.g. by rechecking checksums before releasing files to the common pool.
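The recheck could look roughly like the sketch below: the server recomputes the SHA1 of an uploaded file itself and only moves it into the pool when it matches what the client claimed. The sub name `release_to_pool` and the flat pool directory are hypothetical; only core modules are used.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Copy qw(move);
use File::Path qw(make_path);

# Returns 1 if the upload was accepted (or its content was already pooled),
# 0 if the client's claimed checksum is malformed or does not match.
sub release_to_pool {
    my ($upload, $claimed, $pool_root) = @_;
    return 0 unless $claimed =~ /\A[0-9a-f]{40}\z/;   # never trust client input

    my $sha = Digest::SHA->new(1);
    $sha->addfile($upload, 'b');
    my $actual = $sha->hexdigest;
    return 0 unless $actual eq $claimed;

    my $target = "$pool_root/$actual";
    return 1 if -e $target;                           # content already pooled
    make_path($pool_root);
    move($upload, $target) or die "move failed: $!\n";
    return 1;
}
```

Rejected uploads stay where they are, so the caller can log and delete them.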

Do you have helpful suggestions or pointers to (unfinished) code? Picking the right transport protocol is especially key to breaking up the superb but monolithic rsync.

Thank you very much.


In reply to rsync workalike by NiJo
