I don't want File::Rsync or anything else that just wraps an rsync binary. librsync is simply dead code and of no use to me, and fsync is too ugly to start from. I won't make 1:1 copies of directory trees; I need different mappings from content to file name (and back) on each side. The idea is to treat files with identical SHA1 checksums as equal, so they need not be transferred. I know that is not strictly correct (collisions are possible), but it is "good enough".
What I want shares at least some of rsync's main parts:
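To make the SHA1 idea concrete, here is a minimal sketch in Python (chosen only for brevity; nothing here is existing code). The class `ContentIndex` and the function `digests_to_transfer` are hypothetical names. Each side keeps its own mapping between names and content digests, and only digests unknown to the other side need to travel.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA1 digest used as the identity of a file's content."""
    return hashlib.sha1(data).hexdigest()


class ContentIndex:
    """Hypothetical per-side index: name -> digest and digest -> names."""

    def __init__(self):
        self.digest_by_name = {}
        self.names_by_digest = {}

    def add(self, name: str, data: bytes) -> str:
        digest = sha1_of(data)
        old = self.digest_by_name.get(name)
        if old is not None and old != digest:
            # The name now points at different content; drop the stale link.
            names = self.names_by_digest[old]
            names.discard(name)
            if not names:
                del self.names_by_digest[old]
        self.digest_by_name[name] = digest
        self.names_by_digest.setdefault(digest, set()).add(name)
        return digest


def digests_to_transfer(client: ContentIndex, server: ContentIndex) -> set:
    """Digests present on the client whose content the server lacks."""
    return set(client.names_by_digest) - set(server.names_by_digest)
```

Because the two indexes are independent, the client can call a file `a.txt` while the server stores the same content under any pool name it likes.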
1) Directory traversal (File::Find::Rule or my own code)
2) Algorithm similar to the rsync tech report
3) Transport protocol (rsync compatibility not required)
4) State storage on the server
5) Scalable server
6) Untrusted clients
Backups in the GB range with 1e5 files on each client should be doable without too much latency. Part 1 can be considered done. For part 2 I have untested code that uses a sum of digits as the rolling checksum and MD5/SHA1 as the strong checksum.
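For comparison, a sketch of the weak/strong checksum pairing as the rsync tech report describes it. This is not the untested code mentioned above: it swaps the sum-of-digits checksum for the report's two-part rolling sum (mod 2^16) and uses SHA1 as the strong checksum. Function names are hypothetical, only full-length blocks are signed, and unlike real rsync the scan does not skip ahead after a match.

```python
import hashlib

M = 1 << 16  # modulus used by the tech report's weak checksum


def weak_checksum(block: bytes):
    """Return (a, b): a is the byte sum, b the position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def combine(a, b):
    return a | (b << 16)


def block_signatures(old: bytes, block_len: int):
    """Map weak checksum -> [(strong digest, block number)] for full blocks."""
    table = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        block = old[start:start + block_len]
        entry = (hashlib.sha1(block).digest(), start // block_len)
        table.setdefault(combine(*weak_checksum(block)), []).append(entry)
    return table


def find_matches(new: bytes, table, block_len: int):
    """Return (offset in new, block number in old) for every matching window."""
    matches = []
    if len(new) < block_len:
        return matches
    a, b = weak_checksum(new[:block_len])
    offset = 0
    while True:
        # The cheap weak checksum filters; the strong one confirms.
        for strong, block_no in table.get(combine(a, b), ()):
            window = new[offset:offset + block_len]
            if hashlib.sha1(window).digest() == strong:
                matches.append((offset, block_no))
                break
        if offset + block_len >= len(new):
            break
        a, b = roll(a, b, new[offset], new[offset + block_len], block_len)
        offset += 1
    return matches
```

The receiver would send `block_signatures(...)` of its old copy; the sender runs `find_matches(...)` and transmits only the bytes that fall between matches.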
I need suggestions for the transport protocol. An abandoned implementation used RPC::PlClient, but I have no clue whether XML-RPC, SOAP or anything else is worth the additional effort. A compressed and/or encrypted channel, e.g. over SSL or SSH, would be a plus.
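One lightweight alternative to a full RPC stack, offered purely as an assumption and not a recommendation from any of the modules named above, is a length-prefixed message framing with compressed payloads. Encryption is left to the channel (an SSL socket or an SSH tunnel). The message fields, `MAX_FRAME` and the function names below are all hypothetical.

```python
import json
import struct
import zlib

HEADER = struct.Struct("!I")   # 4-byte big-endian payload length
MAX_FRAME = 16 * 1024 * 1024   # assumed cap so a hostile client cannot exhaust memory


def encode_frame(message: dict) -> bytes:
    """Serialize one message as: length header + zlib-compressed JSON."""
    payload = zlib.compress(json.dumps(message).encode("utf-8"))
    return HEADER.pack(len(payload)) + payload


def decode_frames(buffer: bytes):
    """Return (complete messages, unconsumed tail) from a receive buffer."""
    messages = []
    pos = 0
    while len(buffer) - pos >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, pos)
        if length > MAX_FRAME:
            raise ValueError("frame too large")
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # frame not fully received yet
        payload = zlib.decompress(buffer[pos + HEADER.size:end])
        messages.append(json.loads(payload.decode("utf-8")))
        pos = end
    return messages, buffer[pos:]
```

The same bytes can be written to any stream, so the framing stays independent of whether the channel is plain TCP, SSL or SSH.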
To me, state storage means stat() data, possibly ACL info, and how to recover a client file. rsync uses the file system with stat(). The abandoned implementation of my backup application used an SQL database, but that proved infeasible because of the load from first-time clients: imagine 1e5 inserts hitting the database, each with some latency. A new implementation would use one SQLite database per client that is "r"synced with the server. Bloom::Filter and some tricks on top of it will give fast and _accurate_ knowledge of which files are already on the server.
The server should be simple to set up and low on resource usage. Standalone servers are greatly preferred.
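A sketch of how the per-client SQLite file and a Bloom filter could fit together. The schema, the filter parameters and all names are assumptions, and the small filter class stands in for Bloom::Filter. The inserts go through one transaction, which avoids the per-row latency described above, and a positive filter answer is confirmed against the database, which is one possible way to make the result accurate.

```python
import hashlib
import sqlite3


class BloomFilter:
    """Tiny Bloom filter; 'no' is always right, 'yes' may be a false positive."""

    def __init__(self, bits=1 << 20, hashes=4):
        self.bits = bits            # assumed to be a multiple of 8
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key: str):
        for p in self._positions(key):
            self.array[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str):
        return all(self.array[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))


def open_state(path=":memory:"):
    """Open (or create) one client's state database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS files ("
               "name TEXT PRIMARY KEY, sha1 TEXT NOT NULL, "
               "size INTEGER, mtime INTEGER)")
    db.execute("CREATE INDEX IF NOT EXISTS files_sha1 ON files (sha1)")
    return db


def record_files(db, rows):
    """Insert many (name, sha1, size, mtime) rows in a single transaction."""
    with db:
        db.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)


def already_stored(db, bloom, sha1: str) -> bool:
    """Cheap filter check first; confirm a 'maybe' with an exact lookup."""
    if sha1 not in bloom:
        return False
    row = db.execute("SELECT 1 FROM files WHERE sha1 = ? LIMIT 1",
                     (sha1,)).fetchone()
    return row is not None
```

Most lookups for new content stop at the filter, so the database is only touched for digests that are probably present.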
Since anybody can hack a client, the server needs to protect other clients from harm and privacy leaks, e.g. by rechecking checksums before releasing files to the common pool.
Do you have helpful suggestions or pointers to (unfinished) code? Picking the right transport protocol is especially key to breaking up the superb but monolithic rsync.
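The recheck could look roughly like this. The flat pool layout and the function name `release_to_pool` are assumptions. The server recomputes the digest itself and derives the pool file name from its own result, never from what the client claimed.

```python
import hashlib
import os
import tempfile


def release_to_pool(pool_dir: str, claimed_sha1: str, data: bytes) -> str:
    """Store data in the shared pool only if it matches the claimed digest."""
    actual = hashlib.sha1(data).hexdigest()
    if actual != claimed_sha1.lower():
        raise ValueError("checksum mismatch, upload rejected")
    target = os.path.join(pool_dir, actual)
    if os.path.exists(target):
        return target  # content already pooled, nothing to write
    # Write to a temporary file first so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=pool_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_path, target)
    return target
```

A client that lies about a checksum gets an error and leaves nothing behind in the pool.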
Thank you very much.
In reply to rsync workalike by NiJo