Thanks for all the advice. I have gone the DB route - although the Storable option was, I guess, the 'other way' that I was hazily thinking about (I do believe I used it long ago), once I had a bit of a nudge in the DB direction the simplicity and ease of future tweaks won me over. With this type of thing, which you're often modifying while you use it, the DB makes it easy to change on a whim. I don't quite know yet what challenges might arise as I work with the data, or what I might realise later ('ah, forgot I might want to do that'), so if I can avoid calculating all those MD5s (or SHA-256s - I will probably change to that) again and just update the data in the easiest way, it's worth paying the price of slightly reduced performance, if there is one. And I didn't know (or had possibly forgotten, because I've used DBD a lot in the past, but maybe only with external DBs) how simple the setup of DBD::SQLite was, even on an overstressed laptop.
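
For anyone following along, here is a minimal sketch of the "don't recalculate the digests" idea, not my actual code. The table name `files`, its columns, and the sub names `open_db` and `digest_for` are all made up for illustration, and using size plus mtime as the "unchanged" test is an assumption that may not suit every file system.

```perl
use strict;
use warnings;
use DBI;
use Digest::SHA;

# Hypothetical schema: one row per file, keyed by path.
sub open_db {
    my ($db_file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", "", "",
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS files (
    path   TEXT PRIMARY KEY,
    size   INTEGER NOT NULL,
    mtime  INTEGER NOT NULL,
    digest TEXT NOT NULL
)
SQL
    return $dbh;
}

# Return the SHA-256 hex digest for $path. The file is only read and
# hashed when no row with the same path, size and mtime exists
# (assumption: an unchanged size and mtime means unchanged content).
sub digest_for {
    my ($dbh, $path) = @_;
    my ($size, $mtime) = (stat $path)[7, 9];
    die "cannot stat $path: $!" unless defined $size;

    my ($cached) = $dbh->selectrow_array(
        'SELECT digest FROM files WHERE path = ? AND size = ? AND mtime = ?',
        undef, $path, $size, $mtime);
    return $cached if defined $cached;

    # 'b' reads the file in binary mode.
    my $digest = Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
    $dbh->do(
        'INSERT OR REPLACE INTO files (path, size, mtime, digest) VALUES (?, ?, ?, ?)',
        undef, $path, $size, $mtime, $digest);
    return $digest;
}
```

Calling `digest_for($dbh, $path)` from a File::Find callback would fill the table on the first run and mostly hit the cache on later runs. Adding a column later is a single ALTER TABLE, which is the kind of on-a-whim change I had in mind.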

I just took a 12-hour plane trip and coded most of the project during it. Two birds with one stone: get a job done and make a plane journey go a bit faster - I find getting into a bit of code makes time fly. I didn't go the SHA-256 route yet because I didn't have the module installed, but I will install it for the return trip and hope I have enough work left in the job to keep me occupied for the ride back. Collisions are a concern: even if I can handle them with a last-resort diff, they will slow things down because of the low bandwidth between servers.

Because access between servers is not consistent, I am going to run the code locally on each server, with no need for a network, and then transfer the DB files to the processing machine once a run has finished or updated. Some of the machines are quite slow Atom types, so it's best to nice the process and let them do it in their own time; there is no need for up-to-the-minute results. Then, if there's work to do such as deleting, making local links (for local duplicates), or whatever else I don't know about yet, I'll either do that live from the processing script on the central machine or automatically generate local processing scripts. Really though, at the moment I am thinking of just consolidating all this data into a single file system that can be kept organised automatically from this point on - ideally through ZFS, although I don't know whether it will play nicely with the reliability and speed of the links, as I have only used it on single machines so far.
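
The central step could look roughly like the sketch below, which I have not run against the real data. It assumes each server ships a SQLite file containing a `files` table with `path`, `size`, `mtime` and `digest` columns; the `all_files` table, the host names and the sub names are hypothetical.

```perl
use strict;
use warnings;
use DBI;

# Merge per-server SQLite files into one central table.
# %db_for_host maps a (hypothetical) host name to its transferred DB file.
sub merge_server_dbs {
    my ($central_file, %db_for_host) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$central_file", "", "",
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS all_files (
    host   TEXT NOT NULL,
    path   TEXT NOT NULL,
    size   INTEGER NOT NULL,
    digest TEXT NOT NULL,
    PRIMARY KEY (host, path)
)
SQL
    for my $host (sort keys %db_for_host) {
        # ATTACH must run outside a transaction, hence AutoCommit above.
        $dbh->do('ATTACH DATABASE ? AS src', undef, $db_for_host{$host});
        $dbh->do(<<'SQL', undef, $host);
INSERT OR REPLACE INTO all_files (host, path, size, digest)
SELECT ?, path, size, digest FROM src.files
SQL
        $dbh->do('DETACH DATABASE src');
    }
    return $dbh;
}

# Rows [digest, host, path] for every file whose digest and size
# appear more than once, on any host.
sub duplicate_groups {
    my ($dbh) = @_;
    return $dbh->selectall_arrayref(<<'SQL');
SELECT a.digest, a.host, a.path
FROM all_files a
JOIN (SELECT digest, size FROM all_files
      GROUP BY digest, size HAVING COUNT(*) > 1) d
  ON d.digest = a.digest AND d.size = a.size
ORDER BY a.digest, a.host, a.path
SQL
}
```

A matching digest and size only marks candidates; anything that is going to be deleted would still get the last-resort diff first.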

Anyways just wanted to say thanks for the help and ideas.

Best, Pete

In reply to Re: Alternatives to DB for comparable lists by peterrowse
in thread Alternatives to DB for comparable lists by peterrowse
