I have a meta-question: what kind of database would you use to implement a meta-database for a directory tree, storing extra data per file such as author, title, and an MD5 or SHA digest?

I'm not in favour of an SQL database like SQLite, because it is clumsy to work with. The lack of a native, atomic update-or-insert operation in particular is holding me back. I dislike this so much that I've postponed implementing the whole thing for months. I want something simpler. Still, if you know of an extra (fast) layer that would make SQL easier to work with, I'm all ears...

Actually, I like how Perl hashes work: insert-or-update just comes with the territory.
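To make that concrete, here is a minimal sketch of a persistent hash keyed by path, using the core SDBM_File module as a stand-in for whatever DBM ends up being chosen. The file location, the sample path, and the tab-separated record layout are assumptions made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR and O_CREAT
use SDBM_File;                  # core module; other DBM modules offer the same tie interface
use File::Temp qw(tempdir);

# Hypothetical location for the index files; a temp dir keeps the sketch harmless.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/fileindex";

tie(my %index, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $base: $!";

# Insert: the key does not exist yet.
$index{'docs/report.txt'} = join "\t", 1200000000, 512, 'd41d8cd98f00b204e9800998ecf8427e';

# Update: exactly the same statement, the key already exists.
$index{'docs/report.txt'} = join "\t", 1300000000, 640, '9e107d9d372bb6826bd81d3542a419d6';

# Read the record back and split it into its (assumed) fields.
my ($mtime, $size, $md5) = split /\t/, $index{'docs/report.txt'};
my $count = keys %index;
print "entries: $count, size now: $size\n";

untie %index;
```

SDBM limits the combined size of a key and its value to roughly 1 KB, so it only suits short records; it is used here because it ships with Perl.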

The main problem is updating: I want fast incremental updates, not a complete rescan, because a full rescan could easily take 20 minutes each time.
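A rough sketch of what an incremental pass might look like, assuming the index is a hash of path => { mtime, size, md5 }: the tree is still walked, but a file is only digested again when its modification time or size differs from the stored record. The `rescan` function and the record layout are hypothetical names invented for this example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use Digest::MD5;

# Rescan $root, reusing stored records whose mtime and size are unchanged.
# Returns the new index and the number of files that had to be digested.
sub rescan {
    my ($root, $old) = @_;
    my %new;
    my $digested = 0;
    find({ no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        return unless -f $path;
        my ($size, $mtime) = (stat _)[7, 9];      # reuse the stat done by -f
        my $rec = $old->{$path};
        if ($rec && $rec->{mtime} == $mtime && $rec->{size} == $size) {
            $new{$path} = $rec;                   # unchanged: keep the stored digest
            return;
        }
        open my $fh, '<', $path or die "Cannot read $path: $!";
        binmode $fh;
        $new{$path} = {
            mtime => $mtime,
            size  => $size,
            md5   => Digest::MD5->new->addfile($fh)->hexdigest,
        };
        close $fh;
        $digested++;
    } }, $root);
    return (\%new, $digested);
}

# Demonstration on a throwaway directory with two small files.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(one.txt two.txt)) {
    open my $out, '>', "$root/$name" or die "Cannot write $name: $!";
    print {$out} "contents of $name\n";
    close $out;
}

my ($index,  $first)  = rescan($root, {});        # empty index: everything is digested
my ($index2, $second) = rescan($root, $index);    # nothing changed: nothing is digested

open my $out, '>>', "$root/two.txt" or die "Cannot append: $!";
print {$out} "more\n";                            # size changes, so this file is redone
close $out;
my ($index3, $third) = rescan($root, $index2);

print "digested per pass: $first, $second, $third\n";
```

The expensive part, reading file contents for the digest, is skipped for unchanged files; the directory walk and one `stat` per file remain.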

So, what are the alternatives for Perl? The database should be indexed by path, which is, again, a bit impractical for SQL databases. As for updating, adding new files should be fast, which means quickly skipping entries that already exist, as well as detecting changes in existing files (a change of modification time and/or size). Ideally, I should also be able to detect renames and moves per file, by noticing that a file has gone missing and another one with the same checksum has popped up. Hmm... I can do that.
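The comparison described above can be sketched with plain hashes. This is only an illustration under assumed names: `diff_index`, `signature`, and the record layout (path => { mtime, size, md5 }) are hypothetical, and the sample data is made up.

```perl
use strict;
use warnings;

# A file's identity for rename detection: checksum plus size.
sub signature {
    my ($rec) = @_;
    return join ':', $rec->{md5}, $rec->{size};
}

# Compare a stored index with a fresh scan; both are hash refs keyed by path.
sub diff_index {
    my ($old, $new) = @_;
    my (@added, @changed);
    for my $path (sort keys %$new) {
        my ($o, $n) = ($old->{$path}, $new->{$path});
        if    (!$o) { push @added, $path }
        elsif ($o->{mtime} != $n->{mtime} || $o->{size} != $n->{size}) { push @changed, $path }
    }
    my @removed = grep { !exists $new->{$_} } sort keys %$old;

    # Index the newly appeared files by signature.
    my %candidates;
    push @{ $candidates{ signature($new->{$_}) } }, $_ for @added;

    # A missing file whose signature reappears elsewhere counts as a rename or move.
    my (%renamed, @gone);
    for my $path (@removed) {
        my $match = $candidates{ signature($old->{$path}) };
        if ($match && @$match) { $renamed{$path} = shift @$match }
        else                   { push @gone, $path }
    }
    my %is_target = map { $_ => 1 } values %renamed;
    @added = grep { !$is_target{$_} } @added;

    return { added => \@added, changed => \@changed, removed => \@gone, renamed => \%renamed };
}

my %old = (
    'a.txt' => { mtime => 100, size => 10, md5 => 'aaa' },
    'b.txt' => { mtime => 100, size => 20, md5 => 'bbb' },
    'c.txt' => { mtime => 100, size => 30, md5 => 'ccc' },
);
my %new = (
    'a.txt'     => { mtime => 100, size => 10, md5 => 'aaa' },   # untouched
    'b.txt'     => { mtime => 150, size => 25, md5 => 'b2b' },   # modified
    'sub/c.txt' => { mtime => 100, size => 30, md5 => 'ccc' },   # moved
    'd.txt'     => { mtime => 160, size => 5,  md5 => 'ddd' },   # new
);

my $diff = diff_index(\%old, \%new);
print "added: @{ $diff->{added} }\n";
print "changed: @{ $diff->{changed} }\n";
print "renamed: $_ -> $diff->{renamed}{$_}\n" for sort keys %{ $diff->{renamed} };
```

Matching on checksum and size only yields candidates: two identical copies are indistinguishable, so the pairing of a missing path with a new one is a guess when duplicates exist.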

So... What would you do? BerkeleyDB? Anything else?

Update: Actually, an SQL database has another selling point: it's easy to search for (candidate) duplicates with a single query on MD5 checksum and file size. That's a feature I'd also like to see in any other approach.
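Without SQL, the same duplicate search amounts to grouping paths by checksum and size. A minimal sketch, assuming the records are already in memory as path => [ size, md5 ] (a made-up layout with made-up data):

```perl
use strict;
use warnings;

# Made-up records: path => [ size, md5 ].
my %index = (
    'music/a.mp3'  => [ 4096, 'aaa' ],
    'backup/a.mp3' => [ 4096, 'aaa' ],
    'notes.txt'    => [ 120,  'bbb' ],
);

# Group paths by the combination of checksum and size.
my %by_sig;
for my $path (sort keys %index) {
    my ($size, $md5) = @{ $index{$path} };
    push @{ $by_sig{ join ':', $md5, $size } }, $path;
}

# Every group with more than one path is a set of candidate duplicates.
my @dupes = grep { @$_ > 1 } map { $by_sig{$_} } sort keys %by_sig;
print join(' == ', @$_), "\n" for @dupes;
```

With a DBM keyed by path this means one full pass over all records, unless a second index keyed by checksum and size is maintained alongside it.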


In reply to What kind of database should I use? by bart
