I'm not in favour of an SQL database like SQLite, because of the clumsiness of working with it. Especially the lack of a native, atomic update-or-insert operation is holding me back. I dislike this so much that it's the reason I've postponed actually implementing this for months. I want something simpler. Well, if you know of an extra (fast) layer that would make it easier to work with, then I'm still all ears...
Actually, I like how Perl hashes work: insert-or-update just comes with the territory.
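To illustrate what I mean (a minimal sketch, with made-up paths and field names): in a plain Perl hash, assignment simply is insert-or-update, and tying the same hash to a DBM file would make it persistent without changing the code.

```perl
use strict;
use warnings;

# A plain hash: assignment is insert-or-update, no special casing needed.
# tie()ing %index to e.g. DB_File would persist it with the same syntax.
my %index;
$index{'/tmp/a.txt'} = { size => 10, mtime => 1000 };   # insert
$index{'/tmp/a.txt'} = { size => 12, mtime => 2000 };   # update, same syntax
print $index{'/tmp/a.txt'}{size}, "\n";   # 12
```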
The main problem is updating: I want fast, incremental updates, not a complete rescan every time, because a full rescan could easily take 20 minutes.
So, what are the alternatives, for Perl? It should be indexed by path, which is, again, a bit impractical for SQL databases. As for updating, adding new files to the database should be fast, which means quickly eliminating entries that already exist, as well as detecting changes in existing files (a change of modification time and/or size). Ideally, I should be able to detect renames and moves per file, by noticing that a file has gone missing and another one with the same checksum has popped up. Hmm... I can do that.
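The rename-detection idea above can be sketched like this (a hypothetical helper with made-up paths and checksums, not a real implementation): a file is up to date when size and mtime match stat(), and a vanished path plus a new path with the same checksum is treated as one moved file.

```perl
use strict;
use warnings;

# An entry is current when size and mtime both match the stat() result;
# only changed or new files would get re-checksummed.
sub is_current {
    my ($entry, $size, $mtime) = @_;
    return $entry && $entry->{size} == $size && $entry->{mtime} == $mtime;
}

# Rename/move detection: pair each missing path with a new path that
# carries the same checksum.
sub detect_renames {
    my ($missing, $new) = @_;         # both hashrefs: path => md5
    my %by_md5 = reverse %$missing;   # md5 => old path
    my %moves;
    for my $path (keys %$new) {
        my $old = $by_md5{ $new->{$path} };
        $moves{$old} = $path if defined $old;
    }
    return \%moves;                   # old path => new path
}

my $moves = detect_renames(
    { '/data/old.txt' => 'abc' },
    { '/data/new.txt' => 'abc' },
);
print "$moves->{'/data/old.txt'}\n";   # /data/new.txt
```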
So... What would you do? BerkeleyDB? Anything else?
Update: actually, an SQL database has another selling point: it's easy to search for (candidate) duplicates with a single query on MD5 checksum and file size. That's a feature I'd also like to have in any approach.
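For what it's worth, the hash-based approach can do that query too (a sketch with made-up paths and checksums): group paths by an "md5|size" key, and any bucket holding more than one path contains candidate duplicates.

```perl
use strict;
use warnings;

# Hypothetical index entries: path => { md5, size }.
my %index = (
    '/a/1.bin' => { md5 => 'deadbeef', size => 100 },
    '/b/2.bin' => { md5 => 'deadbeef', size => 100 },
    '/c/3.bin' => { md5 => 'cafebabe', size => 100 },
);

# Group paths by "md5|size"; buckets with more than one path hold
# candidate duplicates -- the hash equivalent of
#   SELECT md5, size FROM files GROUP BY md5, size HAVING COUNT(*) > 1
my %bucket;
push @{ $bucket{"$index{$_}{md5}|$index{$_}{size}"} }, $_ for sort keys %index;
my @dups = grep { @{ $bucket{$_} } > 1 } keys %bucket;
print "@{ $bucket{$_} }\n" for @dups;   # /a/1.bin /b/2.bin
```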
In reply to What kind of database should I use? by bart