Keywords:
SHA1, databases, Higher-Order Perl, hashing, efficiency
Hello,
I have thought about this a couple of times, and I can now formulate what
was at first just a vague feeling.
We are given the following: a database with a table of 3 columns => (name_of_file, sha1_of_file, last_modified_time)
This database is built once from all the files on the disk, and that takes some time.
Periodically the database needs to be updated, as some of the files might have been modified.
This is also time consuming. It takes less time than rebuilding the
whole database from scratch, but it is time consuming nonetheless.
So for each file it is necessary to stat the file, compare its last_modified_time with the one in the
database and, if they differ, update its checksum in the database (because we know it has been
modified).
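The update step described above can be sketched roughly as follows. This is only an illustration, in Python with the standard sqlite3 and hashlib modules. The table name `files`, the column types, and the function names `open_db`, `sha1_of_file` and `update_db` are assumptions made for the example, not part of any existing implementation.

```python
import hashlib
import os
import sqlite3


def sha1_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path=":memory:"):
    """Create the three-column table (table name and types are assumed)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name_of_file TEXT PRIMARY KEY, "
        "sha1_of_file TEXT, "
        "last_modified_time REAL)"
    )
    return conn


def update_db(conn, paths):
    """Re-hash only the files whose mtime differs from the stored one."""
    rehashed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        row = conn.execute(
            "SELECT last_modified_time FROM files WHERE name_of_file = ?",
            (path,),
        ).fetchone()
        if row is not None and row[0] == mtime:
            continue  # timestamp unchanged, keep the stored checksum
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
            (path, sha1_of_file(path), mtime),
        )
        rehashed.append(path)
    conn.commit()
    return rehashed
```

Note that every file costs one stat call and one database lookup here, whether or not it ends up being re-hashed.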
However, I felt that for small files a "strange" phenomenon might take place.
As they are small (we have not yet defined "small" - we denote this unknown by
(1)), the time it takes to hash them
will also be small (we also have no clear definition, function, or anything else that
describes how the characteristics of a file influence the hashing speed - we denote this unknown by
(2)).
So the question is:
Is the time needed to hash a small file smaller than the time needed to fetch its record from the database?
If the answer is yes, then for small enough files we can decide
not to look them up in the database at all and simply hash them directly, without any further checks
(this is somewhat similar to the efficiency of the hashes described in Higher-Order Perl).
This would be a nice optimisation. I cannot yet measure whether the gain would be big or small, but
it could add up, considering the disk has a lot of files (on the order of 10^6).
We need to know under what conditions this happens, but we are unable to determine that
because of the unknowns
(1) and (2).
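One way to approach (1) and (2) empirically is to measure both costs for a range of file sizes and look for the crossover. The following is a minimal sketch, again in Python for illustration only. The sizes, the repeat count, the in-memory SQLite table and the names `time_per_call`, `hash_file` and `compare` are all assumptions of the example.

```python
import hashlib
import os
import sqlite3
import tempfile
import time


def time_per_call(func, repeats=200):
    """Average wall-clock seconds per call of func()."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def hash_file(path):
    """Read the whole file and return its hex SHA-1 digest."""
    with open(path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()


def compare(sizes=(64, 1024, 16384, 262144, 1048576), repeats=200):
    """Return a list of (size, hash_seconds, lookup_seconds) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "name_of_file TEXT PRIMARY KEY, "
        "sha1_of_file TEXT, "
        "last_modified_time REAL)"
    )
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for size in sizes:
            path = os.path.join(tmp, "f%d" % size)
            with open(path, "wb") as handle:
                handle.write(os.urandom(size))
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?)",
                (path, hash_file(path), os.stat(path).st_mtime),
            )

            def lookup(p=path):
                # Fetch the stored record for one file by its key.
                return conn.execute(
                    "SELECT sha1_of_file, last_modified_time "
                    "FROM files WHERE name_of_file = ?",
                    (p,),
                ).fetchone()

            t_hash = time_per_call(lambda p=path: hash_file(p), repeats)
            t_lookup = time_per_call(lookup, repeats)
            results.append((size, t_hash, t_lookup))
    conn.close()
    return results


if __name__ == "__main__":
    for size, t_hash, t_lookup in compare():
        print("%8d bytes  hash %8.1f us  lookup %8.1f us"
              % (size, t_hash * 1e6, t_lookup * 1e6))
```

The numbers from such a run are only indicative: the test files sit in the OS cache after the first read, and an in-memory table with a handful of rows is far cheaper to query than a real database with 10^6 rows. A meaningful threshold for "small" would have to be measured on the actual disk and database.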
Any ideas or suggestions are very much appreciated.
Thank you
Note: This is related to a previous node.