All:
I have already come up with two solutions to my problem. So
this is more a question of which one to implement based off how
perl works internally. Read on for a verbose description:
I am processing a very transient directory in an infinite loop. Basically, I read
the first 64K of a file to look for specific information. If the information is
present, the file gets moved out of the directory. If not, it is eventually moved by
another process that I am in a race condition with. In between each cycle, I go to sleep
for a period of time so as not to chew up too much CPU. About once a minute, I go update my
list of criteria for processing as it changes over time. What I would like to do is cache
the file names I have already processed, so that I do not process the same file twice. The
file names eventually get re-used, but if I invalidate my cache when I update my list, I can be
assured that there are no issues. My idea is as follows:
Move on to the next file if it is in my cache
If not, check to see if it meets my criteria
If yes, move it off the directory and do nothing with cache
If no, add the file to my cache and move on to the next file
Once a minute, clear my cache
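A minimal sketch of the steps above, using a hash for the cache (one of the two options weighed below). The names `process_pass`, `refresh_due`, `$meets_criteria` and `$move` are hypothetical: the two callbacks stand in for the 64K read-and-match and for the actual move, neither of which is shown here.

```perl
use strict;
use warnings;

# One pass over the directory listing. %$cache holds names already rejected.
# $meets_criteria and $move are hypothetical callbacks (assumptions, not the
# real production code): the first inspects a file, the second moves it.
sub process_pass {
    my ($cache, $files, $meets_criteria, $move) = @_;
    for my $file (@$files) {
        next if exists $cache->{$file};    # already looked at, skip it
        if ($meets_criteria->($file)) {
            $move->($file);                # moved off; cache left untouched
        }
        else {
            $cache->{$file} = 1;           # remember the miss
        }
    }
    return;
}

# True when the criteria list (and therefore the cache) is due for a refresh.
sub refresh_due {
    my ($last_refresh, $now, $interval) = @_;
    return ($now - $last_refresh) >= $interval;
}
```

In the main loop this would be used roughly as: if `refresh_due(...)` is true, reload the criteria and empty the cache, then call `process_pass`, then sleep.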
I could either push the filename onto an array or create a hash key:
next if (exists $cache{$_});
or
my $file = $_;   # grep aliases $_ to each element of @cache, so copy the name first
next if (grep /\b\Q$file\E\b/ , @cache);
and later on .....
push @cache , $_;
or
$cache{$_}++;
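For comparison, here is a small sketch of the two lookups side by side, plus a short-circuiting variant. Two details matter for the array version: inside `grep`, `$_` is aliased to each element of the list, so the name being tested has to live in its own variable; and a `\b`-anchored pattern also matches a name embedded in a longer one (`foo` inside `foo.bak`), so the sketch uses an exact `eq` test instead. The sample file names and the `seen_*` helper names are made up, and `any` assumes List::Util 1.33 or later.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical sample data: the same two names in both kinds of cache.
my %cache = map { $_ => 1 } qw(alpha.dat beta.dat);
my @cache = qw(alpha.dat beta.dat);

# Hash lookup: one hashed probe, independent of how many names are cached.
sub seen_hash {
    my ($file) = @_;
    return exists $cache{$file} ? 1 : 0;
}

# Array lookup with grep: always walks the whole array, even after a hit.
sub seen_grep {
    my ($file) = @_;
    my $hits = grep { $_ eq $file } @cache;   # scalar context gives a count
    return $hits ? 1 : 0;
}

# Array lookup with any: still linear, but stops at the first match.
sub seen_any {
    my ($file) = @_;
    return (any { $_ eq $file } @cache) ? 1 : 0;
}
```

The hash version also keeps `$cache{$_}++` and `exists` symmetrical, whereas the array version can accumulate duplicate entries unless each push is guarded by a lookup.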
What are the dynamics of each approach? Internally does
%cache = ();
or
@cache = ();
and then re-creation have any impact as far as memory allocation/speed? Is there a point
at which having more files in the cache gives one approach a speed increase
over the other? Is there a rule of thumb like "if under 100 items, use A"?
Basically, I am asking how does each process work internally so that I can decide
which method to implement based off my dynamic environment. I can't really benchmark
without live data. I can siphon off live data for file variation, but I can't replay it
at the same speed as it happens in production so I never know how deep the directory will be.
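Even without live data, the lookup cost alone can be measured against synthetic names at a few cache depths. The following is only a sketch under stated assumptions: the file name pattern, the depths (10, 100, 1000) and the helper names `build_caches` and `run_benchmark` are all invented, and it times the cache lookup only, not the directory read or the 64K file read.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a hash cache and an array cache holding the same $n synthetic names.
# The naming scheme is an assumption, not the production one.
sub build_caches {
    my ($n) = @_;
    my @names = map { sprintf 'file%06d.dat', $_ } 1 .. $n;
    my %hash  = map { $_ => 1 } @names;
    return (\%hash, \@names);
}

# Compare the two lookups at each requested cache depth.
sub run_benchmark {
    for my $n (@_) {
        my ($hash, $array) = build_caches($n);
        my $probe = $array->[-1];   # last name, so grep scans the whole array
        print "--- $n cached names ---\n";
        cmpthese(-1, {              # run each candidate for about 1 CPU second
            hash => sub { my $hit = exists $hash->{$probe} },
            grep => sub { my $hit = grep { $_ eq $probe } @$array },
        });
    }
    return;
}

run_benchmark(10, 100, 1000);
```

Changing the depths passed to `run_benchmark` shows how each approach scales as the directory gets deeper, which is the part that cannot be replayed from production.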
Cheers - L~R