Monks,

My program needs to load a set of files from a directory.
The filename format is aaa_bbb_ttt.ddd.eee, where ttt is a timestamp of file creation in epoch seconds.
The program will receive two input parameters, start_datetime and end_datetime, which I'll convert to epoch seconds to match against ttt above.
Ideally, I'd like a way of efficiently extracting the subset I need.
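Both sides of the match need to be numbers: ttt pulled out of each filename, and the two input datetimes converted to epoch seconds. Here is a minimal sketch of each. It assumes the inputs arrive as `YYYY-MM-DD HH:MM:SS` strings in local time and that aaa and bbb never contain an underscore. `timestamp_of` and `to_epoch` are hypothetical helper names. `timelocal` comes from the core Time::Local module.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Extract ttt from a name of the form aaa_bbb_ttt.ddd.eee.
# Assumes aaa and bbb contain no '_' and ttt is all digits.
# Returns undef for names that don't match (dot files, stray entries, ...).
sub timestamp_of {
    my ($name) = @_;
    return $name =~ /^[^_]+_[^_]+_(\d+)\./ ? $1 : undef;
}

# Convert a "YYYY-MM-DD HH:MM:SS" string (an assumed input format),
# interpreted as local time, to epoch seconds.
sub to_epoch {
    my ($str) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $str =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "Unrecognised datetime: '$str'\n";
    # timelocal() takes a 0-based month; a 4-digit year is used as-is.
    return timelocal($s, $mi, $h, $d, $mo - 1, $y);
}
```

If the datetimes are actually UTC, `timegm` from the same module takes the same arguments and can be swapped in.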

Note that there are 2 constraints:
1. Some timestamps may not be represented (i.e. no files have that value).
2. Many files will likely share the same timestamp.
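These two constraints rule out a plain exact lookup. Because of (1), there may be no entry for start_datetime or end_datetime themselves. Because of (2), one timestamp can map to many files. One way to handle both is a hash of arrays keyed on ttt, with each key tested against the range rather than looked up directly. This is a sketch with hypothetical helper names, assuming the aaa_bbb_ttt.ddd.eee format with an all-digit ttt.

```perl
use strict;
use warnings;

# Group file names by their ttt field. Several files may share a
# timestamp (constraint 2), so each hash value is an array ref.
sub group_by_timestamp {
    my @names = @_;
    my %by_ts;
    for my $name (@names) {
        # Assumed name format: aaa_bbb_ttt.ddd.eee with an all-digit ttt.
        next unless $name =~ /^[^_]+_[^_]+_(\d+)\./;
        push @{ $by_ts{$1} }, $name;
    }
    return \%by_ts;
}

# Return every file whose timestamp lies in [$start, $end]. Missing
# timestamps (constraint 1) are harmless: each key is compared with the
# range instead of looking up $start and $end exactly.
sub files_in_range {
    my ($by_ts, $start, $end) = @_;
    return map  { @{ $by_ts->{$_} } }
           sort { $a <=> $b }
           grep { $_ >= $start && $_ <= $end }
           keys %$by_ts;
}
```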

I'm going to take a snapshot list of the files when I start, since files will still be arriving in the directory; however, end_datetime will be a fixed value earlier than 'now'.
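Reading the directory once into an array gives that snapshot. Ignoring later arrivals loses nothing relevant, provided ttt really is the creation time: anything written after the snapshot will carry a ttt later than the fixed end_datetime. This is a minimal sketch, and `snapshot_dir` is a hypothetical name.

```perl
use strict;
use warnings;

# Take a one-off snapshot of the entry names in $dir. Files created
# after this call are not seen, which is fine here: their ttt will be
# later than the fixed end_datetime (assuming ttt is the creation time).
sub snapshot_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!\n";
    my @names = grep { !/^\./ } readdir($dh);   # skip ., .. and dot files
    closedir($dh);
    return @names;
}
```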
I'm sure it's possible in theory, via some combination of map/split/grep/sort/hash etc., to extract the middle part of the list (i.e. just the files I need), but I'm not sure the overall processing time would be any quicker than simply working through my snapshot list sequentially.
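The "middle part" approach can be sketched as follows. Sort the names by ttt once, then binary-search for the first entry at or after start_datetime and the first entry after end_datetime, and take the slice between them. The bounds are found by comparison rather than exact match, so missing and duplicated timestamps need no special handling.

On cost, the sort is O(n log n) while a single sequential pass is O(n). For one range query over 1k to 10k names, this approach is unlikely to beat the straight scan. It pays off mainly when the same sorted list is queried repeatedly. The helper names are hypothetical, and the code assumes the aaa_bbb_ttt.ddd.eee format with an all-digit ttt.

```perl
use strict;
use warnings;

# Given a ref to a sorted array of timestamps, return the index of the
# first element >= $target (or scalar @$ts if there is none).
# Classic lower-bound binary search.
sub lower_bound {
    my ($ts, $target) = @_;
    my ($lo, $hi) = (0, scalar @$ts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ts->[$mid] < $target) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# Slice out the "middle part": all names with $start <= ttt <= $end.
sub middle_part {
    my ($names, $start, $end) = @_;
    # Pair each name with its ttt; names not in the assumed format are dropped.
    my @pairs = sort { $a->[0] <=> $b->[0] }
                map  { /^[^_]+_[^_]+_(\d+)\./ ? [ $1, $_ ] : () } @$names;
    my @ts    = map { $_->[0] } @pairs;
    my $first = lower_bound(\@ts, $start);
    my $after = lower_bound(\@ts, $end + 1);   # epoch secs are integers
    return map { $_->[1] } @pairs[ $first .. $after - 1 ];
}
```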
Any file with a datetime in the desired range will be read and its contents inserted into a DB (Ingres).
The number of files in the directory will be roughly 1k to 10k.
I was thinking of amending something like this:

    @sorted = sort { $a <=> $b }               # numeric sort
              map  { $_->[2] }                 # grab 3rd field (timestamp) of array (ref)
              map  { [ split /[_.]/, $_ ] }    # split fnames on '_' and '.', rtn array ref
              grep { !/^\./ }                  # filter out dot files
              readdir(EVT_DIR);                # read all entries
except that I don't need the sort; instead, I'd need to replace that line with code that keeps only the timestamp values in the desired range.
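Here is a sketch of that amendment. The sort line becomes a grep on the range, and each name travels alongside its ttt so that the file name, not just the timestamp, survives to the end. The `map { $_->[2] }` step in the snippet above throws the names away. The result is a single sequential pass over the list. `select_in_range` is a hypothetical name, and the filename format is assumed to be exactly as described in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: the pipeline above with the sort replaced by a
# range test. Each name is paired with its ttt so that the file name,
# not just the timestamp, comes out of the pipeline.
sub select_in_range {
    my ($start, $end, @entries) = @_;
    return map  { $_->[1] }                                  # keep the file name
           grep { $_->[0] >= $start && $_->[0] <= $end }     # range test replaces sort
           map  { /^[^_]+_[^_]+_(\d+)\./ ? [ $1, $_ ] : () } # [ttt, name]; drops other names
           grep { !/^\./ }                                   # filter out dot files
           @entries;
}
```

With the original bareword handle, this would be called as `my @wanted = select_in_range($start, $end, readdir(EVT_DIR));` after a successful `opendir`.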

Cheers
Chris
PS: I also need to ignore any subdirectories in the target directory.
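There is one catch when filtering out directories. readdir returns bare names, so a file test such as `-d $_` is evaluated relative to the current working directory, not the target directory. Prepending the directory avoids that. This is a minimal sketch with a hypothetical helper name. Using `-f` instead of `! -d` would keep only plain files (following symlinks).

```perl
use strict;
use warnings;

# Drop subdirectories from a list of names read from $dir.
# readdir() returns bare names, so the directory is prepended for -d.
sub without_dirs {
    my ($dir, @names) = @_;
    return grep { ! -d "$dir/$_" } @names;
}
```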


In reply to Extract the middle part of a list by chrism01
