I ended up removing the dates by generating a regex from the strftime format to match them, using this:

use Carp;    # provides croak

sub timestamp2regex {
    my $exp = shift;

    # Composite tokens: expanded to simpler formats and converted recursively.
    my %metareplacements = (
        'D' => '%m/%d/%y',
        'F' => '%Y-%m-%d',
        'r' => '%I:%M:%S %p',
        'R' => '%H:%M',
        'T' => '%H:%M:%S',
    );

    # Simple tokens: each maps directly to a regex fragment.
    my %replacements = (
        'a' => '[[:alpha:]]+',   'A' => '[[:alpha:]]+',
        'b' => '[[:alpha:]]+',   'B' => '[[:alpha:]]+',
        'd' => '\d{2}',          'e' => '[\d\s]\d',
        'g' => '\d{2}',          'G' => '\d{4}',
        'h' => '[[:alpha:]]+',   'H' => '\d{2}',
        'I' => '\d{2}',          'j' => '\d{3}',
        'k' => '[\d\s]\d',       'l' => '[\d\s]\d',
        'm' => '\d{2}',          'M' => '\d{2}',
        'p' => '[A-Za-z.]{2,}',  'P' => '[A-Za-z.]{2,}',
        's' => '\d+',            'S' => '\d{2}',
        't' => '\t',             'u' => '\d',
        'U' => '\d{2}',          'V' => '\d{2}',
        'w' => '\d',             'W' => '\d{2}',
        'y' => '\d{2}',          'Y' => '\d{4}',
        'z' => '[+-]\d{4}',      'Z' => '[[:alpha:]]*',
        '%' => '\%',
    );

    # Escape everything first, then replace each (now escaped) %-token.
    $exp = quotemeta($exp);
    $exp =~ s/\\\%\\?(.)/
        if (defined $metareplacements{$1}) {
            timestamp2regex($metareplacements{$1});
        } elsif (defined $replacements{$1}) {
            $replacements{$1};
        } else {
            croak "Unsupported or unrecognized timestamp format token: \%$1.";
        }/eg;

    return $exp;
}

(This turned out to be much easier to write than I expected, once I gave up on locales.)
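
For illustration, here is a minimal sketch of how the generated regex might be used to strip a leading timestamp from a log line. The format string and the log entry are made-up examples, and the sub is a condensed copy of the one above that carries only the tokens this example needs, so treat it as a sketch rather than the full routine.

```perl
use strict;
use warnings;
use Carp;

# Condensed copy of timestamp2regex: only the tokens used below.
sub timestamp2regex {
    my $exp = shift;
    my %replacements = (
        'Y' => '\d{4}', 'm' => '\d{2}', 'd' => '\d{2}',
        'H' => '\d{2}', 'M' => '\d{2}', 'S' => '\d{2}',
        '%' => '\%',
    );
    $exp = quotemeta($exp);
    $exp =~ s/\\\%\\?(.)/
        defined $replacements{$1}
            ? $replacements{$1}
            : croak "Unsupported or unrecognized timestamp format token: \%$1."
    /eg;
    return $exp;
}

# Hypothetical log format and entry (assumptions, not from the original post).
my $format = '%Y-%m-%d %H:%M:%S';
my $line   = '2011-03-14 15:09:26 server started';

my $re = timestamp2regex($format);

# Remove the timestamp and any whitespace that follows it.
(my $stripped = $line) =~ s/^$re\s*//;

print "$stripped\n";    # prints "server started"
```

Anchoring with ^ assumes the timestamp sits at the start of each entry; if it can appear elsewhere, drop the anchor.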

This isn't ideal, because it doesn't accept anything locale-dependent (it will croak on %c, %E, %O, %x, and %X), but I don't think those are actually going to be used. After writing this, I discovered Regexp::Common::time, which appears to be exactly what I was after (and does roughly what this code does), but it's much longer than my code, and I'm not sure whether it handles certain things (like non-English AM/PM) as well as my code does. If I run into any locale issues with mine, though, I'll probably switch to that.
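
As a rough check on the AM/PM point, the sketch below tries the %p pattern from the table against a few sample markers. The sample strings are assumptions chosen for illustration. Since the character class is limited to ASCII letters and dots, it should accept Latin-script abbreviations but not markers written in other scripts.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# The %p / %P fragment from the replacement table, anchored for a whole-string test.
my $ampm = qr/^[A-Za-z.]{2,}$/;

# Hypothetical sample markers; the last one is non-Latin.
for my $marker ('PM', 'p.m.', 'nachm.', '午後') {
    printf "%s: %s\n", $marker, ($marker =~ $ampm ? 'matches' : 'no match');
}
```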


In reply to Re: Parsing arbitrarily-formatted timestamps out of log file entries by mr_flea
in thread Parsing arbitrarily-formatted timestamps out of log file entries by mr_flea
