The appropriate solution to this problem depends on how precise the pattern matching needs to be. How much post-extraction processing you are willing to do matters as well, e.g. do you need '58bn5904' or are you content with 'd:\data\58bn5904.dat'.

To give you an idea of how ugly the regex could become:

use strict; use warnings; # Example line: # e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) S +ent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 byte +s) # Desired: # beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes my $re_date = qr< (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat) \s \d{1,2} # Day of month (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) # Month \d{2} # Two digit year \s \d{2}:\d{2}:\d{2} >x; my $pattern = qr< e:\\logfiles\\(.*?) # Capture(1) filename \s \[\d+\] # Bracketed number \s ($re_date) # Capture(2) date \s - \s \(\d+\) # number in parens \s Sent \s file \s d:\\data\\(.*?)\.dat # Capture(3) file basename \s successfully \s \( [0-9.]+ \s [A-Z]b /sec [ ] - [ ] (\d+ \s bytes) # Capture(4) bytes text \) >x; while (my $line = <DATA>) { if ($line =~ /$pattern/) { my ($logfile, $date, $file_basename, $bytes) = ($1,$2,$3,$4); printf "(%s) (%s) (%s) (%s)\n", $logfile,$date,$file_basename, + $bytes; } } __DATA__ e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sen +t file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
I have been meaning to learn Parse::RecDescent for ages, so tonight I took some time to try and solve your problem with it. It is likely the wrong tool for this job, and definitely a poor implementation - I would welcome any feedback for people with stronger parse-fu.
use strict; use warnings; use Parse::RecDescent; $::RD_HINT=5; my $grammar = <<'GRAMMAR'; { use strict; use warnings; } logfile : 'e:\\logfiles\\' /[-A-Za-z0-9_.]+/ { $item[2] } date : m{ (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \s \d\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d }x time : /\d{2}:\d{2}:\d{2}/ sentfile: <skip:''> 'd:\\data\\' /[-A-Za-z0-9_]+/ '.dat' { $item[3] } rate : /\d+\.\d [A-Za-z]+\/sec/ bytecount : /\d+ bytes/ parse : logfile /\[\d+\]/ date time /- \(\d+\) Sent file / sentfile <skip:'[- \t()]*'> ( /successfully/ rate ) bytecount { [ @item{qw(logfile date time sentfile bytecount)}] } GRAMMAR # Expect: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes my $parser = Parse::RecDescent->new($grammar); use Data::Dumper; while (my $line = <DATA>) { last unless $line =~ /\S/; my @fields = $parser->parse($line); if (@fields) { print Dumper \@fields; } } __DATA__ e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sen +t file d:\data\58bn5904.dat successfully (25.0 Kb/se
Output:
$VAR1 = [ [ 'beardstownbase.log', 'Thu 22Jun06', '08:07:19', '58bn5904', '859216 bytes' ] ];

In reply to Re: Regex Help - Large regex example, and larger Parse::RecDescent attempt by imp
in thread Regex Help pulling Data from a string by batcater98

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.