I am trying to parse log files as efficiently as possible in Perl. In the following code snippet, I need to grab the first 18 whitespace-separated fields, the next 40 characters, another 40 characters, and then whatever remains of the string. The fields vary in width, as you can see in the test data string.

Is there a faster way to do this in Perl? Is there a better regular expression to grab the first 18 fields?

Without a loss of speed, can I create a class that blesses the regex and has methods for returning the elements of the log file line? What is the fastest way to process log files without using, for instance, inline C? Any assistance will be greatly appreciated. Thanks.
#!/usr/local/bin/perl -w
use strict;

my $testdata = <<TESTDATA;
-3 1 2 3 4 5 6657 7 8 9 10 11 12 13 14 15 16 20021013000000 NM 1 : + SR9550/1-SR9551/1 16S 1 12 LINE WEST + 0 0
-3 2 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013000000 Test021011 + 0 + 0
-3 3 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013000000 Test021011a + 0 + 0
-3 4 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013000000 NM 1 : + SR9550/1-SR9551/1 16S 1 18 LINE EAST 0 + 0
-3 5 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013001500 Test021011 + 0 + 0
-3 6 67 0 9 2 6657 2 1 0 0 0 0 6 131 0 0 20021013001500 NM 1 : + SR9550/1-SR9551/1 16S 1 12 LINE WEST 0 + 0
-3 7 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013001500 Test021011a + 0 + 0
-3 8 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013001500 NM 1 : + SR9550/1-SR9551/1 16S 1 18 LINE EAST 0 + 0
-3 9 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013003000 Test021011 + 0 + 0
-3 10 67 0 9 2 6657 2 1 0 0 0 0 6 131 0 0 20021013003000 NM 1 : + SR9550/1-SR9551/1 16S 1 12 LINE WEST +0 0
-3 11 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013003000 Test021011a + +0 0
-3 12 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013003000 NM 1 : + SR9550/1-SR9551/1 16S 1 18 LINE EAST +0 0
TESTDATA

my @data = split /\n/, $testdata;

my $str_18_fields;
my $str_40_chars1;
my $str_40_chars2;
my $str_remain;

# Groups: $1 = first 18 fields, $2 = last repetition of the inner group (unused),
# $3 and $4 = the two 40-character blocks, $5 = the rest of the line.
my $regex = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;

foreach my $line (@data) {
    next unless $line =~ /^-3/;

    # Skip lines the regex does not match; otherwise the capture
    # variables would not hold this line's fields.
    next unless $line =~ m/$regex/;

    $str_18_fields = $1;
    $str_40_chars1 = $3;
    $str_40_chars2 = $4;
    $str_remain    = $5;

    # Replace the first pipe character in each 40-character block.
    $str_40_chars1 =~ s!\|!_!;
    $str_40_chars2 =~ s!\|!_!;

    print "\$str_18_fields = $str_18_fields\n";
    print "\$str_40_chars1 = $str_40_chars1\n";
    print "\$str_40_chars2 = $str_40_chars2\n";
    print "\$str_remain = $str_remain\n\n";
} # end foreach
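
To make the class idea concrete, here is a minimal sketch of the kind of thing I have in mind. The package name LogLine, the accessor names, and the synthetic test line are placeholders invented for illustration only, and the inner group is made non-capturing since its value is never used. Whether the method calls cost measurable time compared with the procedural loop would need benchmarking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class name, chosen for this sketch only.
package LogLine;

# Compiled once. The inner group is non-capturing because only the
# whole 18-field run is needed, not its last repetition.
my $LINE_RE = qr/^-((?:\S+\s+){18})(.{40})(.{40})(.+)/;

# Returns an object for a matching line, or undef (in scalar context)
# if the line does not match.
sub parse {
    my ($class, $line) = @_;
    my @parts = $line =~ $LINE_RE or return;
    return bless \@parts, $class;
}

# Accessors index straight into the blessed array.
sub fields18  { $_[0][0] }
sub chars1    { $_[0][1] }
sub chars2    { $_[0][2] }
sub remainder { $_[0][3] }

package main;

# Synthetic line built to the layout described above (not real log data):
# 18 fields, two blocks padded to 40 characters, then the remainder.
my $line = '-' . join(' ', 1 .. 18) . ' '
         . sprintf('%-40s%-40s', 'NAME ONE', 'NAME TWO')
         . '0 0';

if (my $rec = LogLine->parse($line)) {
    print 'chars1    = [', $rec->chars1,    "]\n";
    print 'remainder = [', $rec->remainder, "]\n";
}
```

The object is a blessed array of the four captured strings rather than a blessed regex, so each accessor is a single array lookup.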

In reply to Efficient Log File Parsing with Regular Expressions by hackdaddy
