Mighty monks,

I'm trying to parse a rather large log file. With small quantities of data things run fine. When pointed at the entire 55 MB, 366,000-line log file, the lights dim in the server room! Is this regex optimized?

- Yes, I'm slurping in the WHOLE 55 MB file at one time. The script is running on a 4-CPU SPARC with 3+ GB of memory, so 55 MB "shouldn't" be a real problem for it.
- I'm slurping because a log file entry may span multiple lines.
- What could I do better and smarter?

Here's a sample script and data that parses 10 lines of the log file.
#!/usr/local/bin/perl
use warnings;
use strict;
use English;
use Data::Dumper;
use Time::HiRes 'time';

my $logfile;
my $dateRegex = qr/\d{4}-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/;
my $parse_log_entry = qr/^($dateRegex.*?)(?=$dateRegex)/ms;

{   # "slurp" the entire log file into memory at once
    local $INPUT_RECORD_SEPARATOR;
    $logfile = <DATA>;
}

my $count = 0;                  # Initialize counter
my $start = time();             # Start the timer
while( $logfile =~ m/$parse_log_entry/g ) {
    $count++;                   # count number of parsed strings
}
my $end = time();               # Stop the timer
my $elapsed = $end - $start;    # How long did that take?
my $average = $elapsed/$count;  # Average processing time
printf "Parsed $count log file entries in %.4f seconds, averaging %.4f\n", $elapsed, $average;
exit;

__DATA__
2004-01-05 22:37:48:879 : xscWnpStation_WNP_N_1 : EXCEPTION : ResourceLimitation CLH0070E CANNOT ROUTE NPANXX OF 801-750-1742 (FAX) msg_id: pOm94rLCG2CC39dj
TRACKINID=200402043229577 : REQ_NO=62320402446422 : REQ_INSTANCE=0020 : NNSP=6232 : ONSP=6875 : NLSP=6232 : OLSP=null : MSGTYPE=NOT
2004-01-05 22:38:52:019 : xscWnpStation_WNP_N_1 : dbError : SMGWNP0007 Database related error. : in xscPortResponse_PRI: Cannot find a confirmed request for SUP3 msg_id: pOm94vILG2CCIthy
TRACKINID=200402013135093 : REQ_NO=65290402734448 : REQ_INSTANCE=0002 : NNSP=6529 : ONSP=6006 : NLSP=null : OLSP=6006 : MSGTYPE=PRI
2004-01-05 22:43:02:239 : xscWnpStation_WNP_N_1 : EXCEPTION : InputDataValidationError MPE2099E RESP NUM ON SUP MUST MATCH RESP NUM ON RESPONSE msg_id: 23f1bb:fa87b0edab:-4347
TRACKINID=200312312021455 : REQ_NO=62320312334661 : REQ_INSTANCE=0050 : NNSP=6232 : ONSP=6529 : NLSP=6232 : OLSP=6529 : MSGTYPE=NOT
2004-02-05 22:37:48:879 : xscWnpStation_WNP_N_1 : EXCEPTION : ResourceLimitation CLH0070E CANNOT ROUTE NPANXX OF 801-750-1742 (FAX) msg_id: pOm94rLCG2CC39dj
TRACKINID=200402043229577 : REQ_NO=62320402446422 : REQ_INSTANCE=0020 : NNSP=6232 : ONSP=6875 : NLSP=6232 : OLSP=null : MSGTYPE=NOT
2004-02-05 22:38:52:019 : xscWnpStation_WNP_N_1 : dbError : SMGWNP0007 Database related error. : in xscPortResponse_PRI: Cannot find a confirmed request for SUP3 msg_id: pOm94vILG2CCIthy
TRACKINID=200402013135093 : REQ_NO=65290402734448 : REQ_INSTANCE=0002 : NNSP=6529 : ONSP=6006 : NLSP=null : OLSP=6006 : MSGTYPE=PRI
2004-02-05 22:43:02:239 : xscWnpStation_WNP_N_1 : EXCEPTION : InputDataValidationError MPE2099E RESP NUM ON SUP MUST MATCH RESP NUM ON RESPONSE msg_id: 23f1bb:fa87b0edab:-4347
TRACKINID=200312312021455 : REQ_NO=62320312334661 : REQ_INSTANCE=0050 : NNSP=6232 : ONSP=6529 : NLSP=6232 : OLSP=6529 : MSGTYPE=NOT
2004-02-05 22:43:50:769 : xscWnpStation_WNP_N_1 : dbError : SMGWNP0007 Database related error. : (Error Without Msg) In getRecordForSoaByTn: tracking_id not found for TN: 7145246400; REQ_NO: 652904017079; OWNER: 6529
2004-02-05 22:44:51:979 : xscWnpStation_WNP_N_1 : EXCEPTION : InputDataValidationError MPE0600E DUE DATE/TIME MUST EQUAL DESIRED DUE DATE/TIME TO CONFIRM REQUEST msg_id: pOm94t8TG2CDiqMP
TRACKINID=200402053272708 : REQ_NO=621404024940585 : REQ_INSTANCE=0001 : NNSP=6214 : ONSP=9740 : NLSP=null : OLSP=9740 : MSGTYPE=PRO
2004-02-05 22:47:12:879 : xscWnpStation_WNP_N_1 : dbError : SMGWNP0007 Database related error. : (Error Without Msg) In getRecordForSoaByTn: tracking_id not found for TN: 6193022949; REQ_NO: 652904027361; OWNER: 6529
2004-02-05 22:49:50:059 : xscWnpStation_WNP_N_1 : dbError : SMGWNP0007 Database related error. : in xscDB_WnpPortResponse.insert: ORA-01400: cannot insert NULL into ("DBADMIN"."WNP_PORT_RESPONSE"."MESSAGE_TIMESTAMP") msg_id: pOm94nEIG2CEtKQd
TRACKINID=200401092311869 : REQ_NO=65290401599145 : REQ_INSTANCE=0002 : NNSP=6529 : ONSP=6664 : NLSP=null : OLSP=6664 : MSGTYPE=PR2
2004-02-05 22:49:50:079 : xscWnpStation_WNP_N_1 : messageError : SMGWNP0009 Internal Message Error. : xscPortResponse_CLH_PR2 exception: xscDB_WnpPortResponse: caught SQL Exception during insertion: ORA-01400: cannot insert NULL into ("DBADMIN"."WNP_PORT_RESPONSE"."MESSAGE_TIMESTAMP") msg_id: pOm94nEIG2CEtKQd
TRACKINID=200401092311869 : REQ_NO=65290401599145 : REQ_INSTANCE=0002 : NNSP=6529 : ONSP=6664 : NLSP=null : OLSP=6664 : MSGTYPE=PR2
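
One possible answer to "what could I do better" is to skip the slurp and the lazy match with lookahead altogether: read the log a line at a time and start a new entry whenever a line begins with a timestamp. The following is only a minimal sketch of that idea, not a drop-in replacement. The helper name for_each_entry and the three-entry sample are made up for illustration (the sample lines are shortened stand-ins for the data above), and it assumes every entry starts at the beginning of a line with the timestamp format used in $dateRegex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same timestamp shape as $dateRegex in the post, anchored to line start.
my $entry_start = qr/^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/;

# Hypothetical helper: reads from an open filehandle and calls $callback
# once per complete log entry (continuation lines included).
# Returns the number of entries seen.
sub for_each_entry {
    my ($fh, $callback) = @_;
    my $entry;
    my $count = 0;
    while (my $line = <$fh>) {
        if ($line =~ $entry_start) {
            # A new timestamp closes the entry collected so far.
            if (defined $entry) {
                $callback->($entry);
                $count++;
            }
            $entry = $line;
        }
        elsif (defined $entry) {
            $entry .= $line;    # continuation line of the current entry
        }
        # Lines that appear before the first timestamp are skipped.
    }
    # Flush the final entry; no later timestamp exists to close it.
    if (defined $entry) {
        $callback->($entry);
        $count++;
    }
    return $count;
}

# Shortened, illustrative sample (an assumption, not the real log lines).
my $sample = <<'END_SAMPLE';
2004-01-05 22:37:48:879 : xscWnpStation_WNP_N_1 : EXCEPTION : ...
TRACKINID=200402043229577 : REQ_NO=62320402446422
2004-01-05 22:38:52:019 : xscWnpStation_WNP_N_1 : dbError : ...
2004-01-05 22:43:02:239 : xscWnpStation_WNP_N_1 : EXCEPTION : ...
TRACKINID=200312312021455 : REQ_NO=62320312334661
END_SAMPLE

# An in-memory filehandle stands in for the real log file here.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @entries;
my $count = for_each_entry($fh, sub { push @entries, $_[0] });
close $fh;

print "Parsed $count log entries\n";
```

Memory use stays at roughly one entry at a time, and end-of-input closes the last entry, which the lookahead version cannot match because no timestamp follows it. Separately, the documentation for the English module warns that a plain use English; can slow down regex matching on perls before 5.20; use English qw( -no_match_vars ); avoids that when the match variables are not needed.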

In reply to Pimp My RegEx by heathen