Hello Monks, I am working on a script which parses a set of log files. The goal is to extract the values of certain tags in the log files with capture groups. The data I am parsing looks similar to this:

2009/01/15 01:23:45:678: ASDF: [8=FIX.4.4^A9=228^A35=D^A49=ZYXW^A56=MYCO^A34=6^A52=20090115-01:23:45^A116=BLAH^A129=HALB^A50=MEH^A1=HEM^A11=abcefg123456^A15=ZZZ^A21=1^A22=5^A38=100^A40=2^A44=4.80000000^A48=ZVZZT.N^A54=2^A55=ZVZZT^A59=0^A60=20090115-01:23:45^A100=MEH^A10=111^A]
Note that ^A represents the SOH character (ASCII value 1).

My goal is to be able to capture the value of any given tag. So far, I have tried this:

if($line =~ m/^A55=(.*?^A)/){ print "$1|"; } else { print "|"; }
My output is a pipe-delimited set of values. The above Perl works, except that each value in my output contains the trailing "^A". I want to correct this by capturing just the value between "\d\d\d\d=" and the next "^A" (ungreedy), where every "d" is known. If no match is found (i.e. the tag is not present), I want to output just a "blank" pipe.
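For illustration, here is a minimal sketch of the behaviour described above, not a definitive solution. It assumes the SOH byte can be written as \x01 in the pattern and that every field is preceded by either "[" or an SOH. The helper name tag_value and the shortened sample line are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of one tag, or '' if the tag is absent.
sub tag_value {
    my ($line, $tag) = @_;
    # The tag must follow '[' or an SOH (\x01), so 5= cannot match inside 55=.
    # [^\x01]* captures everything up to, but not including, the next SOH.
    return $line =~ /(?:^|[\x01\[])\Q$tag\E=([^\x01]*)\x01/ ? $1 : '';
}

# Shortened sample line (an assumption; the real lines carry many more tags).
my $line = "2009/01/15 01:23:45:678: ASDF: "
         . "[8=FIX.4.4\x{01}35=D\x{01}55=ZVZZT\x{01}10=111\x{01}]";

# Tag 55 is present, tag 99 is not, so this prints "ZVZZT||".
print tag_value($line, 55), "|", tag_value($line, 99), "|\n";
```

Using a negated character class instead of ".*?" keeps the SOH out of the capture without relying on non-greedy matching.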

The second issue I would like to resolve is to simplify or generalize my statements. Currently I have a series of if statements, such as the one above, checking for every tag I'm trying to capture (e.g. 55=, 48=, 22=). I'd like to see if there is a more "clever" way to do this in a single statement. Something such as this, perhaps:

if($line =~ m/(^A22=.*?^A).*(^A40=.*?^A).*(^A48=.*?^A).*(^A54=.*?^A).*(^A55=.*?^A)/g){ print "$1|$2|$3|$4|$5\n"; }
Please note that if one of the patterns above does not match, I'd like the corresponding capture variable ($1, $2, and so on) to contain a blank rather than the next matched group's value.
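One possible way to get that behaviour, sketched here under assumptions rather than offered as the definitive answer, is to stop matching tag by tag and instead split the bracketed payload on SOH into a hash, then look up the wanted tags in order. The names parse_fix and fix_row, and the shortened sample line, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the bracketed payload into a tag => value hash.
sub parse_fix {
    my ($line) = @_;
    my %field;
    if ($line =~ /\[(.*)\]/s) {
        for my $pair (split /\x01/, $1) {
            # Limit of 2 keeps any '=' that appears inside the value.
            my ($tag, $value) = split /=/, $pair, 2;
            $field{$tag} = $value if defined $value;
        }
    }
    return \%field;
}

# Hypothetical helper: pipe-delimited row for the given tags, blank if missing.
sub fix_row {
    my ($line, @tags) = @_;
    my $field = parse_fix($line);
    return join '|', map { defined $field->{$_} ? $field->{$_} : '' } @tags;
}

# Shortened sample line (an assumption); tag 54 is deliberately left out.
my $line = "2009/01/15 01:23:45:678: ASDF: [8=FIX.4.4\x{01}22=5\x{01}40=2"
         . "\x{01}48=ZVZZT.N\x{01}55=ZVZZT\x{01}10=111\x{01}]";

# Prints "5|2|ZVZZT.N||ZVZZT": the missing tag 54 yields an empty column.
print fix_row($line, 22, 40, 48, 54, 55), "\n";
```

Because each column is a hash lookup, a missing tag cannot shift later values into the wrong position, and the tags do not have to appear in any particular order in the line.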

Thanks very much for your time and consideration,

_j


In reply to Log Parsing using Regex by Anonymous Monk
