Hello Monks, I am working on a script which parses a set of log files. The goal is to extract the values of certain tags in the log files with capture groups. The data I am parsing looks similar to this:

2009/01/15 01:23:45:678: ASDF: [8=FIX.4.4^A9=228^A35=D^A49=ZYXW^A56=MYCO^A34=6^A52=20090115-01:23:45^A116=BLAH^A129=HALB^A50=MEH^A1=HEM^A11=abcefg123456^A15=ZZZ^A21=1^A22=5^A38=100^A40=2^A44=4.80000000^A48=ZVZZT.N^A54=2^A55=ZVZZT^A59=0^A60=20090115-01:23:45^A100=MEH^A10=111^A]

Note that ^A represents the SOH character (ASCII value 1).

My goal is to be able to capture the value of any given tag. So far, I have tried this:

if($line =~ m/^A55=(.*?^A)/){
    print "$1|";
} else {
    print "|";
}

My output is a pipe-delimited set of values. The Perl above works, except that each value in my output contains the trailing "^A". I want to correct this by capturing just the value between the tag prefix (e.g. "55=", where every digit is known in advance) and the next "^A" (ungreedy). If no match is found (i.e. the tag is not present), I want to output just a "blank" pipe.
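A minimal sketch of the single-tag case, assuming the delimiter really is the SOH byte (written `\x01` in Perl) and that every tag follows either the opening `[` or an SOH. The helper name `tag_value` and the shortened sample line are hypothetical, made up for illustration only:

```perl
use strict;
use warnings;

# Return the value of one tag from a log line, or '' if the tag is absent.
# Assumption: fields are delimited by SOH (\x01), shown as ^A in the post.
sub tag_value {
    my ($line, $tag) = @_;
    # Anchor on '[' or SOH so that "5=" cannot match inside "55=".
    # [^\x01]* stops before the next SOH, so the delimiter is never captured.
    return $line =~ /(?:\[|\x01)\Q$tag\E=([^\x01]*)/ ? $1 : '';
}

# Shortened sample line (hypothetical), built with real SOH bytes.
my $SOH  = "\x01";
my $line = '2009/01/15 01:23:45:678: ASDF: ['
         . join($SOH, '8=FIX.4.4', '9=228', '55=ZVZZT', '59=0', '10=111')
         . $SOH . ']';

# Tag 55 is present, tag 99 is not, so the second column is blank.
print tag_value($line, 55), '|', tag_value($line, 99), "|\n";
```

Using a negated character class instead of `.*?^A` keeps the delimiter out of the capture without needing a non-greedy quantifier.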

The second issue I would like to resolve is to simplify or generalize my statements. Currently I have a series of if statements, such as the one above, checking for every tag I'm trying to capture (e.g. 55=, 48=, 22=). I'd like to see if there is a more "clever" way to do this in a single statement. Something such as this, perhaps:

if($line =~ m/(^A22=.*?^A).*(^A40=.*?^A).*(^A48=.*?^A).*(^A54=.*?^A).*(^A55=.*?^A)/g){
    print "$1|$2|$3|$4|$5\n";
}

Please note that if one of the patterns above does not match, I'd like the corresponding capture variable ($1, $2, ...) to contain a blank rather than the next matched group's value.
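One possible way to get that behaviour, sketched here as an alternative to a single large regex: split the bracketed message into a tag-to-value hash, then look up the wanted tags in order. This assumes the delimiter is SOH (`\x01`) and that the message sits between `[` and `]`. The names `parse_fields` and `@wanted` and the shortened sample line are hypothetical:

```perl
use strict;
use warnings;

# Split the bracketed message into a tag => value hash reference.
# Assumption: fields are SOH-delimited (\x01) and enclosed in [ ... ].
sub parse_fields {
    my ($line) = @_;
    my %field;
    if (my ($body) = $line =~ /\[(.*)\]/) {
        for my $pair (split /\x01/, $body) {
            # Limit of 2 keeps any '=' inside the value intact.
            my ($tag, $value) = split /=/, $pair, 2;
            $field{$tag} = $value if defined $value;
        }
    }
    return \%field;
}

my @wanted = (22, 40, 48, 54, 55);    # tags to report, in output order

# Shortened sample line (hypothetical); tag 48 is deliberately missing.
my $SOH  = "\x01";
my $line = '2009/01/15 01:23:45:678: ASDF: ['
         . join($SOH, '8=FIX.4.4', '22=5', '40=2', '54=2', '55=ZVZZT', '10=111')
         . $SOH . ']';

my $field = parse_fields($line);

# A missing tag becomes an empty column instead of shifting the others.
my $row = join '|', map { defined $field->{$_} ? $field->{$_} : '' } @wanted;
print "$row\n";
```

Because each column is looked up by tag rather than by capture position, the order of the tags in the log line no longer matters, and adding a tag only means extending `@wanted`.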

Thanks very much for your time and consideration,

In reply to Log Parsing using Regex by Anonymous Monk
