MarcAllan has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I have a regex question, probably a simple one for you gurus out there. I have a hash value returned which can be one of two strings.

Value A:
CDATA[Starting SITUATION1 <311446961,2559575967> for K01.K01CSYSTE5.

Value B:
CDATA[Starting Enterprise situation SITUATION1 <311439572,1066403668> for K01.K01CSYSTE5.

Now, I need to match certain parts of the string and pass them on as parameters. For example, if value A is returned, I need to match the following values from the string below:

CDATA[Starting SITUATION1 <311446961,2559575967> for K01.K01CSYSTE5.

Starting (Param 1)
SITUATION1 (Param 2)
K01.K01CSYSTE5 (Param 3)

So I coded the following, which works for value A:

    if ($elementHash->{MSGTEXT} =~ /CDATA\[(\S+)\s+(\S+)\s+.+(on|for)\s+(\S+)\./) {
        $status  = $1;   # has the value "Starting"
        $sitname = $2;   # has the value "SITUATION1"
        $type    = $4;   # has the value "K01.K01CSYSTE5"
    }

But I need to alter the regex pattern so that it also captures $1, $2 and $4 for value B:

CDATA[Starting Enterprise situation SITUATION1 <311439572,1066403668> for K01.K01CSYSTE5.

Notice the extra bit of text "Enterprise situation" being returned in value B.

So what I need is a universal pattern which will match:

Starting as $1
SITUATION1 as $2
K01.K01CSYSTE5 as $4

whether the VALUE A or the VALUE B string is returned. Hope this makes sense.

Replies are listed 'Best First'.
Re: RegEx pattern
by cdarke (Prior) on Aug 30, 2011 at 09:00 UTC
    Why not just make the 'Enterprise situation' text optional?
    if ($elementHash->{MSGTEXT} =~ /CDATA\[(\S+)\s+(?:Enterprise situation\s+)?(\S+)\s+.+(on|for)\s+(\S+)\./)
    Notice that the group (?:Enterprise situation\s+) is a non-capturing parentheses group, which means that $1,$2,$4 are unaffected by the new group. You might also do the same with (?:on|for) which would mean that $4 becomes $3. This avoids the overhead of capturing text you don't need and, more importantly, means you can concentrate on the groups you are actually interested in.
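
The suggestion above can be sketched as a small test script. This is only an illustration, assuming both the optional group and `(?:on|for)` are made non-capturing, so the target ends up in `$3` rather than `$4`. The helper name `parse_msgtext` is hypothetical and does not appear in the thread.

```perl
use strict;
use warnings;

# The two sample messages from the question.
my @messages = (
    'CDATA[Starting SITUATION1 <311446961,2559575967> for K01.K01CSYSTE5.',
    'CDATA[Starting Enterprise situation SITUATION1 <311439572,1066403668> for K01.K01CSYSTE5.',
);

# parse_msgtext is a hypothetical helper, not part of the original code.
# Returns (status, situation name, target), or an empty list on no match.
sub parse_msgtext {
    my ($text) = @_;
    if ($text =~ /CDATA\[(\S+)\s+(?:Enterprise situation\s+)?(\S+)\s+.+(?:on|for)\s+(\S+)\./) {
        # Only three capturing groups remain, so the target is now $3.
        return ($1, $2, $3);
    }
    return;
}

for my $msg (@messages) {
    my ($status, $sitname, $type) = parse_msgtext($msg);
    print "$status | $sitname | $type\n";
}
```

Both messages should yield the same three values, since the extra "Enterprise situation" text is consumed by the optional group before the second capture is attempted.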

      Exactly what I needed, thank you

Re: RegEx pattern
by duyet (Friar) on Aug 30, 2011 at 08:00 UTC
    You can also write a small program to do the job:
    use strict;
    use warnings;
    use Data::Dumper;

    my $elementHash = {
        MSGTEXT1 => 'CDATA[Starting SITUATION1 <311446961,2559575967> for K01.K01CSYSTE5',
        MSGTEXT2 => 'CDATA[Starting Enterprise situation SITUATION1 <311439572,1066403668> for K01.K01CSYSTE5.',
    };

    foreach my $key ( keys %{ $elementHash } ) {
        my @tmp = parse_str( $elementHash->{$key} );
        print Dumper( \@tmp );
    }
    exit;

    sub parse_str {
        my $str = shift;

        # Split on whitespace, then peel the status off the 'CDATA[' prefix.
        my @items = split ' ', $str;
        my @tmp   = split '\[', $items[0];
        my $size  = $#items;

        # The longer message has two extra words before the situation name.
        my $param2 = ( $size > 4 ) ? $items[3] : $items[1];

        ( $tmp[1], $param2, $items[ $size ] );
    }
    Dumped data:
    $VAR1 = [ 'Starting', 'SITUATION1', 'K01.K01CSYSTE5' ];
    $VAR1 = [ 'Starting', 'SITUATION1', 'K01.K01CSYSTE5.' ];
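
Note that the second dump keeps the trailing period because the last word is returned as is. A possible variation, shown here only as a sketch, strips that period so both inputs give the same third value. The name `parse_str_trimmed` is hypothetical, and the word-count test is the same assumption the code above makes about the message layout.

```perl
use strict;
use warnings;

# parse_str_trimmed is a hypothetical variant of parse_str above.
sub parse_str_trimmed {
    my ($str) = @_;
    my @items = split ' ', $str;

    # 'CDATA[Starting' -> ('CDATA', 'Starting'); keep the second part.
    my (undef, $status) = split /\[/, $items[0];

    # Same assumption as the original: more than five words means the
    # "Enterprise situation" form, where the name is the fourth word.
    my $sitname = ( $#items > 4 ) ? $items[3] : $items[1];

    # Copy the last word and drop a single trailing period, if present.
    ( my $type = $items[-1] ) =~ s/\.$//;

    return ( $status, $sitname, $type );
}

print join( ' | ', parse_str_trimmed(
    'CDATA[Starting Enterprise situation SITUATION1 <311439572,1066403668> for K01.K01CSYSTE5.'
) ), "\n";
```

This still depends on the exact number of words in the message, so the regex approach is likely more robust if the message format varies further.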