MelaOS has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys n gals,

I'm a total newb to Perl and regex and am having a hard time with the work that I'm currently doing.

I have an input file with content like the lines shown below, from which I have to extract the test name and the value in the middle along with its unit.

So for the example shown below, I would need to extract abc 450.98mV, bcd 50.70dB and are 50.00dB.

I have been racking my brain trying to find a way to tell apart the lines that have just two value/unit pairs, but I have no idea how to pull out just the middle value.

Does anyone have any suggestions, a regex or any other concept that can do this?

605 abc xxx 410.00 mV < 450.98 mV < 490.00 mV
606 bcd yyy -46.50 dB < 50.70 dB
607 are zzz 50.00 dB < 58.48 dB

Re: How to regex this one out?
by albert (Monk) on Jun 15, 2006 at 10:03 UTC
    For something like this, I'd typically split on white space and pick the values out rather than creating a regex. Assuming there is always white space between all the items, the following would work with your example data.
        while (<DATA>) {
            # Remove white space at end of line
            s/\s+$//g;

            # Get length of line
            my $length = length;
            my @items  = split;

            # If line is long, want the internal items,
            # if shorter want the last ones
            # 50 works with the example.
            if ( $length > 50 ) {
                print join( "\t", @items[ 1, $#items - 4, $#items - 3 ] ), "\n";
            }
            else {
                print join( "\t", @items[ 1, $#items - 1, $#items ] ), "\n";
            }
        }
        __DATA__
        605 abc xxx 410.00 mV < 450.98 mV < 490.00 mV
        606 bcd yyy -46.50 dB < 50.70 dB
        607 are zzz 50.00 dB < 58.48 dB
    Of course, you can capture the values of interest rather than printing them.
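    A minimal sketch of that capturing variant, under some assumptions of mine: pick_fields and %measurement are hypothetical names, the sample lines sit in an array rather than under __DATA__, and the two layouts are told apart by counting "<" fields instead of measuring line length (a swapped-in test, since line length depends on the column padding of the real file).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): returns the test name,
# value and unit for one line instead of printing them.
# Assumption: the number of "<" fields, rather than the line length,
# tells the two layouts apart.
sub pick_fields {
    my ($line) = @_;
    my @items    = split ' ', $line;
    my $lt_count = grep { $_ eq '<' } @items;
    return $lt_count >= 2
        ? @items[ 1, $#items - 4, $#items - 3 ]    # low < value < high
        : @items[ 1, $#items - 1, $#items ];       # value assumed to be last
}

my @lines = (
    '605 abc xxx 410.00 mV < 450.98 mV < 490.00 mV',
    '606 bcd yyy -46.50 dB < 50.70 dB',
    '607 are zzz 50.00 dB < 58.48 dB',
);

# Keep the captured values keyed by test name
my %measurement;
for my $line (@lines) {
    my ( $name, $value, $unit ) = pick_fields($line);
    $measurement{$name} = { value => $value, unit => $unit };
}

for my $name ( sort keys %measurement ) {
    print "$name\t$measurement{$name}{value} $measurement{$name}{unit}\n";
}
```

    With the values stored this way, a later lookup such as $measurement{abc}{value} can be used wherever the number is needed.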

    -albert

Re: How to regex this one out?
by perlsen (Chaplain) on Jun 15, 2006 at 10:01 UTC
    Hi, just try this

    Updated:
        while(<DATA>)
        {
        =head
            if ($_=~ m#^(\d+)\s+(\w{3}).*?\<\s+((.*?)(mV|db)).*?#gsi)
            {
                print "$2 \t $3\n";
            }
        =cut
            if ($_=~ m#^(\d+)\s+(\w{3})(.{24}).*?(\d+\.\d+\s+\w+).*?#gsi)
            {
                #print "$&\n";
                print "$2 \t $4\n";
            }
        }
        #output
        #abc 450.98 mV
        #bcd 50.70 dB
        #are 50.00 dB
        __DATA__
        605 abc xxx 410.00 mV < 450.98 mV < 490.00 mV
        606 bcd yyy -46.50 dB < 50.70 dB
        607 are zzz 50.00 dB < 58.48 dB
    Regards
    perlsen
      This regex pulls out the last value, not the middle one, which the OP was looking to capture.
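
      For comparison, here is a rough sketch of a regex that anchors on the "<" separators rather than on character counts. The helper name middle_value and the $num pattern are my own, not from the post above. It assumes each line is either "low < value < high" or a single "a < b" pair, and in the second case it takes b. With the spacing as shown in the sample, that means line 607 yields 58.48 dB and not the 50.00 dB the OP listed, because nothing on a single-"<" line says which side is the measurement. If the real file is column-aligned, checking the column position would probably be more reliable.

```perl
use strict;
use warnings;

# A signed decimal number such as 450.98 or -46.50
my $num = qr/-?\d+(?:\.\d+)?/;

# Hypothetical helper, not from the posts above.
# Assumed layout: id, name, tag, then "low < value < high"
# or a single "a < b" pair, in which case b is returned.
sub middle_value {
    my ($line) = @_;
    return unless $line =~ m{
        ^\s* \d+ \s+ (\w+) \s+ \w+ \s+   # id, test name (captured), tag
        (?: $num \s+ \w+ \s+ < \s+ )?    # optional lower limit and separator
        ($num) \s+ (\w+)                 # value and unit (both captured)
        (?: \s+ < \s+ $num \s+ \w+ )?    # optional separator and upper limit
        \s* \z
    }x;
    return ( $1, $2, $3 );
}

for my $line (
    '605 abc xxx 410.00 mV < 450.98 mV < 490.00 mV',
    '606 bcd yyy -46.50 dB < 50.70 dB',
    '607 are zzz 50.00 dB < 58.48 dB',
) {
    my ( $name, $value, $unit ) = middle_value($line) or next;
    print "$name\t$value $unit\n";
}
```

      Lines that do not fit the assumed layout return an empty list, so they are skipped by the loop.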

      Update: regex of previous post fixed.

      -a

Re: How to regex this one out?
by Moron (Curate) on Jun 15, 2006 at 13:05 UTC
    Could make the pattern generic enough to split the whole line into stripped list elements before picking the desired ones by position:
        for (<DATA>) {
            # Split on white space, swallowing any "<" separator along with it
            my @fields = split /\s+(?:<\s+)?/;
            print join( ' ', @fields[ 1, 5, 6 ] ), "\n";
        }

    -M

    Free your mind