apocalyptica has asked for the wisdom of the Perl Monks concerning the following question:

Almighty perl monks,

Please forgive me for my ignorance, oh revered ones. I am a mere techsupport flunky who was kind of thrown into a new position that involves a lot of Perl. ("You know PHP, right? You can figure out Perl!" said my boss. What a cruel man.) And now, I beseech you, please offer your guidance.

I have a text file with the following line in it:

| LEVENE'S TEST FOR VARIANCES        3, 561               0.72            0.5422    |

This is an example -- the LEVENE'S TEST FOR VARIANCES comes up multiple times throughout the file. I need to suck out the column that has 0.5422 in it each time. I tried the following:

\$levene = \$4 if /TEST FOR VARIANCES\\s+(${fl})\\s+(${fl})\\s+(${fl}) +\\s+(${fl})/; ($fl = '-?\d+\.\d+')


This is obviously not working, and I am sure it is way, way off. I have pored over a couple of Perl books, but both are very vague about regular expressions. Can anyone please help a poor guy out? Thanks ever so much.

Edit by tye, replace PRE with CODE

Replies are listed 'Best First'.
Re: regular expressions question
by davido (Cardinal) on May 03, 2004 at 15:47 UTC
    That looks a LOT like fixed-width column data, and if that's the case, you might just find it easier to use unpack. The description of unpack's template system is found in the documentation for pack. And you can find a wealth of great information in perlpacktut.
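    To make the unpack suggestion concrete, here is a minimal sketch. The field widths (35, 21, 16 and 10 characters after the leading "| ") are assumptions counted from the single sample line in the question, and last_column_unpack is a hypothetical helper name; check the widths against the real file before relying on them.

```perl
use strict;
use warnings;

# Assumed layout: skip the leading "| ", then four fields of
# 35, 21, 16 and 10 characters. "A" strips trailing spaces on unpack.
my $TEMPLATE = 'x2 A35 A21 A16 A10';

# Hypothetical helper: returns the last column of a matching line,
# or undef for lines that are not "TEST FOR VARIANCES" rows.
sub last_column_unpack {
    my ($line) = @_;
    return undef unless $line =~ /TEST FOR VARIANCES/;
    my @fields = unpack $TEMPLATE, $line;
    return $fields[3];
}

# Build a sample line with the same assumed widths.
my $sample = sprintf "| %-35s%-21s%-16s%-10s|",
    "LEVENE'S TEST FOR VARIANCES", '3, 561', '0.72', '0.5422';

print last_column_unpack($sample), "\n";
```

    If the columns drift from line to line, the fixed template will silently return the wrong slice, which is where the regexp approach is more forgiving.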

    If you really want a regexp solution, this might just do it:

    my @last_columns;
    while ( my $line = <DATA> ) {
        # Only look at the lines we care about.
        next unless $line =~ /TEST FOR VARIANCES/;
        # Capture the last run of digits and dots before the trailing "|".
        if ( $line =~ /([.\d]+)[\s|]*$/ ) {
            push @last_columns, $1;
        }
    }
    print "$_\n" for @last_columns;


    Dave

Re: regular expressions question
by sgifford (Prior) on May 03, 2004 at 15:53 UTC
    You have the right idea, but you made a few somewhat obvious mistakes. To debug an RE, you need to go through it carefully to see why it's not working, and break it down into smaller bits until you can see what's going on. Here are the problems I found:
    • You put an extra backslash before \s, so the RE was looking for a literal backslash followed by the letter s.
    • Your number pattern in $fl doesn't take the second column into account, which has no decimal point and contains a comma.
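    Both problems can be seen in isolation with a small sketch. The shortened test line and the helper name matches_whole are made up for illustration; the $fl pattern is the one from the question.

```perl
use strict;
use warnings;

# Returns 1 if $text as a whole matches $pattern, else 0.
sub matches_whole {
    my ($text, $pattern) = @_;
    return $text =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

my $line = "TEST FOR VARIANCES   3, 561   0.72   0.5422";

# \\s in a regex is a literal backslash followed by "s" ...
print "doubled backslash: ", ($line =~ /VARIANCES\\s+/ ? "match" : "no match"), "\n";
# ... while \s is the whitespace class that was intended.
print "single backslash:  ", ($line =~ /VARIANCES\s+/ ? "match" : "no match"), "\n";

# The original number pattern insists on digits, a dot, and more digits.
my $fl = '-?\d+\.\d+';
for my $token ('3,', '561', '0.72', '0.5422') {
    printf "%-7s %s\n", $token, matches_whole($token, $fl) ? "accepted" : "rejected";
}
```

    Testing each piece on its own like this is the "break it down into smaller bits" step: the two pieces of "3, 561" are rejected by the original pattern, so the full regex can never reach the fourth capture.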

    This version seems to do what you want:

    $fl = '-?\d+\.?\d*,?';
    $levene = $4 if /TEST FOR VARIANCES\s+(${fl})\s+(${fl})\s+(${fl})\s+(${fl})/;

    It looks like your data might line up in the same character positions every time, in which case using unpack or substr might be an easier option.
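    As a rough sketch of the substr route: the offset and width below are assumptions derived from the one sample line (a 2-character "| " prefix followed by fields of 35, 21 and 16 characters puts the last column at offset 74), and last_column_substr is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed position of the last column: it starts at offset 74
# (2 + 35 + 21 + 16) and is 10 characters wide.
my ($OFFSET, $WIDTH) = (74, 10);

# Hypothetical helper: returns the trimmed last column, or undef if the
# line is not a "TEST FOR VARIANCES" row or is too short.
sub last_column_substr {
    my ($line) = @_;
    return undef
        unless $line =~ /TEST FOR VARIANCES/ && length($line) >= $OFFSET;
    my $field = substr $line, $OFFSET, $WIDTH;
    $field =~ s/\s+\z//;    # trim the padding
    return $field;
}

# Sample line built with the same assumed widths.
my $row = sprintf "| %-35s%-21s%-16s%-10s|",
    "LEVENE'S TEST FOR VARIANCES", '3, 561', '0.72', '0.5422';

print last_column_substr($row), "\n";
```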

    Good luck!

Re: regular expressions question
by matija (Priest) on May 03, 2004 at 15:57 UTC
    Well, for one thing, the first number in that line does not conform to your $fl definition (numbers, dot, numbers), because it is "digit, comma, blank, digit digit digit".

    Another thing: Why are you assigning with \$levene = \$4? You're taking a reference to the fourth match? I find that deeply suspect, and likely to be a cause of all sorts of funny trouble. (Were you perhaps a C programmer in a previous life? Because that smells of the C idiom that is not needed in Perl). Try losing those backslashes (unless you really, really know what they're there for).
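    A small sketch of what the backslash does in front of a variable. The test string and the helper name kind_of are made up for illustration; ref is the standard built-in.

```perl
use strict;
use warnings;

# Reports what kind of thing was passed in: the reference type,
# or 'plain value' for an ordinary string or number.
sub kind_of {
    my ($thing) = @_;
    return ref($thing) || 'plain value';
}

my $levene;
if ( "0.72   0.5422" =~ /(\d+\.\d+)\s+(\d+\.\d+)/ ) {
    print kind_of($2),  "\n";   # the captured text itself
    print kind_of(\$2), "\n";   # a reference to the capture variable
    $levene = $2;               # plain assignment copies the value
}
print "$levene\n";
```

    Without the backslashes, $levene simply ends up holding a copy of the captured text, which is all the original code needs.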

Re: regular expressions question
by Abigail-II (Bishop) on May 03, 2004 at 15:43 UTC
    This is an example -- the LEVENE'S TEST FOR VARIANCES comes up multiple times throughout the file. I need to suck out the column that has 0.5422 in it each time.
    If the column has "0.5422" in it each time, there's no need to "suck it out", as you already know what's there.

    Abigail

      My interpretation (hardly a stretch at all):
      the "LEVENE'S TEST FOR VARIANCES" comes up multiple times...I need to suck out the column that [in this example] has 0.5422 in it each time [a line has TEST FOR VARIANCES]
        My interpretation
        When it comes to problem descriptions, I try to avoid reading between the lines as much as possible, especially when it comes to regular expressions. Sure, I could have given him a regex according to some interpretation. Would that have solved his immediate problem? Perhaps, if my interpretation happened to be right. Would he have learned anything, that is, would he have been able to solve his next regex problem? Probably not.

        No, instead I try to get him to think again. To look at the problem: poke at it, dissect it, and formulate what he really wants. Formulate a regexp problem properly, and you have solved 90% of it. All that is left is the translation to an arcane language, and that's fairly mechanical at that point.

        In Dutch we have a saying "Zachte heelmeesters maken stinkende wonden", which translated means something like "Gentle healers make festering wounds".

        Abigail

Re: regular expressions question
by fourmi (Scribe) on May 04, 2004 at 12:55 UTC
    Hey, I wouldn't use Perl for this; I'd just use a Unix shell. An un-optimised version follows:
    cat <textfilename> | grep "LEVENE'S TEST FOR VARIANCES" | awk '{print $9}'
    (This prints the 9th block of text; blocks are separated by whitespace. Note that the spelling in the grep pattern has to match the file, which says LEVENE'S.)
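    For comparison, the same whitespace-splitting idea can be sketched in Perl. The helper name ninth_field is hypothetical, and the approach assumes every matching line breaks into the same number of whitespace-separated blocks as the sample (where "3, 561" counts as two blocks).

```perl
use strict;
use warnings;

# Hypothetical helper: awk-style field split. split ' ' discards
# leading whitespace and splits on runs of whitespace, like awk does.
sub ninth_field {
    my ($line) = @_;
    return undef unless $line =~ /TEST FOR VARIANCES/;
    my @fields = split ' ', $line;
    return $fields[8];    # awk's $9 is index 8 in Perl
}

my $example = "| LEVENE'S TEST FOR VARIANCES   3, 561   0.72   0.5422   |";
print ninth_field($example), "\n";
```

    Counting blocks is fragile: if the second column ever prints without the space after the comma, the wanted value moves to the 8th block.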