Lori713 has asked for the wisdom of the Perl Monks concerning the following question:

I finally had my first Perl class (three days of wonderful bliss. Yay!). Anyways, we were working on creating regexes to find different things, and the last regex block in my code (finding a negative number with a decimal) stumped everyone (even the teacher at the time). We wanted the regex to find all but the last line in my text file. My last regex finds the last six lines of my text file, but I don't want it to find the last line. Assuming that the "abc" in that last line could be anything, how do I exclude it if it doesn't follow the same pattern as the examples above it and not eliminate stuff I do want?
#! c:\Perl\bin\perl5_8
#Read lines from a text file; create RE's that will find certain stuff
#use strict; take this out for now

open (MYFILE, "< Lori.txt") || die("Can't open Lori.txt");

#find lines starting with a c and end with a d
print "lines starting with a c and end with a d:\n";
while (chomp($line =<MYFILE>)) {
    if ($line =~ /^[cC].*d$/o) {
        print "$line\n";
    }
}

#find blank lines
seek MYFILE, 0,0;
print "\nblank lines:\n";
while (chomp($line =<MYFILE>)) {
    if ($line =~ /^$/o) {
        print "$line\n";
        print "blank line above\n";
    }
}

#find lines with only spaces
seek MYFILE, 0,0;
print "\nlines with only spaces:\n";
while (chomp($line =<MYFILE>)) {
    if ($line =~ /^ *$/o) {
        print "$line\n";
        print "spaced out line above\n";
    }
}

#find lines that contain the same number twice
seek MYFILE, 0,0;
print "\nlines that contain the same number twice:\n";
while (chomp($line =<MYFILE>)) {
    if ($line =~ /(\d+)[^\d]*\1[^\d]/o) {
        print "$line\n";
    }
}

#find lines that contain a negative number with a decimal point
seek MYFILE, 0,0;
print "\nlines that contain a - number with a decimal point:\n";
while (chomp($line =<MYFILE>)) {
    if ($line =~ /-\d*\.\d*/o) {
        print "$line\n";
    }
}

close MYFILE;

And my test text file Lori.txt:

Catdog isn't a horsebird
The next line is a blank line.

The next line is a line full of spaces.

fred1 fred2 fred3 fred3 fred2
lucy1 lucy2 lucy3 lucy4 lucy5
negative number with a decimal: -15.00
negative number with a decimal: -.15
positive number with a decimal: .15
negative number with a decimal: -7.
negative sign with a decimal: -.

Thanks for any insights you can provide. Please feel free to suggest ways of improving my code.    :-)

Lori

Replies are listed 'Best First'.
Re: Regex negative number question
by dragonchild (Archbishop) on Oct 06, 2003 at 14:45 UTC
    You need to add an additional requirement that there actually is a number.
    if ($line =~ /-\d*\.\d*/o) becomes if ($line =~ /-\d*\.\d*/o && $line =~ /\d/o)

    That said, check out Regexp::Common - it has a ton of regexes that are a lot more complicated than most people will ever understand, but are extremely useful. For example, it can recognize a number that's in pretty much any format (including scientific notation).

    And, congratz on the Perl class. You've been wanting that for a while. I'm glad you enjoyed yourself. Next step - answering questions on PM! :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

      Your test would still, for example, match "birdie3 -.", which just illustrates why to use Regexp::Common, though ;-).
      Cheers,
      CombatSquirrel.
      Entropy is the tendency of everything going to hell.
        Regexp::Common doesn't have a regexp for negative numbers, but you can easily add that yourself. For instance, if you want to match negative decimal numbers, use:
        use Regexp::Common;
        /(?=-)$RE{num}{decimal}/
        The (?=-) requires the match to start with a minus sign.

        Abigail

      This would match a line like this:
      -. 0
      which I believe is not the desired result.
Re: Regex negative number question
by delirium (Chaplain) on Oct 06, 2003 at 14:47 UTC
    You should check for a number before or after the decimal like so:

    if ( $line =~ /-\d+\./ || $line =~ /-\d*\.\d+/ )

    -. won't match.
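
    The two tests above can also be folded into a single pattern. What follows is only a minimal sketch run against the sample lines from Lori.txt (plus the "birdie3 -." line mentioned elsewhere in the thread); the sub name has_negative_decimal is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line holds a minus sign followed by a
# number that has a decimal point and at least one digit.
sub has_negative_decimal {
    my ($line) = @_;
    # Either digits, a '.', then optional digits (-15.00, -7.)
    # or a '.' followed by digits (-.15).
    # A bare "-." has no digit on either side, so it cannot match.
    return $line =~ /-(?:\d+\.\d*|\.\d+)/ ? 1 : 0;
}

my @samples = (
    'negative number with a decimal: -15.00',
    'negative number with a decimal: -.15',
    'positive number with a decimal: .15',
    'negative number with a decimal: -7.',
    'negative sign with a decimal: -.',
    'birdie3 -.',
);

for my $line (@samples) {
    my $verdict = has_negative_decimal($line) ? 'match' : 'no match';
    print "$verdict: $line\n";
}
```

    Like the two-regex version, this only looks for the minus sign anywhere in the line, so it makes no attempt to tell a negative sign from a hyphen.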

Re: Regex negative number question
by dragonchild (Archbishop) on Oct 06, 2003 at 15:32 UTC
    To answer the question you asked by /msg, try the following:
    while (my $line = <MYFILE>) {
        chomp $line;
        # Do stuff here
    }

    The problem is that $line was a global variable. strict complains about that, so you have to my the variable when you use it. The above is (almost) the standard way of handling a file line by line. The standard way most people do it would be to use $_ as much as possible. Something along the lines of:

    while (<MYFILE>) {
        # Skip commented-out lines (or whatever else you want to do)
        next if /^#/;

        # Don't chomp unless we know we're going to use the line
        chomp;

        if (/SOME REGEX HERE/) {
            # Do stuff
        }
    }

    Now, you don't want to chomp in the while condition. If the last line has no trailing newline, that line will be skipped: chomp returns the number of characters it removed, which is 0 (false) when there was nothing to remove, so the loop ends before the line is processed.
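
    The skipped-last-line effect is easy to reproduce. This is a small sketch, assuming an in-memory filehandle as a stand-in for Lori.txt; the data and the two sub names are invented for the demonstration.

```perl
use strict;
use warnings;

# Sample data whose last line has no trailing newline.
my $data = "first\nsecond\nlast";

# chomp in the loop condition: the loop stops as soon as chomp removes nothing.
sub lines_with_chomp_in_condition {
    my @seen;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    my $line;
    # If the data ended in a newline, the loop would run to end of file,
    # where $line is undef; silence that warning for the demonstration.
    no warnings 'uninitialized';
    while (chomp($line = <$fh>)) {
        push @seen, $line;
    }
    close $fh;
    return @seen;
}

# Read first, chomp inside the loop body: every line is processed.
sub lines_with_standard_loop {
    my @seen;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @seen, $line;
    }
    close $fh;
    return @seen;
}

print "chomp in condition: ", join(', ', lines_with_chomp_in_condition()), "\n";
print "standard loop:      ", join(', ', lines_with_standard_loop()), "\n";
```

    The first sub never sees "last", because chomp removes nothing from it and so returns 0.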


Re: Regex negative number question
by fletcher_the_dog (Friar) on Oct 06, 2003 at 17:38 UTC
    Try:
    $line=~/-                  # negative
            (?:                # followed by
              \d+(?:\.\d+)?    # digits possibly followed by '.' digits
              |                # or
              \.\d+            # '.' digits
            )/x;
    # without comments looks like this
    $line=~/-(?:\d+(?:\.\d+)?|\.\d+)/

      But then that rules out '-7.', which the OP explicitly wants to include.

      You could try this (leaving out the non-capturing ?:s for clarity):

      print "match" if $str =~ /-\d*((?<=\d)\.)|(\.(?=\d))/;

      but then that would match if:

      my $str = 'See answers on pages 234-235.'

      oh well...

      dave
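
      One possible way around the page-range case is to refuse a minus sign that is glued to a preceding letter, digit or dot. This is only a sketch under that assumption (it treats any such "-" as a hyphen, which will be wrong for text like "x=5-.5"); the sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: like a plain "negative decimal" test, but the minus
# sign must not directly follow a word character or a dot.
sub has_standalone_negative_decimal {
    my ($str) = @_;
    # (?<![\w.]) is a negative lookbehind: it rejects "234-235." because
    # the '-' comes straight after the digit 4.
    return $str =~ /(?<![\w.])-(?:\d+\.\d*|\.\d+)/ ? 1 : 0;
}

for my $str (
    'See answers on pages 234-235.',
    'negative number with a decimal: -7.',
    'negative number with a decimal: -.15',
    'negative sign with a decimal: -.',
) {
    my $verdict = has_standalone_negative_decimal($str) ? 'match' : 'no match';
    print "$verdict: $str\n";
}
```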

Re: Regex negative number question
by zby (Vicar) on Oct 06, 2003 at 15:18 UTC
    As an additional note:
    #find lines that contain the same number twice
    ...
    if ($line =~ /(\d+)[^\d]*\1[^\d]/o)
    This will not match "1 2 1", but this line contains the number "1" twice.
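
    A pattern that also accepts "1 2 1" could allow anything between the two occurrences and use lookarounds to keep each occurrence a whole number. The following is a rough sketch, not the original poster's code; has_repeated_number is an invented name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if some whole number occurs at least twice.
sub has_repeated_number {
    my ($line) = @_;
    # (?<!\d) and (?!\d) pin each occurrence to a complete run of digits,
    # so "12" is not counted as a repeat inside "123".
    # .* lets anything, including other numbers, sit between the two.
    return $line =~ /(?<!\d)(\d+)(?!\d).*(?<!\d)\1(?!\d)/ ? 1 : 0;
}

for my $line (
    '1 2 1',
    'fred1 fred2 fred3 fred3 fred2',
    'lucy1 lucy2 lucy3 lucy4 lucy5',
    '12 123',
) {
    my $verdict = has_repeated_number($line) ? 'repeat' : 'no repeat';
    print "$verdict: $line\n";
}
```

    Dropping the trailing [^\d] from the original also means a repeated number at the very end of a line is found.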
Re: Regex negative number question
by Cody Pendant (Prior) on Oct 06, 2003 at 21:09 UTC
    Nothing to contribute except to ask why your regexes end in "/o" -- did your teacher tell you to do that?

    I can't see the reason for it.
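
    For background: /o tells Perl to interpolate variables into the pattern only once, so it can only make a difference when the pattern contains a variable. The regexes in the original post are all constant, so there /o changes nothing. Here is a small sketch of what it does when a variable is involved; the sub and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Count how many of the given patterns match $text, with and without /o.
sub count_matches {
    my ($text, @patterns) = @_;
    my ($with_o, $without_o) = (0, 0);
    for my $pat (@patterns) {
        # /o interpolates $pat only the first time this match is reached,
        # so every later iteration silently reuses the first pattern.
        $with_o++    if $text =~ /$pat/o;
        # Without /o the current value of $pat is used each time.
        $without_o++ if $text =~ /$pat/;
    }
    return ($with_o, $without_o);
}

# 'b' is tested against the patterns 'a' and then 'b'.
my ($with_o, $without_o) = count_matches('b', 'a', 'b');
print "with /o: $with_o, without /o: $without_o\n";
```

    With /o the second pattern is never actually used, which is why the modifier tends to cause surprises rather than speed-ups.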



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
Re: Regex negative number question
by Lori713 (Pilgrim) on Oct 07, 2003 at 19:12 UTC
    Thanks for the feedback and suggestions, everyone. To answer a couple of the questions asked by some:

    zby: True, I wouldn't want to match "-. o".
    Cody Pendant: That's what the teacher gave us. You're right: it doesn't seem to have a purpose in this context.