in reply to regex behaves differently in split vs substitute?

You can use the code below to extract the version:

use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^[a-z-]+(\d.*)$/) {
        print ">>$1<<\n";
    }
}

__DATA__
mono-basic-2.10
mono-2.10.2-r1
mono-2.10.5

Re^2: regex behaves differently in split vs substitute?
by Marshall (Canon) on Oct 08, 2011 at 12:58 UTC
    That's fine, but I recommend avoiding $1, $2, etc. If you put the left-hand-side in a list context, a variable like $version can be assigned directly without fiddling with $1 as an intermediary. For most folks, $version is easier to understand than just $1.

    Your if() statement is correct: a successful match returns a true value and a failed match returns false. However, an assignment to $version like the one below leaves $version either defined (the match succeeded) or not defined (it failed), and that can also be tested in an "if".
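
A minimal sketch of the list-context assignment described above. The helper name extract_version and the extra test lines are made up for illustration and are not from the thread. It also shows why testing with defined is safer than testing the value itself: a captured "0" is a perfectly good match but is false in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the version or undef.
sub extract_version {
    my ($line) = @_;
    # The match runs in list context, so it returns the captured groups.
    # On failure the list is empty and $version stays undef.
    my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/;
    return $version;
}

for my $line ("mono-basic-2.10\n", "mono-0\n", "no version here\n") {
    my $version = extract_version($line);
    # "0" is a false value, so test definedness rather than truth.
    print defined $version ? ">>$version<<\n" : "no match\n";
}
```

This should print >>2.10<<, then >>0<<, then "no match". With a plain "if $version", the second line would be silently skipped.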

    chomp if you like, but ending the regex with \s*$ lets the pattern consume the trailing \n, so there is no need for chomp. chomp is "not expensive", but once we whip out the nuclear weapon of regex, asking it to throw away the trailing newline is no big deal.
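
One caveat worth sketching, as a hedged aside: because .* is greedy and "." does not match a newline, the \s* at the end only ever gets the newline. Trailing spaces or tabs stay inside the capture. The two helper names below are hypothetical, and the \S* variant is one possible tightening, assuming a version string never contains whitespace.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison; neither name appears in the thread.
sub greedy_version {
    my ($line) = @_;
    # Greedy .* runs up to the newline, so trailing blanks are captured.
    my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/;
    return $version;
}

sub trimmed_version {
    my ($line) = @_;
    # \S* stops at the first whitespace character, so nothing trailing
    # ends up in the capture.
    my ($version) = $line =~ /^[a-z-]+(\d\S*)\s*$/;
    return $version;
}

print '[', greedy_version("mono-2.10.5\n"),     "]\n";   # [2.10.5]
print '[', greedy_version("mono-2.10.5  \n"),   "]\n";   # [2.10.5  ]
print '[', trimmed_version("mono-2.10.5  \n"),  "]\n";   # [2.10.5]
```

For the data in this thread both patterns give the same result, since none of the lines has trailing blanks.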

    use strict;
    use warnings;

    while (my $line = <DATA>) {
        # List-context match: $version gets the capture, or undef on failure.
        my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/;
        print ">>$version<<\n" if defined $version;
    }

    =PRINTS:
    >>2.10<<
    >>2.10.2-r1<<
    >>2.10.5<<
    =cut

    __DATA__
    mono-basic-2.10
    mono-2.10.2-r1
    mono-2.10.5

      Eeeeeeewwwwww :P

      #!/usr/bin/perl --
      use strict;
      use warnings;

      my $dita = <<'__DITA__';
      mono-basic-2.10
      mono-2.10.2-r1
      mono-2.10.5
      __DITA__

      open my $data => '<', \$dita or die $!;
      while( my $line = <$data> ){
          if( my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/ ){
              print ">>$version<<\n"
          }
      }
      __END__
      >>2.10<<
      >>2.10.2-r1<<
      >>2.10.5<<
        That does the same thing. It makes no difference whether the "if" wraps the match or follows the print as a statement modifier. If you like your formulation better, then use it.

        The main idea of my reply to leslie was to avoid $1. The rationale is simple: $version is easier to understand than $1. This "if" before or at the end of the print is a side show, tangential to the point - makes no difference.

        In general try to avoid $1 because almost surely there is some better name! Even if I am reading my own code one year later, I don't want to read the regex to see that I'm capturing digits...I'll see that I've got $version from the regex and my eyeballs keep moving in the code.
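
The same idea scales to several captures, where $1 and $2 get harder to keep straight. A minimal sketch, assuming the package name is everything before the last hyphen that precedes a digit. The helper split_package and the variable $name are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split a package string into name and version.
sub split_package {
    my ($line) = @_;
    # Two captures, two named variables; no $1/$2 bookkeeping needed.
    my ($name, $version) = $line =~ /^([a-z-]+)-(\d\S*)\s*$/;
    return ($name, $version);
}

for my $line ("mono-basic-2.10\n", "mono-2.10.2-r1\n", "mono-2.10.5\n") {
    my ($name, $version) = split_package($line);
    print "$name => $version\n" if defined $version;
}
```

On the thread's sample data this should print "mono-basic => 2.10", "mono => 2.10.2-r1" and "mono => 2.10.5".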