
That's fine, but I recommend avoiding $1, $2, etc. If you put the left-hand-side in a list context, a variable like $version can be assigned directly without fiddling with $1 as an intermediary. For most folks, $version is easier to understand than just $1.

Your if() statement is correct: a successful match returns a true value and a failed match returns a false one. However, an assignment to $version like the one below leaves $version either defined or undefined, and that can also be tested in an "if".
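A minimal sketch of the two tests, assuming a hypothetical helper named version_of and made-up input lines (neither is from the original post). In list context the match returns the captured groups, or an empty list on failure, so the list assignment itself can serve as the condition.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured version string, or undef if no match.
sub version_of {
    my ($line) = @_;
    my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/;
    return $version;
}

for my $line ("mono-2.10.5\n", "mono-0\n", "no version here\n") {
    # Testing the list assignment asks "did the regex match at all?"
    if (my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/) {
        print "matched: >>$version<<\n";
    }

    # Testing definedness gives the same answer.
    # A plain truth test ("if $v") would skip a captured "0".
    my $v = version_of($line);
    print "defined: >>$v<<\n" if defined $v;
}
```

For data like the thread's, where a version never consists of a lone "0", all three forms of the test should behave the same.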

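chomp if you like, but there is no need for it here: "." never matches "\n", and the trailing \s*$ consumes the newline, so it stays out of the capture. chomp is "not expensive", but once we whip out the nuclear weapon of regex, asking it to deal with the line ending is no big deal. One caveat: the greedy .* will still capture trailing spaces or tabs, so write the capture as (\d.*?) if the data may contain them.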
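A small sketch of what the trailing \s*$ does and does not remove. The input line with trailing spaces is made up for illustration; the thread's data has none.

```perl
use strict;
use warnings;

my $line = "mono-2.10.5  \n";    # hypothetical input with two trailing spaces

# Greedy capture: .* takes the trailing spaces, leaving only "\n" for \s*.
my ($greedy) = $line =~ /^[a-z-]+(\d.*)\s*$/;

# Non-greedy capture: .*? stops as early as it can, so \s* absorbs the spaces too.
my ($lazy) = $line =~ /^[a-z-]+(\d.*?)\s*$/;

print ">>$greedy<<\n";
print ">>$lazy<<\n";
```

In both cases the newline stays out of the capture without chomp. Only the non-greedy form also drops the trailing spaces.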

use strict;
use warnings;

while (my $line = <DATA>)
{
    my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/;
    print ">>$version<<\n" if $version;
}

=PRINTS:
>>2.10<<
>>2.10.2-r1<<
>>2.10.5<<
=cut

__DATA__
mono-basic-2.10
mono-2.10.2-r1
mono-2.10.5

Re^3: regex behaves differently in split vs substitute?
by Anonymous Monk on Oct 08, 2011 at 13:36 UTC
Re^3: regex behaves differently in split vs substitute?
by Anonymous Monk on Oct 08, 2011 at 13:19 UTC

    Eeeeeeewwwwww :P

    #!/usr/bin/perl --
    use strict;
    use warnings;

    my $dita = <<'__DITA__';
    mono-basic-2.10
    mono-2.10.2-r1
    mono-2.10.5
    __DITA__

    open my $data => '<', \$dita or die $!;
    while( my $line = <$data> ){
        if( my ($version) = $line =~ /^[a-z-]+(\d.*)\s*$/ ){
            print ">>$version<<\n"
        }
    }
    __END__
    >>2.10<<
    >>2.10.2-r1<<
    >>2.10.5<<

      That is exactly the same. It makes no difference if the "if" is written before the match or on the same line as the print. If you like your formulation better, then do it.

      The main idea of my reply to leslie was to avoid $1. The rationale is simple: $version is easier to understand than $1. This "if" before or at the end of the print is a side show, tangential to the point - makes no difference.

      In general try to avoid $1 because almost surely there is some better name! Even if I am reading my own code one year later, I don't want to read the regex to see that I'm capturing digits...I'll see that I've got $version from the regex and my eyeballs keep moving in the code.

        That is exactly the same... tangential to the point.

        You see, "Eeeeeeewwwwww :P" is a synonym for "style tangent" :)

        Yes, named variables make more sense if you're going to use them for something.

        If all you're going to do is print the match, you might as well use $1.