in reply to difference in regex
You will find the answer to your question in "Regexp Quote-Like Operators" in perlop - basically, different regex operations have different return values in different contexts. See also perlretut for a tutorial.
Operation | Context | () Capturing Groups | Return Value on Match (and notes on behavior) | Return Value on Failure | Example |
---|---|---|---|---|---|
m// | scalar | - | true | false | |
m//g | scalar | - | true (each execution of m//g finds the next match, see "Global matching" in perlretut) | false if there is no further match | |
m// | list | no | the list (1) | the empty list () | |
m//g | list | no | a list of all the matched strings, as if there were parentheses around the whole pattern | the empty list () | |
m// | list | yes | a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1, $2, $3...) | the empty list () | |
m//g | list | yes | a list of the substrings matched by any capturing parentheses in the regular expression, that is, ($1, $2...) repeated for each match | the empty list () | |
s/// | - | - | the number of substitutions made | false | |
s///r | - | - | a copy of the original string with substitution(s) applied (available since Perl 5.14) | the original string | |
In this table, "true" and "false" refer to Perl's notion of Truth and Falsehood. Remember not to rely on any of the capture variables like $1, $2, etc. unless the match succeeds!
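To make the rows of the table concrete, here is a small sketch that runs each operation once. The sample string and all variable names are made up for illustration; they are not from your code.

```perl
use strict;
use warnings;

my $str = "a1b2c3";    # hypothetical sample input

# m// in scalar context: true on a match, false otherwise
my $ok     = $str =~ /\d/;
my $not_ok = $str =~ /x/;

# m// in list context, no capturing groups: (1) on a match, () on failure
my @plain   = $str =~ /\d/;
my @nothing = $str =~ /x/;

# m// in list context with capturing groups: ($1, $2, ...)
my @groups = $str =~ /([a-z])(\d)/;

# m//g in list context, no groups: every matched string
my @all = $str =~ /\d/g;

# m//g in list context with groups: ($1, $2, ...) repeated for each match
my @pairs = $str =~ /([a-z])(\d)/g;

# m//g in scalar context: each execution finds the next match
my @seen;
while ( $str =~ /(\d)/g ) {
    push @seen, $1;    # safe: only reached when the match succeeded
}

# s/// modifies its target and returns the number of substitutions
my $copy  = $str;
my $count = ( $copy =~ s/\d/#/g );

# s///r (Perl 5.14+) returns a modified copy and leaves $str untouched
my $new = $str =~ s/\d/#/gr;

print "groups: @groups\n";    # a 1
print "all:    @all\n";       # 1 2 3
print "pairs:  @pairs\n";     # a 1 b 2 c 3
print "seen:   @seen\n";      # 1 2 3
print "count:  $count, copy: $copy, new: $new, str: $str\n";
```

Note that $1 is only read inside the while loop body, that is, only after a successful match.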
In my $foo = "bar"=~/a/;, the right-hand side of the assignment ("bar"=~/a/) is in scalar context. In my ($foo) = "bar"=~/a/; or my @foo = "bar"=~/a/;, the right-hand side is in list context. That's why, in your example, you need those parens in ($value): because you want the matching operation to return the contents of the capture group.
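A short sketch of the difference, using a made-up $row value and the anchored pattern; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";    # hypothetical sample input

# Scalar context: the match only reports success (true) or failure (false)
my $scalar = $row =~ /,([^,]*)$/;

# List context because of the parens on the left: returns the capture group(s)
my ($value) = $row =~ /,([^,]*)$/;

# Assigning to an array is list context as well
my @values = $row =~ /,([^,]*)$/;

print "scalar: <$scalar>\n";    # <1>
print "value:  <$value>\n";     # <15>
print "values: <@values>\n";    # <15>
```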
Note that your expressions can be slightly simplified, since not all the parens you showed are needed:
    my ($value) = $row =~ /.*,(.*)/;
    # and
    $row =~ s/,[^,]*$//;
A few additional comments on your code:
- ($row =~ s/,[^,]*$//); # gets substring before the last comma - this comment isn't quite right, or is at least potentially misleading, since the substitution deletes the string after and including the last comma.
- /.*,(.*)/ matches any comma anywhere in the string. For simple input strings it may behave correctly, but I'd strongly recommend coding more defensively and writing it like your second expression: my ($value) = $row=~/,([^,]*)$/; - the $ anchor makes sure that the regex only matches the last comma and what follows it (unless you use the /m modifier, since it changes the meaning of $).
- While the use of Scalar::Util's looks_like_number is often a good idea, note that if you don't mind being a little more restrictive, Regexp::Common (or a hand-written regex) would allow you to combine the two regular expressions:
    use Regexp::Common qw/number/;
    my $row = "a,b,c,d,15";
    if ( $row=~s/,($RE{num}{real})$// ) {
        print "matched <$1>\n";
    }
    print "row is now <$row>\n";
    __END__
    matched <15>
    row is now <a,b,c,d>
- If this is a CSV file, consider using Text::CSV (also install Text::CSV_XS for speed)
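Two of the points above can be tried out with core Perl only. The sketch below first shows an input where the unanchored and the anchored pattern disagree, then the "hand-written regex" variant of the combined substitution. The $num pattern is my own simplified assumption and is not equivalent to $RE{num}{real}; the sample strings and variable names are made up as well.

```perl
use strict;
use warnings;

# 1) Unanchored vs. anchored on a string with an embedded newline.
#    "." does not match "\n" by default, so /.*,(.*)/ stays on the first line.
my $multi = "a,b\nc,d";
my ($loose)    = $multi =~ /.*,(.*)/;
my ($anchored) = $multi =~ /,([^,]*)$/;
print "loose: <$loose>, anchored: <$anchored>\n";    # loose: <b>, anchored: <d>

# 2) Combined "extract and remove" with a hand-written number pattern.
#    Assumption: optional sign, digits with optional fraction, optional exponent.
my $num = qr/[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/;

my $row = "a,b,c,d,15";
my $matched;
if ( $row =~ s/,($num)$// ) {
    $matched = $1;    # only read $1 after a successful match
    print "matched <$matched>\n";
}
print "row is now <$row>\n";

# A last field that is not a number leaves the string alone, and s/// returns false
my $other   = "a,b,c,x";
my $changed = ( $other =~ s/,($num)$// );
print "other is still <$other>\n";
```

If you need exactly the same notion of "number" as looks_like_number or Regexp::Common, use those instead of the pattern sketched here.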
Update: Added s///r to the table and added a few more doc links. A few other edits and updates. 2019-02-16: Added "Return Value on Failure" column to table, and a few other small updates. 2019-08-17: Updated the link to "Truth and Falsehood".