zaro has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks,
I have the following subroutine which I use with sort:
my $mask=/_(\d\d)_/;
sub sorter ($$){
my ($am,$bm);
$_[0] =~ /$mask/;
$am=$1;
'' =~ /()/; # A match that always succeeds
$_[1] =~ /$mask/;
$bm=$1;
$am <=> $bm || $am cmp $bm;
}
@result = sort sorter @f;
The idea is to sort a list of strings based on part of each string. I run two matches in the sorter function, one per string passed in. If the first match succeeds and the second doesn't, $1 still holds the result of the first match, as if the second match never happened. When both matches succeed everything works as expected, but a failed second match leaves the backreference in $1 untouched. I found a workaround by placing an always-succeeding empty match between the two matches, but I wonder: is this normal behaviour?
Can anyone explain it?
Zaro
Re: Regular expressions and sort
by Fletch (Bishop) on Nov 04, 2005 at 13:56 UTC
Erm, why not just use an ST (Schwartzian Transform), do the matching once, and not worry about clobbering $1 et al.? More efficient to boot.
my @result = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { /_(\d{2})_/; [ $_, $1 ] } @f;
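For anyone who wants to run the transform end to end, here is a minimal sketch. The contents of @f are made-up sample data, and the -1 fallback for strings without a _NN_ part is an assumption added so that the map never reads a stale $1.

```perl
use strict;
use warnings;

# Hypothetical sample data; the last entry has no _NN_ part.
my @f = ('log_12_a', 'log_03_b', 'log_07_c', 'nokey');

my @result = map  { $_->[0] }                         # 3. unwrap the original string
             sort { $a->[1] <=> $b->[1] }             # 2. compare the cached keys
             map  { [ $_, /_(\d{2})_/ ? $1 : -1 ] }   # 1. extract the key once per string
             @f;

print "$_\n" for @result;
```

With this data the unmatched string sorts first because of the assumed -1 key; pick a different fallback if it should sort last.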
Thanks, your solution seems better. I will use it :-).
But it still doesn't explain the strange behaviour of m//.
Re: Regular expressions and sort
by japhy (Canon) on Nov 04, 2005 at 14:48 UTC
No one seems to have mentioned that my $mask=/_(\d\d)_/; is Not the Code You Are Looking For. To put a regex in $mask, use qr// like so: my $mask = qr/_(\d\d)_/;
No, not "nothing". The result of $_ =~ /_(\d\d)_/, a boolean value, gets stored.
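A small sketch of the difference between the two assignments. The value placed in $_ and the test string are hypothetical, chosen only so that the match succeeds.

```perl
use strict;
use warnings;

$_ = 'file_42_x';              # hypothetical contents of $_

my $wrong = /_(\d\d)_/;        # runs the match against $_ right now
my $mask  = qr/_(\d\d)_/;      # compiles the pattern and stores it

print "wrong holds: '$wrong'\n";            # the success flag, not a pattern
print "mask is a: ", ref($mask), "\n";      # a compiled Regexp object

if ('name_07_y' =~ $mask) {
    print "captured: $1\n";                 # only read $1 after a checked match
}
```

Here $wrong ends up as 1 because the match against $_ succeeded; had it failed, the stored value would be an empty string.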
Re: Regular expressions and sort
by Aristotle (Chancellor) on Nov 04, 2005 at 14:11 UTC
You should always, always either combine capturing patterns with conditionals or use list assignment to pull out captures.
In your case, that means I’d write the routine in either of these ways:
sub sorter ($$) {
my $am = ( $_[0] =~ /$mask/ ) ? $1 : '';
my $bm = ( $_[1] =~ /$mask/ ) ? $1 : '';
$am <=> $bm || $am cmp $bm;
}
# or
sub sorter ($$) {
my ( $am ) = ( $_[0] =~ /$mask/ );
my ( $bm ) = ( $_[1] =~ /$mask/ );
$am <=> $bm || $am cmp $bm;
}
# note: these are not quite equivalent
# after a failed match, $am and $bm will contain:
# #1) an empty string
# #2) undef
Never use $1 when you’re not checking for a pattern match’s success.
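To see the difference between the two forms on a failed match, a minimal sketch follows. It assumes $mask was built with qr// and uses a made-up string that does not match.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;
my $str  = 'no digits here';                   # hypothetical non-matching input

my $ternary = ( $str =~ /$mask/ ) ? $1 : '';   # form 1: conditional
my ($list)  = ( $str =~ /$mask/ );             # form 2: list assignment

print defined($ternary) ? "form 1: defined, '$ternary'\n" : "form 1: undef\n";
print defined($list)    ? "form 2: defined, '$list'\n"    : "form 2: undef\n";
```

Under use warnings, both fallbacks will produce a warning when fed to <=>: the empty string is not numeric and undef is uninitialized.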
Makeshifts last the longest.
Re: Regular expressions and sort
by robin (Chaplain) on Nov 04, 2005 at 14:16 UTC
Yes, this is perfectly normal. A failing match doesn't clobber the existing match variables ($1, $2, etc).
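A minimal sketch that reproduces the behaviour outside of sort. The two test strings are made up, and $mask is assumed to be a qr// pattern.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;

'a_12_b' =~ $mask;          # succeeds, so $1 is now '12'
my $first = $1;

'no key' =~ $mask;          # fails, so $1 is left alone
my $second = $1;

print "after success: $first\n";
print "after failure: $second\n";   # still the value from the first match
```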
If you want to do what you're doing, it's probably better to use something like
my ($am) = ($_[0] =~ /$mask/);
my ($bm) = ($_[1] =~ /$mask/);
though I'd be strongly inclined to lose the ($$) prototype and use $a and $b rather than $_[0] and $_[1]. As well as being more readable, it's also more efficient. (OTOH if efficiency really matters, you may as well follow Fletch's suggestion above.)
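A sketch of what the prototype-free version might look like. The sub name by_mask, the sample data, and the fallback of 0 for strings without a key are illustrative assumptions; the cmp is left out because the keys are purely numeric.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;

# No ($$) prototype: sort hands the two elements over in $a and $b.
sub by_mask {
    my ($am) = $a =~ $mask;         # list assignment: undef if no match
    my ($bm) = $b =~ $mask;
    $am = 0 unless defined $am;     # assumed fallback for strings without a key
    $bm = 0 unless defined $bm;
    return $am <=> $bm;
}

my @f      = ('x_10_a', 'x_02_b', 'x_33_c');   # hypothetical sample data
my @result = sort by_mask @f;

print "$_\n" for @result;
```

The sub has to live in the same package as the sort call, since $a and $b are package variables.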
I'm baffled by your $am <=> $bm || $am cmp $bm. You know that $am and $bm are either empty or two-digit numbers, so what's the point in cmping them?
Re: Regular expressions and sort
by Perl Mouse (Chaplain) on Nov 04, 2005 at 14:31 UTC
To answer your question: yes, that is normal behaviour. $1 and friends are only set on a successful match; if the match fails, $1 and friends keep their value. You could do:
$am = $1 if $_[0] =~ /$mask/;
$bm = $1 if $_[1] =~ /$mask/;
Or, to avoid warnings:
($am, $bm) = map {/$mask/ ? $1 : 0} @_;
Having said that, I always try to avoid using a sort-sub, and often a sort-block as well. Instead, I would opt for a GRT in this case:
@result = map {substr $_, 2}
sort
map {sprintf "%02d%s", (/$mask/ ? $1 : 0), $_} @f;
Using a GRT means you do the extraction of the key you want to sort on only once per element, instead of once per comparison. If you have to sort a long array, this can be significant.
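A runnable sketch of the GRT with made-up sample data, assuming $mask is a qr// pattern. Each string gets its two-digit key prepended, the packed strings go through the default string sort, and substr strips the key again.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;
my @f    = ('x_10_a', 'nokey', 'x_02_b', 'x_33_c');   # hypothetical sample data

my @result = map  { substr $_, 2 }                                # 3. strip the 2-char key
             sort                                                 # 2. default string sort
             map  { sprintf "%02d%s", (/$mask/ ? $1 : 0), $_ }    # 1. prepend the key
             @f;

print "$_\n" for @result;
```

Strings with equal keys end up ordered by the rest of the string, since the whole packed value takes part in the comparison.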