
Regular expressions and sort

by zaro (Novice)
on Nov 04, 2005 at 13:47 UTC ( #505696=perlquestion )

zaro has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have the following subroutine which I use with sort:
    my $mask=/_(\d\d)_/;
    sub sorter ($$){
        my ($am,$bm);
        $_[0] =~ /$mask/;
        $am=$1;
        '' =~ /()/; # A match that always succeeds
        $_[1] =~ /$mask/;
        $bm=$1;
        $am <=> $bm || $am cmp $bm;
    }
    @result = sort sorter @f;
The idea is to sort a list of strings based on part of each string. So I use two matches, one for each string passed to the sorter function. But if the first match succeeds and the second doesn't, $1 still holds the result of the first match, as if the second match never happened. If both matches are successful everything works as expected, but if the first one matches and the second doesn't, the second match doesn't affect the backreference in $1. I found a workaround for this by placing an always-succeeding empty match between the matches, but I wonder: is this normal behaviour?

Can anyone explain it?


Replies are listed 'Best First'.
Re: Regular expressions and sort
by Fletch (Bishop) on Nov 04, 2005 at 13:56 UTC

    Erm, why not just use an ST (Schwartzian Transform), do the matching once, and not worry about clobbering $1 et al.? More efficient to boot.

    my @result =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { /_(\d{2})_/; [ $_, $1 ] }
        @f;
      Thanks, your solution seems better. I will use it :-).
      But it still doesn't explain the strange behaviour of m//.

        No, because perldoc perlre explains that:

        The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)
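The scoping rule quoted above can be seen in a small experiment. This is only a sketch: the sample strings and the @seen array are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

my @seen;

"x_12_y" =~ /_(\d\d)_/;          # succeeds, $1 is now "12"
push @seen, $1;

"no digits here" =~ /_(\d\d)_/;  # fails, so $1 is left untouched
push @seen, $1;                  # still "12"

{
    "x_34_y" =~ /_(\d\d)_/;      # succeeds inside an inner block
    push @seen, $1;              # "34"
}
push @seen, $1;                  # the inner match went out of scope: "12" again

print "@seen\n";                 # prints "12 12 34 12"
```

The second value is the behaviour the question asks about: the failed match neither sets nor clears $1.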
Re: Regular expressions and sort
by japhy (Canon) on Nov 04, 2005 at 14:48 UTC
    No one seems to have mentioned that my $mask=/_(\d\d)_/; is Not the Code You Are Looking For. To put a regex in $mask, use qr// like so: my $mask = qr/_(\d\d)_/;

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      I was also eyeballing the statement $mask=/_(\d\d)_/ . After some investigation with use re 'debugcolor', it turns out that the /_(\d\d)_/ gets compiled but nothing is stored in $mask. In effect, the statement

      $_[0] =~ /$mask/ is equal to $_[0] =~ //

        No, not "nothing". The result of $_ =~ /_(\d\d)_/, a boolean value, gets stored.
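A short sketch of the difference between the two assignments, assuming a made-up value in $_ (the original post never shows what $_ holds when $mask is assigned):

```perl
use strict;
use warnings;

local $_ = "file_07_a";          # hypothetical content of $_

my $flag = /_(\d\d)_/;           # runs the match against $_ right now; stores true or false
my $mask = qr/_(\d\d)_/;         # stores a compiled regex for later use

print "flag: '$flag'\n";                 # prints "flag: '1'"
print "mask is a ", ref($mask), "\n";    # prints "mask is a Regexp"

my ($num) = "other_42_b" =~ $mask;       # list assignment pulls out the capture
print "captured: $num\n";                # prints "captured: 42"
```

When the match against $_ fails, the stored value is false rather than a pattern, which is consistent with the observation above that /$mask/ ends up behaving like an empty pattern.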
Re: Regular expressions and sort
by Aristotle (Chancellor) on Nov 04, 2005 at 14:11 UTC

    You should always, always either combine capturing patterns with conditionals or use list assignment to pull out captures.

    In your case, that means I’d write the routine in either of these ways:

    sub sorter ($$) {
        my $am = ( $_[0] =~ /$mask/ ) ? $1 : '';
        my $bm = ( $_[1] =~ /$mask/ ) ? $1 : '';
        $am <=> $bm || $am cmp $bm;
    }

    # or

    sub sorter ($$) {
        my ( $am ) = ( $_[0] =~ /$mask/ );
        my ( $bm ) = ( $_[1] =~ /$mask/ );
        $am <=> $bm || $am cmp $bm;
    }

    # note: these are not quite equivalent
    # after a failed match, $am and $bm will contain:
    #
    # #1) an empty string
    # #2) undef

    Never use $1 when you’re not checking for a pattern match’s success.

    Makeshifts last the longest.

Re: Regular expressions and sort
by robin (Chaplain) on Nov 04, 2005 at 14:16 UTC

    Yes, this is perfectly normal. A failing match doesn't clobber the existing match variables ($1, $2, etc).

    If you want to do what you're doing, it's probably better to use something like

    my ($am) = ($_[0] =~ /$mask/);
    my ($bm) = ($_[1] =~ /$mask/);
    though I'd be strongly inclined to lose the ($$) prototype and use $a and $b rather than $_[0] and $_[1]. As well as being more readable, it's also more efficient. (OTOH if efficiency really matters, you may as well follow Fletch's suggestion above.)

    I'm baffled by your $am <=> $bm || $am cmp $bm. You know that $am and $bm are either empty or two-digit numbers, so what's the point in cmping them?
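One way the suggestion to drop the ($$) prototype and use $a and $b might look, as a sketch only. The sample list, the sub name by_mask, and the choice to sort unmatched strings first (key -1) are assumptions made for illustration, not part of the original posts.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;

# Hypothetical sample data; "nokey" has no _NN_ part.
my @f = qw(c_10_x a_02_y nokey b_07_z);

sub by_mask {
    my ($am) = $a =~ $mask;          # undef when the match fails
    my ($bm) = $b =~ $mask;
    $am = -1 unless defined $am;     # assumption: unmatched strings sort first
    $bm = -1 unless defined $bm;
    $am <=> $bm;
}

my @result = sort by_mask @f;
print "@result\n";                   # prints "nokey a_02_y b_07_z c_10_x"
```

Because the captures come from list assignment, a failed match yields undef instead of a stale $1, so no always-succeeding dummy match is needed.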

Re: Regular expressions and sort
by Perl Mouse (Chaplain) on Nov 04, 2005 at 14:31 UTC
    To answer your question: yes, that is normal behaviour. $1 and friends are only set on a successful match - if the match fails, $1 and friends keep their value. You could do:
    $am = $1 if $_[0] =~ /$mask/;
    $bm = $1 if $_[1] =~ /$mask/;
    Or, to avoid warnings:
    ($am, $bm) = map {/$mask/ ? $1 : 0} @_;

    Having said that, I always try to avoid using a sort-sub, and often a sort-block as well. Instead, I would opt for a GRT (Guttman-Rosler Transform) in this case:

    @result = map  {substr $_, 2}
              sort
              map  {sprintf "%02d%s", (/$mask/ ? $1 : 0), $_}
              @f;
    Using a GRT means you do the extraction of the key you want to sort on only once per element, instead of once per comparison. If you have to sort a long array, this can be significant.
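A self-contained run of the GRT idea, as a sketch. The sample list is made up, and $mask is assumed to be built with qr// as recommended elsewhere in this thread.

```perl
use strict;
use warnings;

my $mask = qr/_(\d\d)_/;

# Hypothetical sample data; "nokey" has no _NN_ part and gets key 00.
my @f = qw(c_10_x a_02_y nokey b_07_z);

my @result =
    map  { substr $_, 2 }                                # 3. strip the two-character key
    sort                                                 # 2. plain string sort on key + original
    map  { sprintf "%02d%s", (/$mask/ ? $1 : 0), $_ }    # 1. prefix each string with its key
    @f;

print "@result\n";    # prints "nokey a_02_y b_07_z c_10_x"
```

The zero-padded two-digit prefix is what lets a default string sort give numeric order here; keys wider than two digits would need a wider format and a matching substr offset.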
    Perl --((8:>*

Node Type: perlquestion [id://505696]
Approved by tbone1
Front-paged by ww