pseudosocrates has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, just a quick one that's driving me mad, but it's probably just a result of it being 11am.
In the following code snippet, the sort works fine with cmp and {n} set to 1, 3, 4 or 6 (the text fields). When I switch to <=> and set {n} to 2 or 5 (the numeric fields), it fails. What is it about the numerical sort that is killing my regex? Help appreciated.
@lines = (
    "0|aa aa|1998|aaa a|a aaa|10|aa a aa",
    "1|bbb aa|1992|fa a|gaa|5|gfsa aa",
    "2|aa ba|1997|afa|hhaa|1|asdf aa",
    "3|cccaa|1997|ssa s|hhava|3|gfdh gaa",
    "4|adaa|1994|g a a|jiua|6|angf a"
);
@lines = sort {
    lc(($b =~ /(\|[\w\s]+){2}/)[0]) <=> lc(($a =~ /(\|[\w\s]+){2}/)[0])
} @lines;
foreach $line (@lines) { print "$line\n"; }

Re: Just a regex quickie
by dragonchild (Archbishop) on Jun 02, 2004 at 15:08 UTC
    Why are you trying to numerically sort stuff that you feel you have to lowercase first? I would strongly recommend that you rewrite your sort as a Schwartzian Transform. Something along the lines of:
    @lines = map  { $_->[1] }
             sort { $b->[0] <=> $a->[0] }
             map  { [ SORTBY_VALUE_HERE, $_ ] }
             @lines;

    So you would put the code that computes SORTBY_VALUE in the bottom map statement (the one that runs first). This is not only more efficient, since each key is computed once per line instead of twice per comparison, but it also describes more clearly what you're actually trying to do. Your solution has the look of a bunch of cancerous tumors that have grown one atop the other. You're closing your eyes and stabbing at a solution. The amazing thing is how far you've gotten.
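    As a sketch of that template filled in for this data, assuming the wanted key is the numeric column (the question's {5}) and that a descending sort is intended, as in the original $b-before-$a comparison: the key is captured inside a non-capturing repeat so the pipe never becomes part of it. The name @sorted is illustrative only.

```perl
use strict;
use warnings;

my @lines = (
    "0|aa aa|1998|aaa a|a aaa|10|aa a aa",
    "1|bbb aa|1992|fa a|gaa|5|gfsa aa",
    "2|aa ba|1997|afa|hhaa|1|asdf aa",
    "3|cccaa|1997|ssa s|hhava|3|gfdh gaa",
    "4|adaa|1994|g a a|jiua|6|angf a",
);

# Read the pipeline bottom to top:
#   1. map: pair each line with its key, the text after the 5th pipe,
#      captured inside (?:...) so the pipe itself is not part of the key
#   2. sort: compare the precomputed keys numerically, largest first
#   3. map: drop the key and keep the original line
my @sorted =
    map  { $_->[1] }
    sort { $b->[0] <=> $a->[0] }
    map  { [ ( /(?:\|([\w\s]+)){5}/ )[0], $_ ] }
    @lines;

print "$_\n" for @sorted;
```

    Each regex match now runs once per line rather than on every comparison, which is the efficiency point made above.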

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: Just a regex quickie
by fizbin (Chaplain) on Jun 02, 2004 at 15:41 UTC

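    The basic problem is that your regex captures the leading pipe, so <=> ends up comparing strings like '|10' and '|5' (that's with {5}; with {2} they are '|1998' and '|1992'). Not surprisingly, both of these are numerically 0: a string that starts with a non-digit numifies to 0, so every comparison is a tie.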

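    Of course, the extra leading pipe doesn't affect the result of a cmp operation, because every captured value starts with the same '|' and the comparison is decided by the characters after it.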
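    A small sketch of the two effects just described, using the values the original regex produces for field 5 of the first two lines. The variable names are illustrative only, and the numeric warnings are switched off purely so the demo prints cleanly.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" so only the results print

# What the original regex hands to <=> for field 5 of the first two lines
my ($x, $y) = ('|10', '|5');

# A string that starts with a non-digit numifies to 0, so <=> compares 0 with 0.
my $num_with_pipe    = $x <=> $y;       # 0: a tie on every comparison
my $num_without_pipe = '10' <=> '5';    # 1: 10 is greater than 5

# cmp works character by character; the shared leading '|' cancels out.
my $str_with_pipe    = $x cmp $y;       # -1: '1' sorts before '5'
my $str_without_pipe = '10' cmp '5';    # -1: same answer without the pipe

print "<=> with pipe: $num_with_pipe, without: $num_without_pipe\n";
print "cmp with pipe: $str_with_pipe, without: $str_without_pipe\n";
```

    Running the original script under use warnings would also have flagged the problem with an "isn't numeric in numeric comparison (<=>)" warning.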

    Changing your code as little as possible, this will do what you want, though I must say that the entire code design still makes me recoil, and I'd jump to a transform-sort-untransform structure as the other commenter is pointing out:

    @lines = sort {
        ($b =~ /(?:\|([\w\s]+)){2}/)[0] <=> ($a =~ /(?:\|([\w\s]+)){2}/)[0]
    } @lines;
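    To see why moving the capture inside (?:...) works for any {n}, here is a sketch that loops n over 1 to 6 on the first sample line and keeps both results. It assumes a Perl variable interpolated into the quantifier as {$n}; the hash names are illustrative only.

```perl
use strict;
use warnings;

my $line = "0|aa aa|1998|aaa a|a aaa|10|aa a aa";
my (%original, %fixed);

for my $n (1 .. 6) {
    # With a capture inside a quantified group, only the LAST repetition is
    # kept, so {$n} yields the field after the n-th pipe.
    ($original{$n}) = $line =~ /(\|[\w\s]+){$n}/;       # keeps the pipe
    ($fixed{$n})    = $line =~ /(?:\|([\w\s]+)){$n}/;   # field text only
    print "n=$n  original: '$original{$n}'  fixed: '$fixed{$n}'\n";
}
```

    Only the numeric fields (n = 2 and n = 5) are affected in practice, since those are the only ones compared with <=>.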

    By the way, I discovered what was going on with judicious use of Data::Dumper:

    use Data::Dumper;
    $b = "0|aa aa|1998|aaa a|a aaa|10|aa a aa";
    @a = ($b =~ /(\|[\w\s]+){2}/);   # the list of captures the sort block sees
    print Dumper(\@a);               # shows '|1998', leading pipe included
    Whenever something weird is going on in my Perl code, I end up sticking calls to Data::Dumper all over the place.
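    The same trick works for a transform-based sort: dump the intermediate [key, line] pairs before sorting them. This sketch builds the pairs from two of the question's lines with the original pipe-capturing regex and {5}; the Terse and Indent settings are only there to make the dump shorter.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Terse  = 1;   # print the structure without the "$VAR1 = " prefix
$Data::Dumper::Indent = 1;   # simple fixed-width indentation per nesting level

my @lines = (
    "0|aa aa|1998|aaa a|a aaa|10|aa a aa",
    "1|bbb aa|1992|fa a|gaa|5|gfsa aa",
);

# Build the [key, line] pairs a Schwartzian Transform would sort, using the
# ORIGINAL pipe-capturing regex, and dump them before any sorting happens.
my @pairs = map { [ ( /(\|[\w\s]+){5}/ )[0], $_ ] } @lines;
print Dumper(\@pairs);
```

    The dumped keys start with '|', which is exactly what <=> cannot use.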
    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
      Thanks for the courteous reply. I realise my structure may be a little ugly, and I will have a look at the alternate suggestion for when I'm doing something with a little more data.
      Data::Dumper sounds like just what I need. As you might be able to tell, my regexes often spit out weird results :-)
Re: Just a regex quickie
by Not_a_Number (Prior) on Jun 02, 2004 at 16:25 UTC

    Here's one way to use an ST (Schwartzian Transform) to sort your data:

    use strict;
    use warnings;

    my @lines = (
        "0|aa aa|1998|aaa a|a aaa|10|aa a aa",
        "1|bbb aa|1992|fa a|gaa|5|gfsa aa",
        "2|aa ba|1997|afa|hhaa|1|asdf aa",
        "3|cccaa|1997|ssa s|hhava|3|gfdh gaa",
        "4|adaa|1994|g a a|jiua|6|angf a"
    );
    my $field = 5;
    @lines = map  { join '|', @$_ }
             sort { $b->[$field] <=> $a->[$field] }
             map  { [ split /\|/ ] }
             @lines;
    print "$_\n" for @lines;
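    Since the original question switches between cmp for the text fields and <=> for the numeric ones, the split-based ST could be wrapped in a helper that takes both choices as arguments. This is a sketch only: sort_by_field is a hypothetical name, the field index matches both the split index and the question's {n}, and text keys are lowercased as in the original lc() calls.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): sort pipe-delimited lines on
# split-field $field, largest first. Numeric fields compare with <=>; text
# fields are lowercased and compare with cmp.
sub sort_by_field {
    my ($field, $numeric, @lines) = @_;
    return
        map  { $_->[1] }
        sort { $numeric ? $b->[0] <=> $a->[0] : $b->[0] cmp $a->[0] }
        map  {
            my $key = ( split /\|/ )[$field];
            [ $numeric ? $key : lc($key), $_ ];
        }
        @lines;
}

my @lines = (
    "0|aa aa|1998|aaa a|a aaa|10|aa a aa",
    "1|bbb aa|1992|fa a|gaa|5|gfsa aa",
    "2|aa ba|1997|afa|hhaa|1|asdf aa",
    "3|cccaa|1997|ssa s|hhava|3|gfdh gaa",
    "4|adaa|1994|g a a|jiua|6|angf a",
);

my @by_count = sort_by_field(5, 1, @lines);   # numeric column
my @by_text  = sort_by_field(1, 0, @lines);   # text column

print "$_\n" for @by_count;
```

    Computing the key once in the bottom map keeps lc() and split out of the comparison itself, which is where the original code repeated them.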

    dave