Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I had to sort filenames with leading digits the other day, for instance "01something.htm" and "10something.htm".

What happens when you do a default sort on them?

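The second one comes first as soon as the numbers aren't zero-padded to the same width ("10something.htm" lands before "2something.htm"), because they're being sorted lexically, character by character, and "1" is before "2".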

So I sort them numerically.

And strict complains, because the arguments aren't numeric.

But the cool thing is that it gets it right anyway.

Here's an example:

use strict;
my @x = qw( 1xxx 2xxx 300x 10xx );
@x = sort @x;
print "default sort: @x\n";
@x = sort {$a <=> $b} @x;
print "numeric sort: @x\n";

Which will hopefully do the same for you as happened for me. Strict will complain, but sort will happen just fine.

And then how about this, which seemed like something I should try:

@x = sort {$a =~/^d+/ <=> $b =~/^d+/} @x;
print "regex sort: @x\n";

That actually came up with the 'right' answer -- numeric sorting, as well, plus no complaints from strict, but I have to admit I have no idea what I'm doing.

Do I have a question? I'm not sure I do. Unless it's "how do I sort, numerically, things which are not entirely numeric, without getting lots of error messages?".
--

“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
M-J D

Replies are listed 'Best First'.
Re: Sort Says "not numeric" then sorts numeric anyway?
by gmax (Abbot) on Feb 11, 2003 at 07:40 UTC
    And strict complains, because the arguments aren't numeric.
    Actually, it's warnings that is complaining.
    If you turn strict off and keep warnings on, you still get the message, which is, appropriately enough, a "warning."
    When strict complains, it usually means that your program can't execute at all.
    #!/usr/bin/perl -w
    use strict;
    my @x = qw( 1xxx 2xxx 300x 10xx );
    @x = sort @x;
    print "default sort: @x\n";
    @x = sort {local $^W; $a <=> $b} @x;
    print "numeric sort: @x\n";
    Transforming the comparison elements with int($a) won't keep warnings silent either. Turning off warnings temporarily gets rid of the messages, but I wouldn't recommend it. Warnings are useful because they tell you that something is wrong and needs your attention.
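
    To check the int() claim for yourself, here is a small sketch that collects warnings through a $SIG{__WARN__} handler instead of letting them go to STDERR. The handler and the @warnings array are my own scaffolding, not something from the original code.

    ```perl
    use strict;
    use warnings;

    # Collect warnings in an array so they can be inspected afterwards.
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my @x = qw( 1xxx 2xxx 300x 10xx );

    # int() has to numify its argument first, so it triggers the same warning.
    my @sorted = sort { int($a) <=> int($b) } @x;

    print "int sort: @sorted\n";
    print "first warning: $warnings[0]" if @warnings;
    ```

    The sort order comes out right, but the "isn't numeric" warning is still there; it is merely attributed to int instead of to <=>.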

    Therefore I would go for the third solution.
    However, your regex is not working. It only appears to do something in your particular program because the array was already sorted by the previous statement.
    Try this.
    @x = qw( 5xxx 2xxx 300x 10xx );
    @x = sort {$a =~/^d+/ <=> $b =~/^d+/} @x; # WRONG
    print "regex sort: @x\n";
    And your array is not sorted at all.
    A better sort would be
    @x = qw( 5xxx 2xxx 300x 10xx );
    @x = sort {
        my ($y) = $a =~/^(\d+)/;
        my ($z) = $b =~/^(\d+)/;
        $y <=> $z
    } @x;
    print "regex sort: @x\n";
     _  _ _  _
    (_|| | |(_|><
     _|
      You can simplify your (correctly working) sort a bit to:
      @x = sort { ($a =~/^(\d+)/)[0] <=> ($b =~/^(\d+)/)[0] } @x;
      The extra parens together with the access to the first element ([0]) force the regex into list context, which then returns the captured $1.

      As an additional note: depending on the size of the array, a Schwartzian Transform might be recommended instead, since it runs the regex once per element rather than on every comparison.
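
      A minimal sketch of such a transform for this data. The variable names are mine, and falling back to a key of 0 when a string has no leading digits is an assumption of this sketch, not something discussed in the thread.

      ```perl
      use strict;
      use warnings;

      my @x = qw( 5xxx 2xxx 300x 10xx );

      # Schwartzian Transform: extract each numeric prefix once, sort on it,
      # then throw the key away again. Read the pipeline bottom to top.
      my @sorted =
          map  { $_->[1] }                      # 3. keep only the original string
          sort { $a->[0] <=> $b->[0] }          # 2. compare the precomputed keys
          map  { [ /^(\d+)/ ? $1 : 0, $_ ] }    # 1. build [numeric key, string] pairs
          @x;

      print "ST sort: @sorted\n";    # ST sort: 2xxx 5xxx 10xx 300x
      ```

      For four elements it makes no measurable difference; it starts to pay off when the list is long enough that the regex cost per comparison adds up.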

      -- Hofmator

      Turning off warnings temporarily gets rid of the messages, but I wouldn't recommend it. Warnings are useful because they tell you that something is wrong and needs your attention.

      No, a warning doesn't say something is wrong. If something is wrong, you get an error. A warning says something may be wrong. Big difference. In the given case, numerically sorting strings that start with numbers, the easiest, and IMO the right, thing to do is to turn the warnings off.

      Going out of your way to keep a warning from happening defeats the purpose of having a fine-grained, lexically tunable warning system.
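
      As a sketch of what "lexically tunable" buys you here: the 'numeric' category can be switched off for just the block that contains the sort, while the rest of the file keeps full warnings. The $SIG{__WARN__} handler and the @warnings array are scaffolding I added to show that nothing is emitted.

      ```perl
      use strict;
      use warnings;

      my @x = qw( 1xxx 2xxx 300x 10xx );

      # Collect any warnings so we can confirm that none were emitted.
      my @warnings;
      local $SIG{__WARN__} = sub { push @warnings, $_[0] };

      my @sorted = do {
          no warnings 'numeric';    # switched off inside this block only
          sort { $a <=> $b } @x;
      };

      print "numeric sort: @sorted\n";
      print "warnings seen: ", scalar(@warnings), "\n";
      ```

      Any other category of warning, and any numeric warning outside the do block, is still reported as usual.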

      Abigail

Re: Sort Says "not numeric" then sorts numeric anyway?
by FoxtrotUniform (Prior) on Feb 11, 2003 at 06:43 UTC

    I'm pretty sure that what's going on is that the $a <=> $b is forcing its arguments to intify (which they do by prefix, or to 0 if no numeric prefix is found). As you've found, this is often the Right Thing(tm). You can shut warnings.pm up about it (and make it clear to anyone reading your code that you intend to sort them numerically) by using int($a) <=> int($b); int() seems to work just as happily on strings as on reals, and since it returns a number, <=> is happy.

    Update: Magical Mystery String Numerification (for values of Magical Mystery equal to "what atof(3) does on this platform") is explained on p. 59 of the 3rd Edition Camel.
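
    A quick way to watch that numification happen. This is only a sketch, and the sample strings 1.5kg and xxx are made up for illustration rather than taken from the thread. Note that the value is not truncated to an integer: a leading 1.5 stays 1.5.

    ```perl
    use strict;
    use warnings;

    my %num;
    for my $s (qw( 10xx 300x 1.5kg xxx )) {
        no warnings 'numeric';    # silence "isn't numeric" for this demo only
        $num{$s} = $s + 0;        # numeric context: leading number, or 0 if none
        printf "%-6s becomes %s\n", $s, $num{$s};
    }
    # The four strings become 10, 300, 1.5 and 0 respectively.
    ```

    As gmax points out elsewhere in the thread, wrapping the operands in int() goes through the same conversion, so it does not by itself keep warnings quiet.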

    Update II: s/strict/warnings/ Thanks jryan!

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!

      Strict doesn't cause the warning.

      If it really bothers you, you can shut the warning up by turning the warning off:

      use strict;
      use warnings;
      my @x = qw( 1xxx 2xxx 300x 10xx );
      @x = sort @x;
      print "default sort: @x\n";
      no warnings 'numeric';
      @x = sort {$a <=> $b} @x;
      print "numeric sort: @x\n";
Re: Sort Says "not numeric" then sorts numeric anyway?
by elbow (Scribe) on Feb 11, 2003 at 08:23 UTC
    That actually came up with the 'right' answer -- numeric sorting, as well, plus no complaints from strict, but I have to admit I have no idea what I'm doing.
    To clarify what you were attempting to do, and what gmax's code does do, you are getting the regex to pull off the digits at the start of each filename and sorting on that alone.

    elbow
      you are getting the regex to pull off the digits at the start of each filename and sorting on that alone

      No, that's not what Cody Pendant's regex does. His regex (/^d+/) looks for start of string, followed by at least one literal 'd', not a digit; that would be \d. And the way he uses it means that he makes a numerical comparison of the truth values returned by the regex. The <=> forces scalar context, so each regex evaluates to true or false, and here both sides are always false because none of the strings start with 'd'.
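
      A small sketch that makes this visible, using gmax's unsorted test data. The variable names are mine, and the claim that the order is left alone relies on the comparator returning 0 for every pair, which in practice makes current perls hand the list back as it was.

      ```perl
      use strict;
      use warnings;

      my @x = qw( 5xxx 2xxx 300x 10xx );

      for my $s (@x) {
          my $literal_d = ( $s =~ /^d+/  ) ? 1 : 0;   # start of string, then the letter 'd'
          my $digit     = ( $s =~ /^\d+/ ) ? 1 : 0;   # start of string, then a digit
          printf "%-5s /^d+/ gives %d, /^\\d+/ gives %d\n", $s, $literal_d, $digit;
      }

      # Both sides are always false, so <=> always returns 0 and the
      # comparator gives sort no information to work with.
      my @unchanged = sort { ($a =~ /^d+/) <=> ($b =~ /^d+/) } @x;
      print "regex sort: @unchanged\n";
      ```

      Even with \d in place of d the comparison would be true against true, which is still 0; the captured digits are only available in list context.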

      -- Hofmator

Re: Sort Says "not numeric" then sorts numeric anyway?
by Cody Pendant (Prior) on Feb 11, 2003 at 21:26 UTC
    Thanks everyone. I'm kicking myself about that regex thing. I think I was kind of halfway there, but yeah how dumb not to realise it was staying sorted, not being sorted.

    I'm guessing that what my silly regex-sort did was return a "1" or a "true" for both sides of the comparison -- "yes, there are digits at the start", and so everything was "is-one-bigger-or-smaller-than-one" and nothing happened?