in reply to Sorting Numbers & Text

Do the sort using a Schwartzian Transform, storing alongside each value an indicator of whether it is alpha or numeric. The outer comparison puts alpha before numeric; the inner comparator, numeric or string, is chosen by a ternary on the indicator.

knoppix@Microknoppix:~$ perl -E '
> my @values = qw{
>     041351920234
>     Rabbit
>     0343120
>     041271024500
>     0430870
>     Apple
>     041460301399
> };
>
> say for
>     map  { $_->[ 0 ] }
>     sort {
>         $a->[ 1 ] <=> $b->[ 1 ]
>         ||
>         $a->[ 1 ]
>             ? $a->[ 0 ] <=> $b->[ 0 ]
>             : $a->[ 0 ] cmp $b->[ 0 ]
>     }
>     map  { [ $_, m{^\d} ? 1 : 0 ] }
>     @values;'
Apple
Rabbit
0343120
0430870
041271024500
041351920234
041460301399
knoppix@Microknoppix:~$

I hope this is helpful.

Cheers,

JohnGG

Re^2: Sorting Numbers & Text
by frozenwithjoy (Priest) on Jul 12, 2012 at 21:36 UTC
    I get the same result with:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Printer;

    my @unsorted = qw( 041351920234 Rabbit 0343120 041271024500 0430870 Apple 041460301399 );
    my @sorted = sort { $a <=> $b || $a cmp $b } @unsorted;

    p @unsorted;
    p @sorted;

    Is your way more efficient memory-wise? Also, both of our approaches are giving me warnings like: Argument "Rabbit" isn't numeric in numeric comparison (<=>) at....

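      It seems that the warnings are due to a precedence problem: || binds more tightly than ?:, so the comparison of the indicators became part of the ternary's condition and mixed values were compared numerically. It can be fixed with parentheses,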

      knoppix@Microknoppix:~$ perl -Mstrict -Mwarnings -E '
      > my @values = qw{
      >     041351920234
      >     Rabbit
      >     0343120
      >     041271024500
      >     0430870
      >     Apple
      >     041460301399
      > };
      >
      > say for
      >     map  { $_->[ 0 ] }
      >     sort {
      >         ( $a->[ 1 ] <=> $b->[ 1 ] )
      >         ||
      >         (
      >             $a->[ 1 ]
      >                 ? $a->[ 0 ] <=> $b->[ 0 ]
      >                 : $a->[ 0 ] cmp $b->[ 0 ]
      >         )
      >     }
      >     map  { [ $_, m{^\d} ? 1 : 0 ] }
      >     @values;'
      Apple
      Rabbit
      0343120
      0430870
      041271024500
      041351920234
      041460301399
      knoppix@Microknoppix:~$

      Thanks for pointing it out.
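To make the precedence point concrete, here is a minimal sketch that pulls the two comparator shapes out into subs. The sub names and the ( value, indicator ) argument order are hypothetical, chosen only for this illustration; the numeric warning is silenced in the first sub purely so that the difference in return values is visible.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# Arguments: value A, indicator A, value B, indicator B
# (indicator is 1 for numeric, 0 for alpha).

# Without parentheses this parses as
#   ( ( $fa <=> $fb ) || $fa ) ? numeric compare : string compare
sub cmp_unparenthesised {
    my ( $va, $fa, $vb, $fb ) = @_;
    no warnings 'numeric';    # hide the very warning under discussion
    return $fa <=> $fb || $fa ? $va <=> $vb : $va cmp $vb;
}

# With parentheses the indicator comparison decides first and the
# ternary is only reached when both indicators are equal.
sub cmp_parenthesised {
    my ( $va, $fa, $vb, $fb ) = @_;
    return ( $fa <=> $fb ) || ( $fa ? $va <=> $vb : $va cmp $vb );
}
```

Comparing a numeric "0" with an alpha "Apple" shows the difference: the first sub ends up comparing 0 with "Apple" numerically and calls them equal, whereas the second sorts the number after the word. The sample data in this thread has no zero values, which is presumably why both versions printed the same order.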

      Cheers,

      JohnGG

Re^2: Sorting Numbers & Text
by PriNet (Monk) on Jul 12, 2012 at 21:40 UTC
    I start with a "new" value that isn't in the "list(file)" yet (either alpha or numeric), then start reading out the values one at a time from the source file, compare the two, print the "smallest" back to a temp file while retaining the largest for the next read... and so on... (kind-of-a-bubble-sort) then swap the files back at the end. It's the "comparing" of mixed types that I get stuck on, and I'm worried about running out of memory if I just load into an array and use a <=> sort function (which I may end up doing *heh*), but you have given me an idea about "flagging" the first character as alpha or numeric ... *hmmmm*

    I did try re-inventing the wheel...
    But it kept getting stuck on the corners

      Another approach might be to read chunks of your very large database, perhaps 100k to 500k records at a time, and sort each chunk into its own temporary file. Once you have read and sorted all of the data, do a sort/merge of the temporary files into a final sorted file. My gut feeling is that this would be more efficient than the "two at a time" approach you are taking.
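A rough sketch of the chunk-then-merge idea, under some assumptions: one record per line, records starting with a digit are all digits, and the sub names (mixed_cmp, write_sorted_chunks, next_line, merge_chunks) are made up for this example. The temporary files come from the core File::Temp module. The merge does a simple linear scan over the head record of each chunk, which is fine for a handful of chunks; a heap would be the usual choice for many.

```perl
use strict;
use warnings;
use File::Temp qw{ tempfile };

# Letters before numbers; numbers by value, letters by string.
# The indicator is recomputed on every call to keep the sketch short.
sub mixed_cmp {
    my ( $x, $y ) = @_;
    my ( $fx, $fy ) = map { m{^\d} ? 1 : 0 } $x, $y;
    return ( $fx <=> $fy ) || ( $fx ? $x <=> $y : $x cmp $y );
}

# Read $in_fh in chunks of $chunk_size lines, sort each chunk into its
# own temporary file and return the handles, rewound ready for reading.
sub write_sorted_chunks {
    my ( $in_fh, $chunk_size ) = @_;
    my ( @chunk_fhs, @chunk );
    my $flush = sub {
        return unless @chunk;
        my $fh = tempfile();    # scalar context: filehandle only
        print { $fh } map { "$_\n" } sort { mixed_cmp( $a, $b ) } @chunk;
        seek $fh, 0, 0 or die $!;
        push @chunk_fhs, $fh;
        @chunk = ();
    };
    while ( my $line = <$in_fh> ) {
        chomp $line;
        push @chunk, $line;
        $flush->() if @chunk >= $chunk_size;
    }
    $flush->();                 # whatever is left over at end of input
    return @chunk_fhs;
}

# Next chomped line from a handle, or undef at end of file.
sub next_line {
    my ( $fh ) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Merge the sorted chunk files onto $out_fh by repeatedly printing the
# smallest of the current head records.
sub merge_chunks {
    my ( $out_fh, @fhs ) = @_;
    my @heads = map { next_line( $_ ) } @fhs;
    while ( 1 ) {
        my $best;
        for my $i ( grep { defined $heads[ $_ ] } 0 .. $#heads ) {
            $best = $i
                if !defined( $best )
                || mixed_cmp( $heads[ $i ], $heads[ $best ] ) < 0;
        }
        last unless defined $best;
        print { $out_fh } $heads[ $best ], "\n";
        $heads[ $best ] = next_line( $fhs[ $best ] );
    }
    return;
}
```

Something like merge_chunks( $outFH, write_sorted_chunks( $inFH, 100_000 ) ) would then drive the whole thing, with only one chunk plus one record per temporary file held in memory at any time.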

      Cheers,

      JohnGG

      Thinking about it further, read a chunk of your database then sort and print to two temporary files, one for letters, one for numbers. Then the sort/merge of the temporary files will be simpler keeping the two categories separate. Finally you can concatenate the letters and numbers merged files for your results file. In this code I am writing to in-memory scalars rather than disk files just to keep things tidy.

      knoppix@Microknoppix:~$ perl -Mstrict -Mwarnings -E '
      > my @values = qw{
      >     041351920234
      >     Rabbit
      >     0343120
      >     041271024500
      >     0430870
      >     Apple
      >     041460301399
      > };
      >
      > my $rsLets = do { \ my $lets };
      > open my $letsFH, q{>}, $rsLets or die $!;
      > my $rsNums = do { \ my $nums };
      > open my $numsFH, q{>}, $rsNums or die $!;
      >
      > say { $_->[ 1 ] ? $numsFH : $letsFH } $_->[ 0 ] for
      >     sort {
      >         ( $a->[ 1 ] <=> $b->[ 1 ] )
      >         ||
      >         (
      >             $a->[ 1 ]
      >                 ? $a->[ 0 ] <=> $b->[ 0 ]
      >                 : $a->[ 0 ] cmp $b->[ 0 ]
      >         )
      >     }
      >     map  { [ $_, m{^\d} ? 1 : 0 ] }
      >     @values;
      >
      > say ${ $rsLets }, q{-----------------};
      > say ${ $rsNums }, q{-----------------};'
      Apple
      Rabbit
      -----------------
      0343120
      0430870
      041271024500
      041351920234
      041460301399
      -----------------
      knoppix@Microknoppix:~$

      I hope this is of interest.

      Cheers,

      JohnGG