Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Rule-based sorting

by 2501 (Pilgrim)
on Apr 10, 2001 at 03:49 UTC ( [id://71212]=note: print w/replies, xml ) Need Help??


in reply to Rule-based sorting

I would have liked to lean towards a map sort map option but i can't think of how i would handle criteria #3 with that solution. If somebody could think of a clean way to handle #3 that might be a nice way to go.

Replies are listed 'Best First'.
(tye)Re2: Rule-based sorting
by tye (Sage) on Apr 10, 2001 at 04:14 UTC

    Borrowing from I0, I offer this alternative:

    return map { if( s/^2// ) { $_; } elsif( /^([01])(\d\d\d)(.*)\Q$;\E(.*)$/ ) { $3 . substr(300-$2,-2) . $4 . "_" . ($str1,$str2)[$1] } } sort map { if( ! /(.*)(\d\d)(.*)_(?:\Q$string1\E|(\Q$string2\E))$/ ){ 2.$_; }else{ (0+defined($4)) . (200-$2) . $1 . $; . $3; } } @strs;

    which may be the fastest yet once you get enough elements to sort (and you have to set $; to something that won't be in your strings).

            - tye (but my friends call me "Tye")
handling #3
by stevens (Acolyte) on Apr 10, 2001 at 05:58 UTC
    I would have liked to lean towards a map sort map option but i can't think of how i would handle criteria #3 with that solution.

    #3 is the rule where differently-formatted lines (i.e., not conforming to the spec) come after all other lines, and are sorted alphabetically.

    To handle these, you could do two passes, and append the differently formatted output to the other output. Actually, I prefer to separate the code for handling the two cases, since conceptually you may need to treat them quite differently.

    Also, if they're in a different formats, your specs for the sort may diverge more in the future, so any combined sort would have to be split then anyway.

      I know I'm five years late for this party but I came across this node when Googling for Guttman/Rosler transforms and thought the problem was interesting. Criteria #3 can be handled as part of a transform by recording whether the match succeeded or failed as part of the list emitted by the first map. I insert a second map before sorting to either transform string1/string2 into 0/1 for matching lines to aid sorting or, for non-matching lines, transform the undefs from the failed match into zero or "" to silence warnings from the sort.
      #!/usr/local/bin/perl -w # use strict; # Set up the two strings that we want to match and sort to the top # of our list. # our $string1 = shift or die "No strings\n"; our $string2 = shift or die "Only one string\n"; # Print out our data lines sorted using a ST but with an extra map bef +ore # the sort to tweak the results of the regular expression match done i +n # first map. # print map {$_->[0]} # map out original line for print sort { $b->[1] <=> $a->[1] || # elem. 1 is success (1) or failure (0 +) of # match so descending numeric puts g +ood # lines first $a->[5] <=> $b->[5] || # elem. 5 is our string1/string2 ranki +ng so # ascending numeric gets string1s fi +rst $b->[3] <=> $a->[3] || # elem. 3 is two-digit number, highs f +irst $a->[2] cmp $b->[2] || # ascending alpha for string at beginn +ing ... $a->[4] cmp $b->[4] # ... and the one following the digits } map { # If match below succeeded replace elem. 5 of anon. list, which + is # either string1 or string2, with a zero if string1 or a one if # string2 so we can do a numeric ascending sort to get all the # string1s and string2s in the right place. # if($_->[1]) { splice @$_, 5, 1, $_->[5] eq $string1 ? 0 : 1; } # Else match did not succeed so replace elems. 2 thru 5, which +will # contain undef, with appropriate empty strings or zeros so tha +t the # sort does not complain about "Use of uninitialized value ..." + if # warnings are switched on. # else { splice @$_, 2, 4, q(), 0, q(), 0; } # Pass anon. list out of map to sort above. # $_; } map { [ # anon. list constructor $_, # elem. 0 is original line /(?x) # do match with extended syntax ^ # anchor to beginning of line (\S+)\s+ # capture first string, then space(s) (\d\d)\s+ # capture the two digits, then space(s +) (\S+)\s+_ # capture string after digits, then sp +ace(s) # followed by underscore ($string1|$string2) # capture either string1 or string2 $ # anchor to end of line / ? 1 : 0, # elem. 1 is success or failure of mat +ch $1, $2, $3, $4 # elems. 2 thru 5 are our captives ] # close anon. list constructor, list # passed to next map above } <DATA>; # Read data a line at a time into firs +t # map of transform above exit; __END__ shirt 47 cotton,white _menswear blouse 88 cotton,pink _womenswear saucepan 82 s/steel,20cm _household singlet 83 cotton,grey _menswear duff_line 56 lots of extra fields so match fails mixer 59 multi_function,dough_hook,blender,black _white_good +s skirt 39 tweed,brown _womenswear chefs_knife 11 french_style,8inch _household trousers 27 jean,blue _menswear pepper_mill 51 beech_wood,ceramic _household shirt 15 polyester,lemon _menswear shirt 47 cotton,yellow _menswear duff_line 45 more extra fields so duff again shirt 47 cotton,blue _menswear socks 76 wool,blue,pack_of_5 _menswear microwave 40 850W,turntable,white _white_good +s boxers 84 cotton,tartan _menswear trousers 47 cotton,blue _menswear skirt 56 velvet,black_BUT_DUFF_COS_TRAILING_SPACE _womenswear + duff_line no_numbers_and_only_one_further_field t-shirt 29 nylon,black _menswear television 83 wide_screen,black _white_good +s tie 39 silk,blue _menswear butchers_block 45 beech _household deep_fat_fryer 27 white,non_stick _white_good +s blouse 55 silk,bronze_green _womenswear
      Running the script with arguments of menswear and household gives:-
      boxers 84 cotton,tartan _menswear singlet 83 cotton,grey _menswear socks 76 wool,blue,pack_of_5 _menswear shirt 47 cotton,blue _menswear shirt 47 cotton,white _menswear shirt 47 cotton,yellow _menswear trousers 47 cotton,blue _menswear tie 39 silk,blue _menswear t-shirt 29 nylon,black _menswear trousers 27 jean,blue _menswear shirt 15 polyester,lemon _menswear saucepan 82 s/steel,20cm _household pepper_mill 51 beech_wood,ceramic _household butchers_block 45 beech _household chefs_knife 11 french_style,8inch _household duff_line 56 lots of extra fields so match fails mixer 59 multi_function,dough_hook,blender,black _white_good +s microwave 40 850W,turntable,white _white_good +s duff_line 45 more extra fields so duff again duff_line no_numbers_and_only_one_further_field skirt 56 velvet,black_BUT_DUFF_COS_TRAILING_SPACE _womenswear + television 83 wide_screen,black _white_good +s skirt 39 tweed,brown _womenswear blouse 88 cotton,pink _womenswear deep_fat_fryer 27 white,non_stick _white_good +s blouse 55 silk,bronze_green _womenswear
      I formatted the data to make it obvious whether all aspects of the sort were working (which I think they are). I don't know if there would be a performance hit if sorting large amounts of data but this is a technique I have used a lot when sorting a few hundreds of items.
      @sort = map{...} sort{...} map{...} map{...} @list;
      or
      @sort = map{...} sort{...} grep{...} map{...} @list;
      or even
      @sort = map{...} sort{...} map{...} grep{...} map{...} @list;
      Cheers,

      JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://71212]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2024-04-25 04:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found