I know I'm five years late for this party but I came across this node when Googling for Guttman/Rosler transforms and thought the problem was interesting. Criteria #3 can be handled as part of a transform by recording whether the match succeeded or failed as part of the list emitted by the first map. I insert a second map before sorting to either transform string1/string2 into 0/1 for matching lines to aid sorting or, for non-matching lines, transform the undefs from the failed match into zero or "" to silence warnings from the sort.
#!/usr/local/bin/perl -w
#
use strict;
# Set up the two strings that we want to match and sort to the top
# of our list.
#
our $string1 = shift or die "No strings\n";
our $string2 = shift or die "Only one string\n";
# Print out our data lines sorted using a ST but with an extra map bef
+ore
# the sort to tweak the results of the regular expression match done i
+n
# first map.
#
print
map {$_->[0]} # map out original line for print
sort
{
$b->[1] <=> $a->[1] || # elem. 1 is success (1) or failure (0
+) of
# match so descending numeric puts g
+ood
# lines first
$a->[5] <=> $b->[5] || # elem. 5 is our string1/string2 ranki
+ng so
# ascending numeric gets string1s fi
+rst
$b->[3] <=> $a->[3] || # elem. 3 is two-digit number, highs f
+irst
$a->[2] cmp $b->[2] || # ascending alpha for string at beginn
+ing ...
$a->[4] cmp $b->[4] # ... and the one following the digits
}
map
{
# If match below succeeded replace elem. 5 of anon. list, which
+ is
# either string1 or string2, with a zero if string1 or a one if
# string2 so we can do a numeric ascending sort to get all the
# string1s and string2s in the right place.
#
if($_->[1])
{
splice @$_, 5, 1, $_->[5] eq $string1 ? 0 : 1;
}
# Else match did not succeed so replace elems. 2 thru 5, which
+will
# contain undef, with appropriate empty strings or zeros so tha
+t the
# sort does not complain about "Use of uninitialized value ..."
+ if
# warnings are switched on.
#
else
{
splice @$_, 2, 4, q(), 0, q(), 0;
}
# Pass anon. list out of map to sort above.
#
$_;
}
map
{
[ # anon. list constructor
$_, # elem. 0 is original line
/(?x) # do match with extended syntax
^ # anchor to beginning of line
(\S+)\s+ # capture first string, then space(s)
(\d\d)\s+ # capture the two digits, then space(s
+)
(\S+)\s+_ # capture string after digits, then sp
+ace(s)
# followed by underscore
($string1|$string2) # capture either string1 or string2
$ # anchor to end of line
/ ? 1 : 0, # elem. 1 is success or failure of mat
+ch
$1, $2, $3, $4 # elems. 2 thru 5 are our captives
] # close anon. list constructor, list
# passed to next map above
}
<DATA>; # Read data a line at a time into firs
+t
# map of transform above
exit;
__END__
shirt 47 cotton,white _menswear
blouse 88 cotton,pink _womenswear
saucepan 82 s/steel,20cm _household
singlet 83 cotton,grey _menswear
duff_line 56 lots of extra fields so match fails
mixer 59 multi_function,dough_hook,blender,black _white_good
+s
skirt 39 tweed,brown _womenswear
chefs_knife 11 french_style,8inch _household
trousers 27 jean,blue _menswear
pepper_mill 51 beech_wood,ceramic _household
shirt 15 polyester,lemon _menswear
shirt 47 cotton,yellow _menswear
duff_line 45 more extra fields so duff again
shirt 47 cotton,blue _menswear
socks 76 wool,blue,pack_of_5 _menswear
microwave 40 850W,turntable,white _white_good
+s
boxers 84 cotton,tartan _menswear
trousers 47 cotton,blue _menswear
skirt 56 velvet,black_BUT_DUFF_COS_TRAILING_SPACE _womenswear
+
duff_line no_numbers_and_only_one_further_field
t-shirt 29 nylon,black _menswear
television 83 wide_screen,black _white_good
+s
tie 39 silk,blue _menswear
butchers_block 45 beech _household
deep_fat_fryer 27 white,non_stick _white_good
+s
blouse 55 silk,bronze_green _womenswear
Running the script with arguments of menswear and household gives:-
boxers 84 cotton,tartan _menswear
singlet 83 cotton,grey _menswear
socks 76 wool,blue,pack_of_5 _menswear
shirt 47 cotton,blue _menswear
shirt 47 cotton,white _menswear
shirt 47 cotton,yellow _menswear
trousers 47 cotton,blue _menswear
tie 39 silk,blue _menswear
t-shirt 29 nylon,black _menswear
trousers 27 jean,blue _menswear
shirt 15 polyester,lemon _menswear
saucepan 82 s/steel,20cm _household
pepper_mill 51 beech_wood,ceramic _household
butchers_block 45 beech _household
chefs_knife 11 french_style,8inch _household
duff_line 56 lots of extra fields so match fails
mixer 59 multi_function,dough_hook,blender,black _white_good
+s
microwave 40 850W,turntable,white _white_good
+s
duff_line 45 more extra fields so duff again
duff_line no_numbers_and_only_one_further_field
skirt 56 velvet,black_BUT_DUFF_COS_TRAILING_SPACE _womenswear
+
television 83 wide_screen,black _white_good
+s
skirt 39 tweed,brown _womenswear
blouse 88 cotton,pink _womenswear
deep_fat_fryer 27 white,non_stick _white_good
+s
blouse 55 silk,bronze_green _womenswear
I formatted the data to make it obvious whether all aspects of the sort were working (which I think they are). I don't know if there would be a performance hit if sorting large amounts of data but this is a technique I have used a lot when sorting a few hundreds of items.
@sort = map{...}
sort{...}
map{...}
map{...}
@list;
or
@sort = map{...}
sort{...}
grep{...}
map{...}
@list;
or even
@sort = map{...}
sort{...}
map{...}
grep{...}
map{...}
@list;
Cheers,JohnGG |