flexvault has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I came across "What's Wrong with sort and How to Fix It" article by Tom Christiansen dated August 31, 2011 6:00 AM on Perl.com which stated:
Which looked really cool!If you have code that purports to sort text that looks like this: @sorted_lines = sort @lines; Then all you have to get a dictionary sort is write instead: use Unicode::Collate; @sorted_lines = Unicode::Collate::->new->sort(@lines);
Now I have a script that verifies to me how string comparison operators work:
When you run the code it does this:use strict; use warnings; use Benchmark qw(:all); for my $no ( 0 .. 255 ) { my ( $lt, $eq, $gt ); if ( $no == 0 ) { $lt = chr($no); if ( ! ( "" lt $lt ) ) { die "\"\" not lt \\0\n"; } if ( "" eq $lt ) { die "\"\" eq \\0\n"; } if ( "" gt $lt ) { die "\"\" gt \\0\n"; } } elsif ( $no == 255 ) { $lt = chr($no-1); $eq = chr($no); if ( ! ( $lt lt $eq ) ) { die "$no not lt \n"; } if ( chr($no) ne $eq ) { die "$no not eq \n"; } } else { $lt = chr($no-1); $eq = chr($no); $gt = chr($no+1); if ( ! ( $lt lt $eq ) ) { die "$no not lt \n"; } if ( chr($no) ne $eq ) { die "$no not eq \n"; } if ( ! ( $gt gt $eq ) ) { die "$gt not gt \n"; } } } print "Great!!!\n"; exit;
> time perl test_ltgteq.plx Great!!! real 0m0.046s user 0m0.036s sys 0m0.008s
So in order to see how "Unicode::Collate" would work I modified the code to the following:
And when you execute it, you get the following:use strict; use warnings; use Unicode::Collate; my $Collator = Unicode::Collate->new(); my $Print = ""; my $something = ""; ## Initialize if ( $Collator->eq( $something, "" ) ) { $Print .= "\$somethin +g eq \"\"\n"; } else { $Print .= "\$somethi +ng ne \"\"\n"; } for my $no ( 0 .. 255 ) { my ( $lt, $eq, $gt ); if ( $no == 0 ) { $lt = chr($no); if ( ! ( $Collator->lt( "", $lt ) ) ) { $Print .= "0: \"\" + not lt \\0\n"; } if ( $Collator->eq( "", $lt ) ) { $Print .= "0: \"\" e +q \\0\n"; } if ( $Collator->gt( "", $lt ) ) { $Print .= "0: \"\" g +t \\0\n"; } } elsif ( $no == 255 ) { $lt = chr($no-1); $eq = chr($no); if ( ! ( $Collator->lt( $lt, $eq ) ) ) { $Print .= "255 no +t lt $eq\n"; } if ( $Collator->ne( chr($no), $eq ) ) { $Print .= "255 not + eq $eq\n"; } } else { $lt = chr($no-1); $eq = chr($no); $gt = chr($no+1); if ( ! ( $Collator->lt( $lt, $eq ) ) ) { $Print .= ord($lt +)." not lt ".ord($eq)."\n"; } if ( $Collator->ne(chr($no), $eq ) ) { $Print .= "$no not +eq ".ord($eq)."\n"; } if ( ! ( $Collator->gt( $gt, $eq ) ) ) { $Print .= ord($gt +)." not gt ".ord($eq)."\n"; } } } $Print .= "\nGreat!!!\n\n"; print $Print; exit;
> time perl test_ltgteq2.plx $something eq "" 0: "" not lt \0 0: "" eq \0 0 not lt 1 2 not gt 1 1 not lt 2 3 not gt 2 2 not lt 3 4 not gt 3 3 not lt 4 5 not gt 4 4 not lt 5 6 not gt 5 5 not lt 6 7 not gt 6 6 not lt 7 8 not gt 7 7 not lt 8 14 not gt 13 13 not lt 14 15 not gt 14 14 not lt 15 16 not gt 15 15 not lt 16 17 not gt 16 16 not lt 17 18 not gt 17 17 not lt 18 19 not gt 18 18 not lt 19 20 not gt 19 19 not lt 20 21 not gt 20 20 not lt 21 22 not gt 21 21 not lt 22 23 not gt 22 22 not lt 23 24 not gt 23 23 not lt 24 25 not gt 24 24 not lt 25 26 not gt 25 25 not lt 26 27 not gt 26 26 not lt 27 28 not gt 27 27 not lt 28 29 not gt 28 28 not lt 29 30 not gt 29 29 not lt 30 31 not gt 30 30 not lt 31 37 not gt 36 36 not lt 37 38 not gt 37 37 not lt 38 39 not gt 38 38 not lt 39 44 not gt 43 43 not lt 44 45 not gt 44 44 not lt 45 58 not gt 57 57 not lt 58 59 not gt 58 58 not lt 59 63 not gt 62 62 not lt 63 91 not gt 90 90 not lt 91 93 not gt 92 92 not lt 93 94 not gt 93 93 not lt 94 96 not gt 95 95 not lt 96 123 not gt 122 122 not lt 123 125 not gt 124 124 not lt 125 127 not gt 126 126 not lt 127 128 not gt 127 127 not lt 128 129 not gt 128 128 not lt 129 130 not gt 129 129 not lt 130 131 not gt 130 130 not lt 131 132 not gt 131 131 not lt 132 134 not gt 133 133 not lt 134 135 not gt 134 134 not lt 135 136 not gt 135 135 not lt 136 137 not gt 136 136 not lt 137 138 not gt 137 137 not lt 138 139 not gt 138 138 not lt 139 140 not gt 139 139 not lt 140 141 not gt 140 140 not lt 141 142 not gt 141 141 not lt 142 143 not gt 142 142 not lt 143 144 not gt 143 143 not lt 144 145 not gt 144 144 not lt 145 146 not gt 145 145 not lt 146 147 not gt 146 146 not lt 147 148 not gt 147 147 not lt 148 149 not gt 148 148 not lt 149 150 not gt 149 149 not lt 150 151 not gt 150 150 not lt 151 152 not gt 151 151 not lt 152 153 not gt 152 152 not lt 153 154 not gt 153 153 not lt 154 155 not gt 154 154 not lt 155 156 not gt 155 155 not lt 156 157 not gt 156 156 not lt 157 158 not gt 157 157 not lt 158 159 not gt 158 158 not lt 159 164 not gt 163 163 not lt 164 166 not gt 165 165 not lt 166 167 not gt 166 166 not lt 167 168 not gt 167 167 not lt 168 171 not gt 170 170 not lt 171 173 not gt 172 172 not lt 173 175 not gt 174 174 not lt 175 180 not gt 179 179 not lt 180 182 not gt 181 181 not lt 182 183 not gt 182 182 not lt 183 184 not gt 183 183 not lt 184 187 not gt 186 186 not lt 187 189 not gt 188 188 not lt 189 191 not gt 190 190 not lt 191 193 not gt 192 192 not lt 193 196 not gt 195 195 not lt 196 197 not gt 196 196 not lt 197 201 not gt 200 200 not lt 201 205 not gt 204 204 not lt 205 208 not gt 207 207 not lt 208 211 not gt 210 210 not lt 211 214 not gt 213 213 not lt 214 215 not gt 214 214 not lt 215 218 not gt 217 217 not lt 218 223 not gt 222 222 not lt 223 224 not gt 223 223 not lt 224 225 not gt 224 224 not lt 225 228 not gt 227 227 not lt 228 229 not gt 228 228 not lt 229 233 not gt 232 232 not lt 233 237 not gt 236 236 not lt 237 240 not gt 239 239 not lt 240 243 not gt 242 242 not lt 243 246 not gt 245 245 not lt 246 247 not gt 246 246 not lt 247 250 not gt 249 249 not lt 250 255 not gt 254 255 not lt ÿ Great!!! real 0m0.686s user 0m0.672s sys 0m0.008s
I added the "$something" to verify what I was seeing. I also changed "die" to "print" since there were so many differences.
What did I expect? Not what I got!
What I had expected was to see changes to the sort order after chr(127) and not before. Should I have added a parameter to "my $Collator = Unicode::Collate->new();" to get the sort to act like the Perl sort, or is this the way it is?
Thank you for your comments.
"Well done is better than well said." - Benjamin Franklin
|
|---|