ronald has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to add an OO interface to Sort::ArbBiLex, a handy module for doing lexicographic sorts for languages that don't have their own locales (e.g. endangered minority languages). In the process I ran into a problem with nesting the Schwartzian transform inside a sort, but only when the ST is in a different package than the outer sort. The following minimal code illustrates the problem:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @pair = (
        [ qw(a b) ],
        [ qw(b a) ],
        [ qw(c b) ],
        [ qw(b c) ],
    );
    my @triplet = (
        [ qw(a b c) ],
        [ qw(b a c) ],
        [ qw(c b a) ],
        [ qw(b c a) ],
    );

    print "Individual calls to mycmp()...\n";
    foreach (@pair) {
        print "Same package: ";
        print $_->[0], ' ', $_->[1], ' => ';
        print $_->[0], ' ', $_->[1], "\n" if mycmp(@$_) == -1;
        print $_->[1], ' ', $_->[0], "\n" if mycmp(@$_) == 1;
        print $_->[1], ' ', $_->[0], "\n" if mycmp(@$_) == 0;
        print "Different package: ";
        print $_->[0], ' ', $_->[1], ' => ';
        print $_->[0], ' ', $_->[1], "\n" if Diffpackage::mycmp(@$_) == -1;
        print $_->[1], ' ', $_->[0], "\n" if Diffpackage::mycmp(@$_) == 1;
        print $_->[1], ' ', $_->[0], "\n" if Diffpackage::mycmp(@$_) == 0;
    }

    print "\nMultiple calls to mycmp() in sort...\n";
    foreach (@triplet) {
        print "Same package: ";
        print $_->[0], ' ', $_->[1], ' ', $_->[2], ' => ';
        print (join ' ', sort { mycmp($a, $b) } @$_ );
        print "\n";
        print "Different package: ";
        print $_->[0], ' ', $_->[1], ' ', $_->[2], ' => ';
        print (join ' ', sort { Diffpackage::mycmp($a, $b) } @$_ );
        print "\n\n";
    }
    exit;

    sub mycmp {
        my ($first, $second) = @_;
        return 1 if $first ne ( mysort($first, $second) );
        return -1 if $first eq ( mysort($second, $first) );
        return 0;
    }

    sub mysort {
        my @ans = map  { $_->[0] }
                  sort { $a->[0] cmp $b->[0] }   # $a and $b are always defined
                  map  { [ $_ ] } @_;
        return $ans[0];
    }

    package Diffpackage;

    # this is the same as previous mycmp
    sub mycmp {
        my ($first, $second) = @_;
        return 1 if $first ne ( mysort($first, $second) );
        return -1 if $first eq ( mysort($second, $first) );
        return 0;
    }

    # this is the same as previous mysort
    sub mysort {
        my @ans = map  { $_->[0] }
                  sort { $a->[0] cmp $b->[0] }   # this is where $a and $b can be undefined
                  map  { [ $_ ] } @_;
        return $ans[0];
    }

When I run this code with perl v5.8.2 I get the following results. The mycmp function works with pairs of inputs, regardless of the package mycmp lives in.

When multiple calls to mycmp are required to sort three inputs, the nested ST appears to work fine if mysort is in the same package as the outer sort. Diffpackage::mysort generates lots of errors, though the first two characters sort properly:

    Individual calls to mycmp()...
    Same package: a b => a b
    Different package: a b => a b
    Same package: b a => a b
    Different package: b a => a b
    Same package: c b => b c
    Different package: c b => b c
    Same package: b c => b c
    Different package: b c => b c

    Multiple calls to mycmp() in sort...
    Same package: a b c => a b c
    Use of uninitialized value in string comparison (cmp) at ./testsort3.pl line 77.
    (Snip repeated error)
    Different package: a b c => a b c

    Same package: b a c => a b c
    Use of uninitialized value in string comparison (cmp) at ./testsort3.pl line 77.
    (Snip repeated error)
    Different package: b a c => a b c   <== first two inputs sort correctly

    Same package: c b a => a b c
    Use of uninitialized value in string comparison (cmp) at ./testsort3.pl line 77.
    (Snip repeated error)
    Different package: c b a => b c a   <== first two inputs sort correctly

    Same package: b c a => a b c
    Use of uninitialized value in string comparison (cmp) at ./testsort3.pl line 77.
    (Snip repeated error)
    Different package: b c a => b c a

Looking at this in the debugger, I notice that when Diffpackage::mysort is nested in the outer sort, $a and $b are defined in the sort line of Diffpackage::mysort the first time that line is evaluated but not on subsequent evaluations of that line. This problem doesn't happen when mysort is in the same package as the outer sort.

Is this a known problem? Can anyone shed some light?

ronald

Replies are listed 'Best First'.
•Re: Problem with Schwartzian Transform nested in another sort from a different package
by merlyn (Sage) on Apr 15, 2004 at 19:42 UTC
    The fast access to $a and $b is provided by setting up local variables in the package in which the sort is invoked. You'll have to find a way to correlate the package in which the sort subroutine is compiled with the package in which the sort operator is being executed.
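
    To make the package point concrete, here is a minimal sketch. The package Other and both helper names are made up for illustration and do not come from the original post. A comparison sub compiled in Other sees $Other::a and $Other::b when it uses the bare names, while a sort invoked in main fills in $main::a and $main::b.

```perl
use strict;
use warnings;

package Other;

# Compiled in package Other, so the bare $a and $b here are
# $Other::a and $Other::b, which a sort invoked in main never sets.
sub by_own_vars { $a cmp $b }

# Names the invoking package explicitly, so it sees the values sort sets.
sub by_main_vars { $main::a cmp $main::b }

package main;

my @words = qw(pear apple fig);

# Count the "uninitialized value" warnings instead of printing them.
my $warnings = 0;
{
    local $SIG{__WARN__} = sub { $warnings++ };
    my @unsorted = sort { Other::by_own_vars() } @words;
}

my @sorted = sort { Other::by_main_vars() } @words;

print "warnings from by_own_vars: $warnings\n";
print "sorted via by_main_vars: @sorted\n";
```

    Hard-coding main in by_main_vars is only for demonstration; it ties the comparison sub to one calling package, which is rarely what a module wants.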

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Or get slower lexical variables for $a and $b by giving your comparison function a prototype and creating my variables to store them in.

      From perlfunc(1):

      sort SUBNAME LIST
      sort BLOCK LIST
      sort LIST
      
        Sorts the LIST and returns the sorted list value.
        ...
        If the subroutine's prototype is "($$)", the elements
        to be compared are passed by reference in "@_", as for
        a normal subroutine.  This is slower than unprototyped
        subroutines, where the elements to be compared are
        passed into the subroutine as the package global
        variables $a and $b (see example below).
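
      A minimal sketch of the prototyped form described in that perlfunc passage. The package Other and the sub by_length are hypothetical names chosen for this example. Because the elements arrive in @_, the comparison sub can live in any package without caring whose $a and $b were set.

```perl
use strict;
use warnings;

package Other;

# With the ($$) prototype, sort passes the two elements in @_,
# so this sub never touches $a or $b in any package.
sub by_length ($$) {
    my ($first, $second) = @_;
    return length($first) <=> length($second)
        || $first cmp $second;
}

package main;

my @words  = qw(pear fig apple kiwi);
my @sorted = sort Other::by_length @words;

print "@sorted\n";
```

      The sort here orders by length first and breaks ties alphabetically, so it should print the words as fig, kiwi, pear, apple.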
      
      So my results are the expected behavior?

      I had (perhaps naively) believed the inner sort in Diffpackage::mysort would set up its own localized $a and $b in its package, and the outer sort would have its own $a and $b in main.

      I guess I don't understand the magic behind $a and $b very well. I am attempting to implement your suggestion about correlating package names, though I'm not quite sure I understand what you mean.

      Many thanks for your reply.
        If you're confused about variables and packages, it sounds like you need to read Coping with Scoping.

        Since sort was there in Perl 4, the mechanism that it uses by default ($a, $b) accesses the package variable of that name in the current package.

        See the output of the following two one-liners:

        perl -MO=Xref -e "my @array = sort { $a <=> $b } ( 2, 8, 3 );"

        perl -MO=Xref -e "local ( $a, $b ); my @array = sort { $a <=> $b } ( 2, 8, 3 );"

        The first example results in crossreference output that essentially makes no mention of $a and $b.

        The second example results in crossreference output that includes $a and $b under the package main namespace.


        Dave

Re: Problem with Schwartzian Transform nested in another sort from a different package
by tilly (Archbishop) on Apr 15, 2004 at 21:18 UTC
    This is one of the very few good uses for prototypes that I know of. (I forget who pointed this one out to me, but it was on perlmonks.)

    Just declare your sort functions with a prototype of ($$), and sort will pass the data in through @_ and never use $a or $b. (This avoids confusion about which package $a and $b are to be found in.) That is, write:

        sub mycmp ($$) {
            my ($first, $second) = @_;
            return 1 if $first ne ( mysort($first, $second) );
            return -1 if $first eq ( mysort($second, $first) );
            return 0;
        }

        # and elsewhere ...
        sort Whatever::Package::mycmp map ...

        # or put it in a variable
        my $cmp = \&Whatever::Package::mycmp;
        ...
        sort $cmp map ...

        # or if you don't mind a speed hit and don't trust the user
        my $cmp = sub ($$) { Whatever::Package::mycmp($_[0], $_[1]) };
        ...
        sort $cmp map ...

    (Should work in 5.6 on up.)

    Incidentally coding a cmp function in terms of a sort function strikes me as both obscure and inefficient.
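
    For comparison, a sketch of a comparison function that compares its two arguments directly instead of sorting them. The name direct_cmp is invented for this example, and plain string cmp stands in for whatever ordering the real module needs. One difference from the mycmp posted above: for two equal strings that version returns -1, whereas cmp returns 0.

```perl
use strict;
use warnings;

# Compares the two strings directly rather than sorting them first.
sub direct_cmp ($$) {
    my ($first, $second) = @_;
    return $first cmp $second;
}

my @letters = qw(c b a);
my @sorted  = sort direct_cmp @letters;

print "@sorted\n";
```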

      Thanks for the tip on prototypes.

      Yes, this approach is obscure and inefficient. When the OO interface I put on the module (not mine) ended up failing, it's what I found, and I got to wondering...

      Once I've mastered this topic, I'll probably implement a different approach altogether.