Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, Sorting is a perl weakness of mine.

I have a text file, typically

123|Pete Smith|4321|321G4
2134|Mike Malarky|93F821
83|Dave Waylaid|8374W

..which I have loaded into an array @lines.
Could anyone offer an elegant sort by the surname (Smith, Malarky, Waylaid here) in less than the reams of code I have just wasted an hour on?

TIA

Replies are listed 'Best First'.
Re: custom sorting
by Hue-Bond (Priest) on Jul 04, 2008 at 12:23 UTC

    For instance, with this:

        print for map { substr $_, 30 }
                  sort
                  map { sprintf "%-30s%s", (split / /, (split /\|/)[1])[1], $_ }
                  <DATA>;
        __DATA__
        123|Pete Smith|4321|321G4
        2134|Mike Malarky|93F821
        83|Dave Waylaid|8374W

    Output:

        2134|Mike Malarky|93F821
        123|Pete Smith|4321|321G4
        83|Dave Waylaid|8374W

    The difficult part is the nested split. The inner one, (split /\|/)[1], selects the second field of each line (the full name). Its result is fed to the outer one, (split / /, inner)[1], which extracts the second word (the surname). sprintf then builds a string with the surname padded to 30 characters, followed by the original line, like this:

        Malarky                       2134|Mike Malarky|93F821

    These strings are sorted ASCIIbetically by the default sort, and substr then strips the first 30 characters to recover the original line, which is handed to print. This technique (packing the key into the front of a string, sorting with the default sort, and extracting the original data afterwards) is called the Guttman-Rosler Transform; google for it.
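
    Spelled out one stage at a time, the same idea might look like the sketch below. The names @keyed, @sorted_keyed, @sorted and $width are only illustrative. It assumes no surname is longer than $width characters; a longer one would push the original line past the offset that substr expects.

```perl
use strict;
use warnings;

my @lines = (
    "123|Pete Smith|4321|321G4\n",
    "2134|Mike Malarky|93F821\n",
    "83|Dave Waylaid|8374W\n",
);

my $width = 30;    # assumed maximum surname length

# 1. Prefix each line with its surname, padded to a fixed width.
my @keyed = map {
    my $surname = (split / /, (split /\|/, $_)[1])[1];
    sprintf "%-${width}s%s", $surname, $_;
} @lines;

# 2. Plain string sort: the padded surname at the front decides the order.
my @sorted_keyed = sort @keyed;

# 3. Strip the fixed-width prefix to recover the original lines.
my @sorted = map { substr $_, $width } @sorted_keyed;

print @sorted;
```

    Because step 2 uses the default sort with no comparison block, all of the key extraction happens once per line in step 1.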

    As a side note, I'd like to point out that the first line in your example data appears to have four fields separated with a pipe, while the rest have only three fields.

    --
    David Serrano

      That's a double-s in Rossler.

        According to google:

        • "gutman-rosler": 1 match.
        • "gutman-rossler": 3 matches.
        • "guttman-rosler": 1660 matches.
        • "guttman-rossler": 5 matches.

        --
        David Serrano

Re: custom sorting
by pc88mxer (Vicar) on Jul 04, 2008 at 12:38 UTC
    It's easy, just sort them by surname:

        @sorted = sort { surname($a) cmp surname($b) } @lines;

    The surname() function (something you have to write) just extracts the surname from a line. One straightforward solution is:

        sub surname {
            my $line = shift;
            my @fields = split(/\|/, $line);
            my $fullname = $fields[1];    # extract full name
            my ($first, $last) = split(/ /, $fullname);
            return $last;
        }

    I.e. split the line on vertical bars, get the full name, split the full name on spaces and return the second element.

    There are more clever and efficient ways to write the surname() function, but I just want to demonstrate that you don't have to try to be clever to solve the problem.

    This approach extends to any sorting by any property. Just write a function to extract (or compute) that property from the things you are sorting.
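
    As one illustration of that, here is a sketch that sorts the same lines by their leading number instead. The helper name record_id() is made up for this example, and treating the first field as a numeric ID is an assumption about the data.

```perl
use strict;
use warnings;

my @lines = (
    "123|Pete Smith|4321|321G4",
    "2134|Mike Malarky|93F821",
    "83|Dave Waylaid|8374W",
);

# Hypothetical helper: return the first pipe-separated field of a line.
sub record_id {
    my $line = shift;
    my ($id) = split /\|/, $line;
    return $id;
}

# <=> compares numerically, so 83 sorts before 123 and 2134.
my @by_id = sort { record_id($a) <=> record_id($b) } @lines;

print "$_\n" for @by_id;
```

    Only the extraction function and the comparison operator change; the shape of the sort call stays the same.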

    If you are concerned about efficiency, have a look at this write-up: A Fresh Look at Efficient Perl Sorting where topics such as the Orcish Maneuver, Schwartzian Transform, and packed-default sort are discussed.
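
    For a rough idea of what the Schwartzian Transform looks like on this data, here is a minimal sketch. It reuses a surname() function like the one above and makes no claim to match the write-up's own examples.

```perl
use strict;
use warnings;

my @lines = (
    "123|Pete Smith|4321|321G4",
    "2134|Mike Malarky|93F821",
    "83|Dave Waylaid|8374W",
);

sub surname {
    my $line = shift;
    my $fullname = (split /\|/, $line)[1];
    my ($first, $last) = split / /, $fullname;
    return $last;
}

# Read bottom to top: pair each line with its key, sort on the key,
# then keep only the line. surname() runs once per line rather than
# twice per comparison.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ surname($_), $_ ] }
    @lines;

print "$_\n" for @sorted;
```

    For three lines the difference is invisible; the saving only matters when the list is long or the key is expensive to compute.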

Re: custom sorting
by jethro (Monsignor) on Jul 04, 2008 at 13:40 UTC
    As a side note, are you sure that names like 'Mary Beth Jenkins' or 'Sammy Davis Jr.' won't be in your text file?

    If you aren't sure, you might take pc88mxer's code and change the last split line to:

        my @name = split(/ /, $fullname);
        my $last = pop @name;
        if ($last =~ /^(?:jr|sr)\.?$/i) {
            $last = pop(@name) . ' ' . $last;
        }

    If you might also have double surnames (like 'Mary Wayward Jenkins'), you are out of luck: nothing in the text distinguishes a second surname from a middle name. Changing all "Double Surnames" to "Double-Surnames" is then your only hope. Alternatively, hyphenate the first names instead and adjust the code above accordingly.
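
    Put together, a surname() with the suffix check folded in might look like the sketch below. The sample lines are invented for illustration, and anchoring the pattern to the whole word is an assumption about what is wanted: it keeps names that merely contain "sr" or "jr", such as 'Israel', from being treated as a suffix.

```perl
use strict;
use warnings;

sub surname {
    my $line = shift;
    my $fullname = (split /\|/, $line)[1];
    my @name = split / /, $fullname;
    my $last = pop @name;
    # Only a bare "Jr" or "Sr" (optionally with a dot) counts as a suffix,
    # and only if there is another word left to use as the surname.
    if (@name && $last =~ /^(?:jr|sr)\.?$/i) {
        $last = pop(@name) . ' ' . $last;
    }
    return $last;
}

# Invented sample data in the same pipe-separated layout.
my @lines = (
    "3|Pete Smith|C3",
    "1|Sammy Davis Jr.|A1",
    "2|Mary Beth Jenkins|B2",
);

my @sorted = sort { surname($a) cmp surname($b) } @lines;
print "$_\n" for @sorted;
```

    Here 'Sammy Davis Jr.' sorts under 'Davis Jr.' and 'Mary Beth Jenkins' under 'Jenkins'.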

Re: custom sorting
by poolpi (Hermit) on Jul 04, 2008 at 13:16 UTC

    With sort and a regexp:

        #!/usr/bin/perl -w
        use strict;

        my $re = qr/\| \w+ \s+ (\w+) \|/msx;

        print sort {
            my ($aa) = $a =~ $re;
            my ($bb) = $b =~ $re;
            $aa cmp $bb;
        } <DATA>;

        __DATA__
        123|Pete Smith|4321|321G4
        2134|Mike Malarky|93F821
        83|Dave Waylaid|8374W

    Output:

        2134|Mike Malarky|93F821
        123|Pete Smith|4321|321G4
        83|Dave Waylaid|8374W


    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
Re: custom sorting
by ysth (Canon) on Jul 04, 2008 at 18:12 UTC
    In general, parsing out a surname from a full name is not possible to do perfectly, so it's really good to never store a full name where you might need the surname.
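
    To show how much simpler the sort gets when the surname has a field of its own, here is a sketch over a hypothetical id|given name|surname|code layout. That layout is an assumption for illustration only; it is not the format of the file in the question.

```perl
use strict;
use warnings;

# Hypothetical layout: id|given name(s)|surname|code
my @lines = (
    "123|Pete|Smith|321G4",
    "2134|Mary Beth|Jenkins|93F821",
    "83|Sammy|Davis Jr.|8374W",
);

# The surname is simply the third field; no guessing about word order.
my @sorted = sort { (split /\|/, $a)[2] cmp (split /\|/, $b)[2] } @lines;

print "$_\n" for @sorted;
```

    Middle names and suffixes need no special handling here, because the person entering the data has already decided where the surname starts.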
Re: custom sorting
by poolpi (Hermit) on Jul 04, 2008 at 13:13 UTC

    Sorry for the double post ): Upload very very slow...



    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
Re: custom sorting
by Anonymous Monk on Jul 04, 2008 at 13:35 UTC
    Thanks guys. Support here is as excellent as ever.