iamrobj has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I am still learning Perl, and have just installed ActivePerl on my Windows machine.

I am trying to open a text file with multiple lines containing names and phone numbers. They are split by the pipe symbol "|".

I am trying to open the file and sort the file by the phone number (second part in the array), but still keep the name before the number...
Zoe|123-4567
Andrew|123-4568

Ideally, I want to open the file, read contents, sort, write them again, then close. Can I do this? Do I have to write to a new file? Here is what I am doing (and it is NOT working...)

my $whatever;
my @sortedarray;
open(SORTED, ">sorted.txt");
open(TOSORT, "tosort.txt");
while ($whatever = <TOSORT>) {
    chomp $whatever;
    my @sortedarray = split(/\|/, $whatever);
    #@sortedarray = sort($sortedarray[1]);
    print SORTED "$sortedarray[0]|$sortedarray[1]\n";
}
close(TOSORT);
close(SORTED);
print "done!";
exit;

I know it must be VERY simple, but I am just confusing myself more!!! :(

Replies are listed 'Best First'.
Re: sorting arrays...
by dws (Chancellor) on Jul 12, 2003 at 21:58 UTC
    The usual approach to sorting is to read everything that you want to sort (e.g., into an array), sort it, and then save the result. You're reading the unsorted file a line at a time. You can sort that way--by merging each new line into its correct position in some data structure--but that's more work, and you won't get the benefit of Perl's built-in sort.

    Try something like this:

    my @unsorted = <TOSORT>;
    my @sorted = sort byPhone @unsorted;

    This lets you isolate the details of how records are compared during the sort in a separate subroutine (see perldoc -f sort for details). Then you'll write something like

    sub byPhone {
        return (split(/\|/, $a))[1] cmp (split(/\|/, $b))[1];
    }

    to extract and compare phone numbers. Exchange $a and $b if you want to reverse the order.
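
    Putting the pieces above together, a minimal sketch of the whole read, sort, write cycle might look like the following. The sub names by_phone and sort_file_by_phone are made up for illustration, and the file names are the ones from the question. Lexical filehandles and three-argument open are a stylistic choice here, not something the original post used.

```perl
use strict;
use warnings;

# Comparison routine: compare the second pipe-separated field of each line.
sub by_phone {
    return (split /\|/, $a)[1] cmp (split /\|/, $b)[1];
}

# Hypothetical helper: read every line of $in, sort by phone number,
# and write the result to $out. Returns the number of lines written.
sub sort_file_by_phone {
    my ($in, $out) = @_;

    open(my $in_fh, '<', $in) or die "Cannot read $in: $!";
    chomp(my @unsorted = <$in_fh>);
    close($in_fh);

    my @sorted = sort by_phone @unsorted;

    open(my $out_fh, '>', $out) or die "Cannot write $out: $!";
    print {$out_fh} "$_\n" for @sorted;
    close($out_fh) or die "Cannot close $out: $!";

    return scalar @sorted;
}

# Only runs if the input file from the question is present.
sort_file_by_phone('tosort.txt', 'sorted.txt') if -e 'tosort.txt';
```

    Because the input is read completely and closed before the output is opened, the same path can be passed for both arguments if you want to overwrite the original file rather than write a new one.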

      Thanks, worked great!:)

      I see that if I change [1] to [0] I can sort by the first item in the array... what if there are 3, 4, or 5 items in the array? Can I keep adding to the byPhone sub? $c, $d etc?

        Can I keep adding to the byPhone sub? $c, $d etc?

        While sorting, $a and $b (and only those two) are special. They've been localized, and hold the two items being compared for sorting purposes. All is explained in perlfunc in the section on sort.
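
        For records with more fields, the extra fields are handled inside the comparison rather than with more variables. Here is a small sketch, assuming hypothetical three-field records (name|phone|city) and a made-up sub name; each later comparison is only consulted when the earlier ones tie.

```perl
use strict;
use warnings;

# Hypothetical three-field records: name|phone|city.
my @records = (
    'Zoe|123-4567|Leeds',
    'Andrew|123-4568|York',
    'Bea|123-4567|York',
);

# Sort by city (field 2), then phone (field 1), then name (field 0).
# $a and $b are still the only two special variables; cmp returns 0 on
# a tie, so || falls through to the next comparison.
sub by_city_phone_name {
    my @x = split /\|/, $a;
    my @y = split /\|/, $b;
    return $x[2] cmp $y[2]
        || $x[1] cmp $y[1]
        || $x[0] cmp $y[0];
}

my @sorted = sort by_city_phone_name @records;
print "$_\n" for @sorted;
```

        Note the use of || rather than "or" after return: "or" binds more loosely than return, so only the first comparison would be returned.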

        Since you started with the ActiveState distribution, you'll have all of the core Perl docs available via Start | Applications | ActiveState Perl.

Re: sorting arrays...
by Abigail-II (Bishop) on Jul 12, 2003 at 22:34 UTC
    As a one liner:
    $ perl -e 'system q !sort -k 2 -t "|" -o file file!'

    or without perl:

    $ sort -k 2 -t "|" -o file file

    which will even do the "right thing" if the file is huge.

    Abigail

      $ perl -e 'system q !sort -k 2 -t "|" -o file file!'

      That works well on Unix, but not on Windows (at least not without cygwin). The fellow who posed the question is on Windows.

        Or some other toolkit that has sort. I don't understand why this is a problem. perl doesn't come standard with Windows either. And we never have a problem with Perl programs using modules that don't come standard with Perl. So, why always this whining "it doesn't work on Windows", when using a well known tool that has been ported to Windows eons ago?

        Software reuse is good, even if it isn't written in Perl.

        Abigail

        Er, actually Win2k (and no doubt NT also, I haven't checked) does have sort provided. No cygwin required. I imagine some switch changes are required, however.

        Although if by "Windows" you mean Win9x then of course it isn't provided. But then again nor is it provided on Playstation 2. Point being why should a game machine need to sort?


        ---
        demerphq

        <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

        • Update:  
        While Win2k does have a sort command built in, it's nowhere near as powerful as the cygwin/*nix tool that Abigail-II mentions. So I agree that as far as NT goes, unless installing cygwin is an option, his proposed solution is not really useful :-)


Re: sorting arrays...
by bobn (Chaplain) on Jul 12, 2003 at 22:08 UTC
    When you want to sort a list other than by a straight lexical or numerical comparison of its elements, the classic approach is the Schwartzian Transform.
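    # UNTESTED
    # after opening files
    @tosort = <TOSORT>;
    chomp @tosort;
    # note: the pipe must be escaped in the split pattern;
    # a bare '|' is regex alternation and would split between every character
    @sorted = map  { $_->[1] }
              sort { $a->[0] cmp $b->[0] }
              map  { [ (split(/\|/, $_))[1], $_ ] } @tosort;
    for (@sorted) { print SORTED "$_\n" }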

    The map {} sort {} map {} is read from the bottom up.

    The bottom map creates a list of array references, one arrayref per element of @tosort. Each arrayref has as its first element the phone number portion of the element, and as its second element the original element of @tosort. The return value of map is a list of these arrayrefs that is fed to the sort.

    The sort then sorts these arrayrefs based on comparison of the first element of each, which is the phone number. The return value of the sort is a list of the now-sorted arrayrefs, which is fed to the top map.

    The top map then picks the original values of @tosort out of the arrayrefs. The return value of this map is a list that is put into @sorted.

    The Schwartzian Transform does the needed sort, with the added advantage that, if computing the sort key is an expensive operation (in this case it was just the split), it is only done once per element, whereas if it were inside the sort block, it would be done many more times.
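
    To make the "once per element" point concrete, here is a small sketch that counts how often the key is extracted with and without the transform. The sample data, the $calls counter, and the helper phone_of are all made up for illustration.

```perl
use strict;
use warnings;

my @tosort = ('Zoe|123-4568', 'Andrew|123-4567', 'Bea|123-4569', 'Cal|123-4566');

my $calls = 0;    # how many times the sort key has been extracted

# Hypothetical helper: pull the phone number out of a record.
sub phone_of {
    my ($record) = @_;
    $calls++;
    return (split /\|/, $record)[1];
}

# Plain sort: the key is extracted twice for every comparison.
my @plain = sort { phone_of($a) cmp phone_of($b) } @tosort;
my $plain_calls = $calls;

# Schwartzian Transform: the key is extracted once per element,
# in the bottom map, and reused by every comparison.
$calls = 0;
my @st = map  { $_->[1] }
         sort { $a->[0] cmp $b->[0] }
         map  { [ phone_of($_), $_ ] } @tosort;
my $st_calls = $calls;

print "plain sort: $plain_calls key extractions\n";
print "transform:  $st_calls key extractions\n";
```

    Both approaches produce the same order; the difference is only in how many times the key extraction runs, which matters more as the list grows or the extraction gets more expensive.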

    The Schwartzian Transform is named for merlyn, who also goes by the name of Randal Schwartz.

    --Bob Niederman, http://bob-n.com

Re: sorting arrays...
by artist (Parson) on Jul 12, 2003 at 21:52 UTC
    You are not sorting the data at all in your code. To sort, you need to get all the data into memory first and then sort it. You can do that by storing it in a hash. Assuming each name appears only once in your data, try the following code.
    my $whatever;
    my @sortedarray;
    open(SORTED, ">sorted.txt");
    open(TOSORT, "tosort.txt");
    my %hash;
    while ($whatever = <TOSORT>) {
        chomp $whatever;
        my($name,$number) = split(/\|/, $whatever);
        $hash{$name} = $number;
    }
    foreach my $name (sort keys %hash){
        my $number = $hash{$name};
        print SORTED "$name|$number\n";   # write to the output file, one record per line
    }
    close(TOSORT);
    close(SORTED);

    artist

      This way sorts by the name and not the number?
        Sorry about my mistake in reading your question.
        Replace the hash section with this (it assumes each number appears only once):

        my %hash;
        while ($whatever = <TOSORT>) {
            chomp $whatever;
            my($name,$number) = split(/\|/, $whatever);
            $hash{$number} = $name;
        }
        foreach my $number (sort keys %hash){
            my $name = $hash{$number};
            print SORTED "$name|$number\n";   # write to the output file, one record per line
        }

        artist
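
        Keying the hash by number silently drops a record whenever two people share a number, because the later entry overwrites the earlier one. One possible alternative, sketched here with made-up sample data and the hypothetical name %number_for, is to keep names as the keys (still assuming names are unique) and sort those keys by their values.

```perl
use strict;
use warnings;

# Sample data; Bea deliberately shares Zoe's number.
my @lines = ('Zoe|123-4567', 'Andrew|123-4568', 'Bea|123-4567');

my %number_for;
for my $line (@lines) {
    my ($name, $number) = split /\|/, $line;
    $number_for{$name} = $number;
}

# Sort the names by their numbers; fall back to the name when numbers tie.
my @by_number = sort {
    $number_for{$a} cmp $number_for{$b} || $a cmp $b
} keys %number_for;

print "$_|$number_for{$_}\n" for @by_number;
```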