lukez has asked for the wisdom of the Perl Monks concerning the following question:

Hello, great website! I am very new to all this and need to ask how to achieve a result. Thank you in advance for your help. I need to sort a 2-column file: column 1 is a Group, and the 2nd column has X,Y pairs. I hope I am explaining this right. Within each Group:
1. I need the X's sorted ascending.
2. For each new X value, toggle the direction of the Y sort (ascending, then descending, and so on).
Starting data           Ending data (in new file)
'aaa'   1,2             'aaa'   1,2
'aaa'   2,1             'aaa'   2,3
'aaa'   2,3             'aaa'   2,1
'aaa'   3,1             'aaa'   3,1
'aaa'   3,2             'aaa'   3,2
'aaa'   4,1             'aaa'   4,5
'aaa'   4,5             'aaa'   4,1
'bbb'   2,2             'bbb'   2,1
'bbb'   2,5             'bbb'   2,2
'bbb'   2,1             'bbb'   2,5
'bbb'   4,3             'bbb'   4,6
'bbb'   4,6             'bbb'   4,3
'bbb'   4,1             'bbb'   4,2
'bbb'   4,2             'bbb'   4,1
'ccc'   3,3             'ccc'   1,1
'ccc'   3,6             'ccc'   1,3
'ccc'   1,3             'ccc'   2,4
'ccc'   1,1             'ccc'   2,2
'ccc'   6,4             'ccc'   3,3
'ccc'   6,6             'ccc'   3,6
'ccc'   2,2             'ccc'   6,6
'ccc'   2,4             'ccc'   6,4

Re: Sort then conditionally sort
by ikegami (Patriarch) on Apr 08, 2009 at 20:48 UTC

    There's no way to know whether to sort Y ascending or descending before X is fully sorted, so you'll need to do multiple passes.

    use strict;
    use warnings;

    my @data;
    while (<DATA>) {
        chomp;
        push @data, [ split /\s+|,/ ];
    }

    @data = sort {
        $a->[0] cmp $b->[0]
            || $a->[1] <=> $b->[1]
    } @data;

    my %order;
    my $last_f;
    my $last_x;
    for (@data) {
        if (!defined($last_f) || $last_f ne $_->[0]) {
            $order{$_->[0]}{$_->[1]} = +1;
            $last_f = $_->[0];
            $last_x = $_->[1];
        }
        if ($last_x ne $_->[1]) {
            $order{$_->[0]}{$_->[1]} = -$order{$last_f}{$last_x};
            $last_x = $_->[1];
        }
    }

    @data = sort {
        $a->[0] cmp $b->[0]
            || $a->[1] <=> $b->[1]
            || ( $a->[2] <=> $b->[2] ) * $order{$a->[0]}{$a->[1]}
    } @data;

    print("$_->[0] $_->[1],$_->[2]\n") for @data;

    __DATA__
    ...

    Update: By sorting only one dimension at a time, we can avoid multiple passes. It actually makes the program simpler:

    use strict;
    use warnings;

    my %data;
    while (<DATA>) {
        chomp;
        my ($f, $x, $y) = split /\s+|,/;
        push @{ $data{$f}{$x} }, $y;
    }

    my @data;
    for my $f (sort keys %data) {
        my $xs = $data{$f};
        my $order = +1;    # restart ascending for each group
        for my $x (sort { $a <=> $b } keys %$xs) {
            my $ys = $xs->{$x};
            push @data, map [ $f, $x, $_ ], sort { $order * ( $a <=> $b ) } @$ys;
            $order *= -1;  # toggle the Y direction for the next X
        }
    }

    print("$_->[0] $_->[1],$_->[2]\n") for @data;

    __DATA__
    ...

    In both cases, the data was

    aaa 1,2
    aaa 2,1
    aaa 2,3
    aaa 3,1
    aaa 3,2
    aaa 4,1
    aaa 4,5
    bbb 2,2
    bbb 2,5
    bbb 2,1
    bbb 4,3
    bbb 4,6
    bbb 4,1
    bbb 4,2
    ccc 3,3
    ccc 3,6
    ccc 1,3
    ccc 1,1
    ccc 6,4
    ccc 6,6
    ccc 2,2
    ccc 2,4

      Global symbol "%data" requires explicit package name


      I've been trying to figure out the file input/output code, but I get the error above before even reaching the point where the file-input code starts. Is something missing or in the wrong order?

        My program doesn't generate that error. You'd get that error if you used hash %data without declaring it.
      I want to thank all of you for your help on this. Hi ikegami, I think your updated code will be easier for me to eventually understand than the other versions. I am open to any explanations or comments. One thing I need to mention is that the data will be in an external file. Do I need to change the way your code reads its data?
      That is, this part:

      while (<DATA>)
      to something like this?
      ========================================================
      open (my $IN, 'myfile.dat') or die "$!";
      my @data = <$IN>;
      close $IN;
      ========================================================
      I hope I am saying this correctly. I will also need to write all the sorted data to a different file. I think something like
      open (my $OUT, ">", 'output.dat') or die "$!";
      then perhaps add $OUT to your print-output code
      print("$_->[0] $_->[1],$_->[2]\n") for @data;
      to
      print $OUT ("$_->[0] $_->[1],$_->[2]\n") for @data;
      Would that work, or am I out in left field? You guys are terrific; thank you all again!

      By the way, is Perl the best way to do this? I am curious about why there are so many languages, if 3 or 4 can do it all. Not sure if that is true, though. Thanks so much, everyone. Luke

      P.P.S. Is there a way to know when I get a response from you guys, such as an email notice? Peace!

        Why would you think that reading a line from a file handle should be replaced with reading the entire file into an array and closing the file?
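A minimal sketch of the change being hinted at: keep the `while` loop from the update and only swap `<DATA>` for a handle opened on the file. The name `myfile.dat` comes from the question; the in-memory filehandle (`open` on a scalar reference) is just a stand-in so the sketch runs without a real file.

```perl
use strict;
use warnings;

# Stand-in for the real input file. In a script you would instead write:
#   open(my $in, '<', 'myfile.dat') or die "myfile.dat: $!";
my $file_contents = "aaa 1,2\naaa 2,1\naaa 2,3\n";
open(my $in, '<', \$file_contents) or die "cannot open input: $!";

# The loop is unchanged from the <DATA> version; only the handle differs.
# Each iteration reads one line, so there is no need to read the whole
# file into an array first.
my %data;
while (<$in>) {
    chomp;
    my ($f, $x, $y) = split /\s+|,/;
    push @{ $data{$f}{$x} }, $y;
}
close($in);

print join(",", @{ $data{aaa}{2} }), "\n";   # prints 1,3
```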

        By the way, my data wasn't exactly in the same format as yours. I thought the first column was actually a file name and not in the file itself. That means you'll need to adjust the input parsing and output format.
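One way to adjust the parsing, sketched under assumptions: every line looks like `'group'   X,Y`, the group name sits between single quotes and contains no quote itself, and X and Y are joined by a comma. The helper names `parse_line` and `format_record` are hypothetical. A regex that captures everything between the quotes would also cope if a name ever contained a space, which the question does not rule out.

```perl
use strict;
use warnings;

# Assumption: each line looks like  'group name'   X,Y
# Returns (group, x, y), or an empty list for lines that don't match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^'([^']*)'\s+([^,\s]+),(\S+)/ ? ($1, $2, $3) : ();
}

# Put the quotes back so the output file matches the input layout.
sub format_record {
    my ($group, $x, $y) = @_;
    return "'$group'   $x,$y\n";
}

my ($group, $x, $y) = parse_line("'5aaa2'   4,1");
print format_record($group, $x, $y);   # prints '5aaa2'   4,1
```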

        would that work or am I out in left field?

        Yes, that's how you write to a file.
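A small sketch of the output side. The rows are sample values in the `[group, x, y]` shape the update builds, and the in-memory handle stands in for `output.dat` (the name from the question). Wrapping the handle in braces, `print {$out} ...`, makes it unambiguous that `$out` is the filehandle. Checking `close` is worthwhile for output, because a buffered write that fails may only be reported when the buffer is flushed.

```perl
use strict;
use warnings;

# Sample rows in the [group, x, y] shape built by the update.
my @data = ( [ "'aaa'", 1, 2 ], [ "'aaa'", 2, 3 ], [ "'aaa'", 2, 1 ] );

# In a real script: open(my $out, '>', 'output.dat') or die "output.dat: $!";
# Here the "file" is an in-memory string so the sketch runs anywhere.
open(my $out, '>', \my $written) or die "cannot open output: $!";
print {$out} "$_->[0] $_->[1],$_->[2]\n" for @data;
close($out) or die "cannot close output: $!";

print $written;
```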

        I am curious about why there are so many languages, if 3 or 4 can do it all....

        Because no language does it all, or does it the same way.

Re: Sort then conditionally sort
by ig (Vicar) on Apr 08, 2009 at 20:50 UTC

    The sort function allows you to define your own sorting criteria, as a block of code or a subroutine.

    You can split your records into three fields with something like split(/[\s,]+/). Then all you have to do is decide how to compare the fields.
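For instance, applying that pattern to one line in the question's format gives exactly three fields (the quotes stay part of the group name):

```perl
use strict;
use warnings;

# One or more whitespace or comma characters act as the separator.
my @fields = split /[\s,]+/, "'aaa'   4,5";
print join("|", @fields), "\n";   # prints 'aaa'|4|5
```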

    You can use a lexical comparison (cmp) for the first field and a numeric comparison (<=>) for the second. These operators are described in perlop.
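A short illustration of the difference, and of chaining the two comparisons with `||`, which only falls through to the second comparison when the first one returns 0 (a tie). The sample values are made up.

```perl
use strict;
use warnings;

# cmp compares as strings, <=> compares as numbers.
my @xs = (10, 9, 2);
my @as_strings = sort { $a cmp $b } @xs;   # 10 2 9  ("1" sorts before "2")
my @as_numbers = sort { $a <=> $b } @xs;   # 2 9 10

# Group by string first, then X by number.
my @records = ( [ 'bbb', 10 ], [ 'aaa', 9 ], [ 'bbb', 2 ] );
my @sorted  = sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @records;
print join(" ", map { join(":", @$_) } @sorted), "\n";   # prints aaa:9 bbb:2 bbb:10
```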

    The third field is more challenging and how to proceed depends on details that are not clear to me. You say "each new X value" but not the context in which the value is "new". Are you concerned with the order in which they appear in the input file or the order they appear after sorting the first two fields or something else?

      Hi ig, thank you. The data file is a lot larger, with longer group names between the quotes. The file is only read in; the code sorts it as described in my first note, and the X-sorted, then conditionally Y-sorted, version is saved to a new file. The original file is untouched. The 'Groups' can remain in whatever order they were originally in, or they can be sorted if that is easier (it doesn't matter). To recap: the X's for each 'Group' are sorted ascending, then for each new X value the direction (ascending/descending) of the Y sort is changed. My other question was the proper way to read in the data file in Perl, and how to print to another file. I gave my code guesses on how to do this in my other note. How far off were my guesses? Can you fix or confirm the code I thought I would need? Thanks
      I'm sorry to bother you again, but I just realized the 'Group' names may contain
      alphanumeric characters: 'aaa3', '5aaa2', '43bbb', etc. Sorry I didn't
      mention this before. Will this affect the type of sort code? By the way, these
      groups don't need to be sorted; they can be left in their original order. Only
      their X,Y pairs need sorting.

        I think you have answers to most of your questions from others by now but briefly...

        Ikegami's update looks good to me.

        There are many ways to do everything - a bit confusing in the beginning but good in the long run. To read your data from another file you can let some perl "magic" do it for you, perhaps something like the following:

        #!/usr/local/bin/perl
        use strict;
        use warnings;

        foreach my $line (<>) {
            print "$line";
        }

        The above script will read every line of every file named on the command line or, if no files are named on the command line, will read from standard input (STDIN). It does nothing but print the contents of the file, but you can put anything you like inside the loop.
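One detail worth knowing about that loop: `foreach my $line (<>)` evaluates `<>` in list context, so every line of every file is read before the loop body runs. For large files a `while` loop reads one line at a time. In the sketch below the temporary file and the assignment to `@ARGV` are demo setup only, standing in for a file name given on the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo setup only: write a small temporary file and put its name in @ARGV,
# as if it had been given on the command line (perl script.pl myfile.dat).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "aaa 1,2\naaa 2,1\n";
close($tmp_fh) or die "$tmp_name: $!";
@ARGV = ($tmp_name);

# while reads one line per iteration instead of the whole input up front.
my $count = 0;
while (my $line = <>) {
    chomp $line;
    $count++;
    print "line $count: $line\n";
}
```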

        Alternatively, and perhaps a bit less mysteriously, you can open the file explicitly yourself. The following would do it:

        #!/usr/local/bin/perl
        use strict;
        use warnings;

        my $filename = shift;   # get the filename from the command line
        open(my $fh, '<', $filename) or die "$filename: $!";
        foreach my $line (<$fh>) {
            print "$line";
        }

        You might read open and perlopentut for more on opening files for input and output.

        You may have realized that <DATA> is special: it reads the data in your program file that appears after a line containing "__DATA__" (without the quotes) or "__END__". This is convenient for test scripts, among other uses. You can read more about this in perldata.

        Having some numbers in the group names won't be a problem. If you use Ikegami's examples the group names are sorted lexically. It is easier to sort them so that all the records for a group come together.
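If you would rather keep the groups in their original order, as the question allows, one option is to remember the order in which each group first appears. This is a sketch, not code from the thread: `@group_order` is a hypothetical name, and the Y direction restarts ascending for each group, which is an assumption based on the example output.

```perl
use strict;
use warnings;

# Sample lines with alphanumeric group names (made up for the demo).
my @lines = ("'43bbb' 2,5", "'aaa3' 1,2", "'43bbb' 2,1", "'aaa3' 2,1", "'aaa3' 2,3");

my (%data, @group_order);
for my $line (@lines) {
    my ($f, $x, $y) = split /[\s,]+/, $line;
    push @group_order, $f unless exists $data{$f};   # first time we see this group
    push @{ $data{$f}{$x} }, $y;
}

my @out;
for my $f (@group_order) {          # original order, not sorted
    my $order = +1;                 # each group starts ascending
    for my $x (sort { $a <=> $b } keys %{ $data{$f} }) {
        for my $y (sort { $order * ($a <=> $b) } @{ $data{$f}{$x} }) {
            push @out, "$f $x,$y";
        }
        $order *= -1;               # flip direction for the next X
    }
}
print "$_\n" for @out;
```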

Re: Sort then conditionally sort
by kyle (Abbot) on Apr 08, 2009 at 20:38 UTC
      Hi Kyle, thank you. Sorry I didn't have code; I am JUST learning, and I learn by looking at code solutions AND by reading FAQs, books, etc.
      I was confused by this part of your code:
      my $op_io = <<'OP_INPUT_AND_OUTPUT';
      and all the example before and after columns listed in between, up to
      OP_INPUT_AND_OUTPUT
      My request for help had the columns on the left as an example of the input data file to be sorted; the 2 columns on the right are the sorted/conditionally sorted data that needs to go in a separate file. This confused me. Thank you for taking the time to help me.

        The construct is called a "here-document", and you can find them documented in perlop. It's basically a way to include some large chunk of text as a value in your program.

        In this case, I used it to hold your example data. After setting $op_io to that value, I use split to cut it into individual lines, and I loop over those lines to pull the individual values out. When I'm done, I have your inputs and desired output.
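The approach described above can be sketched like this. It is a hypothetical reconstruction (kyle's actual code is not shown here): each pasted line holds an input record on the left and the desired output record on the right, so splitting on whitespace and commas yields six fields per line. The variable names `@input` and `@expected` are made up for the illustration.

```perl
use strict;
use warnings;

# A quoted terminator ('OP_INPUT_AND_OUTPUT') means no interpolation happens
# inside the here-document; the terminator line must stand alone.
my $op_io = <<'OP_INPUT_AND_OUTPUT';
'aaa'   1,2             'aaa'   1,2
'aaa'   2,1             'aaa'   2,3
'aaa'   2,3             'aaa'   2,1
OP_INPUT_AND_OUTPUT

my (@input, @expected);
for my $line (split /\n/, $op_io) {
    # group, x, y for the input, then group, x, y for the desired output
    my @f = split /[\s,]+/, $line;
    next unless @f == 6;
    push @input,    [ @f[0 .. 2] ];
    push @expected, [ @f[3 .. 5] ];
}
print scalar(@input), " input rows, ", scalar(@expected), " expected rows\n";
```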

        I did it that way so I wouldn't have to reformat what you posted. I just pasted it in and wrote some code to pull out what I wanted.