Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm not smooth enough to get this right and could use some help. Here's an example of some text data that I need to parse:

Clock: blah/1

Pin: blah/2

Net: blah/3

The clock global skew: 0.100

The longest path delay: 1.000

The shortest path delay: 1.100

The above 6 lines of text are repeated several hundred times with different Clock/Pin/Net names and assosiated values (next 3 lines).

All I want to do is reorder the file by skew: that is, keep the six lines for each clock together, but change the order of the whole file to list small skew -> largest skew.

I also want to add an option to sort by longest path delay or skew, but one thing at a time...need to get the skew ordered first. :-)

Many thanks in advance!

Regards,

-Chris

Replies are listed 'Best First'.
Re: Sorting question...
by dragonchild (Archbishop) on Jan 16, 2004 at 03:27 UTC
    This is all in your datastructure. I would envision that you would convert the six lines above to something that would look similar to:
    $VAR1 = { clock => 'blah/1', pin => 'blah/2', net => 'blah/3', clock_skew => '0.100', longest_delay => '1.000', shortest_delay => '1.100', };

    If you do that, and have an array of these hashrefs, the sorting is very simple.

    my @values = generate_data_from_some_datasource($data_source); # @values is now an array of hashrefs, each looking like the one above my @sorted_values = sort { $a->{clock} cmp $b->{clock} } @values;

    The right data structure produces the easiest code. Now, instead of 'clock', you could, if a fellow was motivated, have an option that could be passed in. The trick is to make sure you're doing the right sort. cmp is alphanumeric sorting. <=> is numeric sorting. (A fellow, if interested, might look at creating objects that use overloaded sorting comparators, but that's probably overkill here.)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Sorting question...
by davido (Cardinal) on Jan 16, 2004 at 03:36 UTC
    You haven't defined what is your input record separator. I'll assume that it is "\n\n", and that the individual "lines" within each record are delimited by a single "\n";

    I will also assume that all of the "keys" and "values" are delimited by a colon, and that colon cannot appear as a key nor a value. These are a lot of assumptions, but you didn't really define such things in your question.

    my @records; { local $/ = "\n\n"; open IN, "<file.dat" or die "Cannot open data file.\n$!"; while ( my $record = <IN> ) { my @kv_pairs = split /\n/, $record; push @records, [@kv_pairs]; } close IN; } my @sorted_recs = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, (split /:\s*/, $_->[3])[1] ] } @records;

    Ok, here's the walk-through...

    First, set the input record separator to "\n\n" so that you read in entire blank-line-delimited records together. split the key/value pairs on "newline" characters, and push them into an array of arrayrefs (a list of lists, see perllol).

    Next, perform a Schwartzian Transform. Use the 4th element of each anonymous arrayref held in @records. The forth element ([3]) contains "The global clock skew:...". Split that element on ":", and use the second half (the value) as what gets put into the ST container for comparison purposes within the sort.

    Finally, the ST returns your original datastructure in sorted order.

    If you wish to sort on multiple criteria, your ST will have to be just a little more complicated, and then your sort routine will have to have some "or" logical short-circuits to fall through to the next level if higher levels evaluate to equality during the sort.

    I hope this helps. Kinda confusing, I know. But if you can define the spec a little better I might be able to help with a more robust (and probably slightly less complex) solution.

    Note, the reason I used an array of arrays, instead of an array of hashes to store your datastructure, was because I assumed that you wanted to keep each record's lines in their original order. Since normal hashes don't have any concept of order, dumping your K/V pairs into a hash would have lost their original order within each record. If order is unimportant, an array of hashes could serve to reduce the complexity of the transform.

    I also used the <=> comparison operator instead of cmp, because it looked to me like you're trying to sort on a numeric value.

    Update: Fixed extraneous } character at the end of the code snippet. Sorry about that, hope this gets you going in the right direction.


    Dave

      Hi, Dave,
      
      Thanks so much for your excellent suggestion and explanation.
      
      A lot of the syntax in your example is new to me, so I am
      having a little trouble following everything.  Actually, it
      is the "my @sorted_recs =" routine that I'm struggling with.
      There is a missing "{" somewhere that I can't figure out
      where to insert...?
      
      All of you assumptions about the data source format are 
      correct.  Here is an exact snippet from the file I need to parse..

      Clock: pcm_pclk_i Pin: pad_ring_i\/pcm_pclk_pad/Y Net: pcm_pclk_i Operating Condition = worst The clock global skew = 0.113 The longest path delay = 2.663 The shortest path delay = 2.550 The longest path delay end pin: co_8port_i/I0/CLKB The shortest path delay end pin: co_8port_i/pcm_tsc_n_pclk_reg/CK Clock: ltr_clk Pin: clkgate_i/peaz_gate8/S_5/Y Net: ltr_clk Operating Condition = worst The clock global skew = 0.171 The longest path delay = 3.723 The shortest path delay = 3.552 The longest path delay end pin: co_8port_i/LOCKUP/GN The shortest path delay end pin: co_8port_i/ntr_frame_sample_c2_reg6/CK

      Followed by several hundred more clocks. So the semicolon isn't the only qualifier: the "=" is necessary as well. (I goofed in my original post...sorry about that.) That's everything. Where does that missing "{" go? Thanks, -Chris

        In order to understand my example you need to understand several concepts. It's somewhat idiomatic. I'll try to walk through it again a little more descriptively...

        First, the first "paragraph" of code:

        my @records; # Declare an array to hold records from the input file. { # Start a new lexical block to limit the scope # of the next line. local $/ = "\n\n"; # Set the input record separator to # be a blank line. This allows you to read # records in, delimited by blank lines. open IN, "<file.dat" or die "Cannot open data file.\n$!"; # Open the input file for reading. Die if an error occur +s. while ( my $record = <IN> ) { # Read the file in record by record, into $record. my @kv_pairs = split /\n/, $record; # Split each record into its six lines, and store # each line as an array element. push @records, [@kv_pairs]; # Create an anonymous array that is a copy of @kv_pairs. # Push the anonymous array containing a complete set # of keys/values for a given record into the @records arr +ay # as an array of arrays. } # End the while loop. close IN; # Close the infile, we're done with it. } # Close the lexical block that constrains the # effects of the "local $/ = ....." definition.

        Ok, that's the paragraph that builds up a datastructure containing a list of records from your input file. Now we need to sort them. To do so, we use a Schwartzian Transform. The transform puts your original data structure into a container where each element is paired up with the one thing you wish to sort by. The idea is that you sort by some criteria, and then strip away the sort criteria before handing the original datastructure back, in sorted order. I guess I'm not all that good at explaining it. When reading the commented code below, start from the LAST map, and read backward. Or in other words, the last occurrence of map is feeding data to sort, which is feeding data to the first occurrence of map. Here's the fully commented code:

        my @sorted_recs = map { $_->[0] } # This first map executes last, after the sort is done. # it strips away the datastructure container along with the # sort criteria information, leaving only the original # datastructure, in sorted order. sort { $a->[1] <=> $b->[1] } # Sort executes after the second map is done. Sort is being told # to sort on the floating point number being used as the # sort criteria. map { [ $_, (split /:\s*/, $_->[3])[1] ] } # The final map executes first. It creates a container # datastructure. Each element contains an array ref to a two # element array. The first element is your original record. # The second element contains only the floating point # number you're trying to sort on. @records;

        I really hope this helps. If you ask specific questions as to which part of it you are having trouble understanding I'm happy to help. The whole thing would have been Greek to me before I really dug in and started reading everything I could find about Perl (and experimenting a lot, and participating here). Good luck. It's a fun ride.


        Dave

        It's not a missing "{", it's an extraneous "}" at the end.

        The PerlMonk tr/// Advocate
Re: Sorting question...
by BrowserUk (Patriarch) on Jan 16, 2004 at 04:49 UTC

    This makes lots of assumptions about your data format which wasn't very clear.

    Ie. no extra blank lines between the records (change 1..6 to 1..7 if there are) and that all the skew values are of the form 'n.nnnn' (adjust the regex and/or use sprintf in the $_ = $1 . $_ otherwise).

    It uses the Advanced Sorting - GRT - Guttman Rosler Transform algorithm.

    Usage: script < unsorted > sorted

    #! perl -slw use strict; open my $fh, '<', $ARGV[ 0 ] or die "Couldn't open $ARGV[ 0 ]: $!"; my @records; push @records, join '', map{ scalar <$fh> } 1 .. 6 while not eof( $fh ); print stderr 'Record count: ', scalar @records; my @sorted = map { substr $_, 5 } sort map{ m[skew: ([\d.]{5})]; $_ = $1 . $_ } @records; chomp @sorted; print for @sorted;

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Timing (and a little luck) are everything!

A reply falls below the community's threshold of quality. You may see it by logging in.