in reply to Sorting question...

You haven't defined what is your input record separator. I'll assume that it is "\n\n", and that the individual "lines" within each record are delimited by a single "\n";

I will also assume that all of the "keys" and "values" are delimited by a colon, and that colon cannot appear as a key nor a value. These are a lot of assumptions, but you didn't really define such things in your question.

my @records; { local $/ = "\n\n"; open IN, "<file.dat" or die "Cannot open data file.\n$!"; while ( my $record = <IN> ) { my @kv_pairs = split /\n/, $record; push @records, [@kv_pairs]; } close IN; } my @sorted_recs = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, (split /:\s*/, $_->[3])[1] ] } @records;

Ok, here's the walk-through...

First, set the input record separator to "\n\n" so that you read in entire blank-line-delimited records together. split the key/value pairs on "newline" characters, and push them into an array of arrayrefs (a list of lists, see perllol).

Next, perform a Schwartzian Transform. Use the 4th element of each anonymous arrayref held in @records. The forth element ([3]) contains "The global clock skew:...". Split that element on ":", and use the second half (the value) as what gets put into the ST container for comparison purposes within the sort.

Finally, the ST returns your original datastructure in sorted order.

If you wish to sort on multiple criteria, your ST will have to be just a little more complicated, and then your sort routine will have to have some "or" logical short-circuits to fall through to the next level if higher levels evaluate to equality during the sort.

I hope this helps. Kinda confusing, I know. But if you can define the spec a little better I might be able to help with a more robust (and probably slightly less complex) solution.

Note, the reason I used an array of arrays, instead of an array of hashes to store your datastructure, was because I assumed that you wanted to keep each record's lines in their original order. Since normal hashes don't have any concept of order, dumping your K/V pairs into a hash would have lost their original order within each record. If order is unimportant, an array of hashes could serve to reduce the complexity of the transform.

I also used the <=> comparison operator instead of cmp, because it looked to me like you're trying to sort on a numeric value.

Update: Fixed extraneous } character at the end of the code snippet. Sorry about that, hope this gets you going in the right direction.


Dave

Replies are listed 'Best First'.
Re: Re: Sorting question...
by Anonymous Monk on Jan 16, 2004 at 19:43 UTC
    Hi, Dave,
    
    Thanks so much for your excellent suggestion and explanation.
    
    A lot of the syntax in your example is new to me, so I am
    having a little trouble following everything.  Actually, it
    is the "my @sorted_recs =" routine that I'm struggling with.
    There is a missing "{" somewhere that I can't figure out
    where to insert...?
    
    All of you assumptions about the data source format are 
    correct.  Here is an exact snippet from the file I need to parse..

    Clock: pcm_pclk_i Pin: pad_ring_i\/pcm_pclk_pad/Y Net: pcm_pclk_i Operating Condition = worst The clock global skew = 0.113 The longest path delay = 2.663 The shortest path delay = 2.550 The longest path delay end pin: co_8port_i/I0/CLKB The shortest path delay end pin: co_8port_i/pcm_tsc_n_pclk_reg/CK Clock: ltr_clk Pin: clkgate_i/peaz_gate8/S_5/Y Net: ltr_clk Operating Condition = worst The clock global skew = 0.171 The longest path delay = 3.723 The shortest path delay = 3.552 The longest path delay end pin: co_8port_i/LOCKUP/GN The shortest path delay end pin: co_8port_i/ntr_frame_sample_c2_reg6/CK

    Followed by several hundred more clocks. So the semicolon isn't the only qualifier: the "=" is necessary as well. (I goofed in my original post...sorry about that.) That's everything. Where does that missing "{" go? Thanks, -Chris

      In order to understand my example you need to understand several concepts. It's somewhat idiomatic. I'll try to walk through it again a little more descriptively...

      First, the first "paragraph" of code:

      my @records; # Declare an array to hold records from the input file. { # Start a new lexical block to limit the scope # of the next line. local $/ = "\n\n"; # Set the input record separator to # be a blank line. This allows you to read # records in, delimited by blank lines. open IN, "<file.dat" or die "Cannot open data file.\n$!"; # Open the input file for reading. Die if an error occur +s. while ( my $record = <IN> ) { # Read the file in record by record, into $record. my @kv_pairs = split /\n/, $record; # Split each record into its six lines, and store # each line as an array element. push @records, [@kv_pairs]; # Create an anonymous array that is a copy of @kv_pairs. # Push the anonymous array containing a complete set # of keys/values for a given record into the @records arr +ay # as an array of arrays. } # End the while loop. close IN; # Close the infile, we're done with it. } # Close the lexical block that constrains the # effects of the "local $/ = ....." definition.

      Ok, that's the paragraph that builds up a datastructure containing a list of records from your input file. Now we need to sort them. To do so, we use a Schwartzian Transform. The transform puts your original data structure into a container where each element is paired up with the one thing you wish to sort by. The idea is that you sort by some criteria, and then strip away the sort criteria before handing the original datastructure back, in sorted order. I guess I'm not all that good at explaining it. When reading the commented code below, start from the LAST map, and read backward. Or in other words, the last occurrence of map is feeding data to sort, which is feeding data to the first occurrence of map. Here's the fully commented code:

      my @sorted_recs = map { $_->[0] } # This first map executes last, after the sort is done. # it strips away the datastructure container along with the # sort criteria information, leaving only the original # datastructure, in sorted order. sort { $a->[1] <=> $b->[1] } # Sort executes after the second map is done. Sort is being told # to sort on the floating point number being used as the # sort criteria. map { [ $_, (split /:\s*/, $_->[3])[1] ] } # The final map executes first. It creates a container # datastructure. Each element contains an array ref to a two # element array. The first element is your original record. # The second element contains only the floating point # number you're trying to sort on. @records;

      I really hope this helps. If you ask specific questions as to which part of it you are having trouble understanding I'm happy to help. The whole thing would have been Greek to me before I really dug in and started reading everything I could find about Perl (and experimenting a lot, and participating here). Good luck. It's a fun ride.


      Dave

        Hi, Dave,

        (If there are a bunch of underlined numbers throughout this posting, I apologize--the numbers are supposed to be inside square brakets.)

        Okay, I'm just about there. I'm starting to get this now, and things are starting to work. I have two small questions left...

        1) In the line:

        map { [ $, (split /:\s*/, $_->[3])[1] ] }

        Does this indicate the 4th line of the record [3] is split by a ":" and the stuff in the [1] address of the split is what ultimately ends up in the [1] position of the map? <Whew!> So if I was splitting by just plain 'ol \s+ qualifiers and the 5th position was a number I needed, I'd only change the [1] to a [4], right?

        2) When I try to do a:

        print for @sorted_recs;

        all I get is a bunch of ARRAY(hex#) lines printed out. It looks like the records have been sorted, but the addresses are not revealing the original contents ($_->[0], in this case). I'm not sure how to correct that.

        Thanks again for your excellent guidance--I really appreciate it.

        Best Regards,
        -Chris

        Edited by Chady -- fixed the formatting.

      It's not a missing "{", it's an extraneous "}" at the end.

      The PerlMonk tr/// Advocate
        Good catch. I've fixed it in the original post, and documented the update accordingly.


        Dave