Moron has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,

I am normally able to figure out what data structure to use in perl, but this one puzzles me:

I am testing a batch process (unix/sybase) with a set of combinations of parameters and recording the duration against the combination. The data dumped out for a particular run of the batch process is (actually date is done post-processing but pre-reporting, but that's not a relevant detail):

Date: YYYYMMDD (time not needed - it takes days to complete) Count: \d+ (no. of records in batch for test run) Type: \w+ (record type) State: \w+ (record state of batch NEW OLD or MIX) Client: \w+ (abbreviated) Quality: \d (0=valid 1=invalid records) Program: \S+ (for different versions of the batch program) Duration: \d+ (run time in seconds)

Each run of the test wrapper dumps the above as a simple hash using Data::Dumper which I can easily read in to a reporting program. For example an array of hash was my first attempt. (Perl everywhere except some older components of the batch process).

I plan to convert these into a CSV format (one way or another, not relevant how) in the form:

,date1,date2,... "COUNT",c1,,, "TYPE",t1,,, "STATE",s1,,, "CLIENT",k1,,, "QUALITY",q1,,, "PROGRAM",p1,,, "DURATION",d11,d12,... ,,,,,,,,,,,, "COUNT",c2,,, "TYPE",t2,,, etc. "DURATION",d21,d22,... etc.
Where each combination (c1,t1,s1,k1,q1,d1,p1) is unique vertically in the csv, with the duplicates (runs of same combination on different dates with potentially - in fact actually - different durations) being tabulated horizontally under the relevant date column.

And now the rub. As an array of hash, I see no way to gather the unique combinations for each date. As a hash of array, or hash of hash, same problem for different reasons. I next tried using two hashes, one by date and one by combo but still none of these suggest how I can get it in the right form prior to conversion into csv format. Is there a matrix trick I can use?

Any assistance (that focuses on the required data structure and/or its processing) would be much appreciated.

Latest Update: It might help focus on the problem by expressing it like this: how to convert an array of hash in to an array of array (using only perl, perhaps a module can be justified in response but only if it is strictly structure-related) that, ((for clarification of its structure only)), maps to an excel table, where unique combinations the hash elements are the rows, one of the elements "date" forms the columns and another "duration" (of a run on that "date" versus each unique combination of the remaining hash fields) forms the rank-and-file cells of the table.

Update: The exact table format represented by the above csv can be changed if there is an easier way or easier data structure implied that still preserves the ability to show unique run combinations vertically and run dates horizontally or even a complete change that presents the information at least as usefully and carries less difficulty with data structures and their processing. Also updated to clarify that duration is multiple per unique combination of the other parameters. Sadly, this update invited too much freedom in interpreting the actual problem I have.

Sincerely,

-M

Free your mind

Replies are listed 'Best First'.
Re: pre-organisation of test results prior to conversion to CSV
by davidrw (Prior) on May 02, 2006 at 13:20 UTC
    It isn't really csv if it's "vertical" .. I think you should definitely go "horizontal" with it so that it's just a standard "table".
    date,count,type,state,client,quality,program,duration d1,c1,t1,s1,k1,q1,p1,d1 d2,c2,t2,s2,k2,q2,p2,d2 ...
    Now, this is easily parsable using Text::CSV or similar. But actually, i'd recommend DBD::CSV and use the power of SQL for your needs.
    # get dates w/dups, and the dup count SELECT `date`, count(*) as num_dups FROM foo GROUP BY `date` HAVING co +unt(*) > 1 # get everything for one date SELECT * FROM foo WHERE `date` = ? # gather the unique combinations for each date. my %dates; my $rows = $dbh->selectall_arrayref("select * from foo", {Slice=>{}}, +); # (might not want to suck it all in at once, though) foreach my $row (@$rows){ push @{ $dates{ $row->{date} } }, $row; } print Dumper \%dates;
    (i also recommend changing your column names so that they're not sql keywords like 'count' or 'date')
      Unfortunately this all manages to miss the OP completely. I will now make an update to help focus on the problem.

      -M

      Free your mind

Re: pre-organisation of test results prior to conversion to CSV
by Moron (Curate) on May 02, 2006 at 14:56 UTC
    I just had an idea which I will now see if it can work technically, to whit, using the following somewhat arcane hybrid:
    # pseudocode { hash-reference => { date => scalar, duration => scalar } }
    where each hash-reference refers to what could also be an array of the remaining fields and their values (i.e. the unique combinations of the other data, with the field names only for convenience). - later...

    Update: if it weren't for the fact that this allows duplicate logical keys in the upper hash ( hash references being to possibly non-unique hashes) it would have worked .... but this is the closest I got so far. I could of course encode the referenced hash into a hash key, e.g.

    #pseudo-code { join( $;, %hash ) => { date => scalar, duration => scalar } }
    But it just gets messier... for example the %hash has to be guaranteed to have its keys in a consistent order when it gets co-erced into a hash key and if that happened not to be a problem, it would still be poor practice to rely on it.

    More update... Well since so far all I get is a down-vote and no further bright ideas, I'll just go with this inelegant solution and hack the embedded hash into being ordered by key.

    see y'all in about six months when I next have a puzzler of this calibre. Sadly, I don't have time to participate properly in this site and my inclination is hardly bolstered under the circumstances.

    -M

    Free your mind