CSV file reading and comparison

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody, I am facing a problem that has me stumped. First let me explain the easy part, which I have already set up. We have a set of testcases that run through various stages, let's say stage1, stage2, stage3, stage4, and so on. When I run my regression, all the specified testcases run through this setup, and I get a log file for each testcase run. I search the logs for the runtime and memory statistics and dump them to a CSV file like this:

TestcaseName, Stage1Mem, Stage1Time, Stage2Mem, Stage2Time......
Test1,44,45,43,45.....
Test2,7,2334,45,34....
.
.

All well and good, it's working fine. Now I have to solve another problem: automatic flagging of a degradation or a spurt in performance. I will have an existing CSV file in the same format. I will dump a new CSV file and flag the cases where the difference in numbers exceeds the tolerance limits (2 and 20%). The old CSV file may not have some of the new testcases, so the diff log should include the statistics of the newly added testcases as well as flag which testcases changed beyond the tolerance limits.

I am trying this approach: create a hash for each performance criterion (e.g. stage3), and inside that hash create a hash for each testcase, i.e. nested hashes. That way the code is generic, and I do not have to worry about new stages being added in the future! However, my knowledge of hashes is a little weak. I have googled for code on hashes nested within hashes, but I am still stumped. Any tips will be appreciated!
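The comparison described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the sub names `parse_csv` and `compare_runs` are hypothetical, the hash is keyed by testcase first and column second, the fields are assumed to be plain numbers without quoting, and "2 and 20%" is read as "changed by more than 2 units and by more than 20% of the old value", which may not be the intended rule.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed reading of the tolerance "2 and 20%": flag a value only when it
# moved by more than 2 units AND by more than 20% of the old value.
my $ABS_LIMIT = 2;
my $PCT_LIMIT = 20;

# Parse CSV text into a hash of hashes: $data->{testcase}{column} = value.
# Assumes simple comma-separated fields with no quoting or embedded commas.
sub parse_csv {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    return {} unless @lines;
    my ($name_col, @columns) = split /\s*,\s*/, shift @lines;
    my %data;
    for my $line (@lines) {
        my ($testcase, @values) = split /\s*,\s*/, $line;
        @{ $data{$testcase} }{@columns} = @values;    # hash slice
    }
    return \%data;
}

# Compare a gold run with a new run and return a list of report lines.
sub compare_runs {
    my ($gold, $new) = @_;
    my @report;
    for my $testcase (sort keys %$new) {
        if (!exists $gold->{$testcase}) {
            push @report, "NEW $testcase";
            next;
        }
        for my $column (sort keys %{ $new->{$testcase} }) {
            my $old = $gold->{$testcase}{$column};
            my $cur = $new->{$testcase}{$column};
            next unless defined $old && defined $cur;
            my $diff = abs($cur - $old);
            # With an old value of 0 a percentage is undefined, so any
            # change at all is treated as exceeding the percentage limit.
            my $pct_exceeded = $old == 0
                ? $diff > 0
                : 100 * $diff / abs($old) > $PCT_LIMIT;
            push @report, "CHANGED $testcase $column $old -> $cur"
                if $diff > $ABS_LIMIT && $pct_exceeded;
        }
    }
    return @report;
}

# Made-up sample data in the format shown in the question.
my $gold = parse_csv(<<'CSV');
TestcaseName,Stage1Mem,Stage1Time,Stage2Mem,Stage2Time
Test1,44,45,43,45
Test2,7,2334,45,34
CSV
my $new = parse_csv(<<'CSV');
TestcaseName,Stage1Mem,Stage1Time,Stage2Mem,Stage2Time
Test1,44,60,43,46
Test2,7,2334,45,34
Test3,1,2,3,4
CSV
print "$_\n" for compare_runs($gold, $new);
```

Because the column names come from the header line, a stage added later becomes one more inner key and needs no code change. The statistics of a new testcase are still available in `$new->{$testcase}` if the report should print them too.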


Re: CSV file reading and comparison by tsk1979 (Scribe) on Feb 27, 2008 at 10:39 UTC
I wrote some ramshackle code as a quick fix and will optimize it later. It works well! The only problem is the order: I want the keys to print in the same order in which they were read in. In this case the order seems to be sorted by name. Any tips, please?

$goldcsv = @ARGV[0];
#$newcsv = @ARGV[1];
my %goldhash;
my @keyarray;
open GOLD, "$goldcsv";
my $i = 0;
while (<GOLD>) {
    chomp;
    $i=$i+1;
    if ($i eq 1) {
        @temparray = split (",",$_);
        my $j=0;
        foreach $elem (@temparray) {
            $j= $j+1;
            next if ($j eq 1);
            push (@keyarray, $elem);
        }
        next;
    }
    @temparray = split(",",$_);
    $testcasename = @temparray[0];
    foreach my $value (1..$#temparray) {
        $goldhash{$testcasename}{$keyarray[$value-1]} = $temparray[$value];
    }
}
for $testcase (keys %goldhash) {
    print "$testcase: ";
    for $value (keys %{ $goldhash{$testcase} }) {
        print "$value = $goldhash{$testcase}{$value} ";
    }
    print "\n";
}
Re^2: CSV file reading and comparison by toolic (Bishop) on Feb 27, 2008 at 14:06 UTC
Quoting: "I want the order of the keys while printing to be the same as when the keys were read in!"

Tie::IxHash will do this for you.

Quoting: "Any tips please!"

Yes, I have some more tips...

Use the strictures to find other potential problems with your code:

use warnings;
use strict;

This would produce the following warnings:

Scalar value @ARGV[0] better written as $ARGV[0] at ...
Scalar value @temparray[0] better written as $temparray[0] at ...

It will then be necessary to declare all variables with my to get the code to compile again.

Always check success when you open a file, and always close the file.

I refactored your code:

#!/usr/bin/env perl
use warnings;
use strict;
use Tie::IxHash;

tie my %goldhash, "Tie::IxHash";
my $goldcsv = shift;
my @keyarray;
my @temparray;
open my $GOLD_FH, '<', $goldcsv or die "Can not open $goldcsv $!\n";
my $i = 0;
while (<$GOLD_FH>) {
    chomp;
    $i++;
    if ($i eq 1) {
        @temparray = split /,/;
        my $j=0;
        for my $elem (@temparray) {
            $j++;
            next if ($j eq 1);
            push @keyarray, $elem;
        }
        next;
    }
    @temparray = split /,/;
    my $testcasename = $temparray[0];
    for my $value (1 .. $#temparray) {
        $goldhash{$testcasename}{$keyarray[$value-1]} = $temparray[$value];
    }
}
close $GOLD_FH or die "Can not close $goldcsv $!\n";
for my $testcase (keys %goldhash) {
    print "$testcase: ";
    for my $value (keys %{ $goldhash{$testcase} }) {
        print "$value = $goldhash{$testcase}{$value} ";
    }
    print "\n";
}

Here is the output. Is this what you had in mind?

Test1: Stage1Mem = 44 Stage2Time = 45 Stage2Mem = 43 Stage1Time = 45
Test2: Stage1Mem = 7 Stage2Time = 34 Stage2Mem = 45 Stage1Time = 2334
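In the output above the testcases come out in file order, but the columns within each testcase do not, because only the outer hash is tied and the inner hashes are ordinary ones. One alternative that needs no extra module is to remember the order in plain arrays and iterate over those when printing. The following is a minimal sketch of that idea; the sub names `read_ordered` and `format_ordered` are hypothetical, and it assumes simple comma-separated fields without quoting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read CSV text into a hash of hashes and also record the order in which
# the columns and the testcases first appeared.
sub read_ordered {
    my ($csv_text) = @_;
    # In-memory filehandle, so the sketch runs without an input file.
    open my $fh, '<', \$csv_text or die "Cannot read CSV text: $!\n";
    my $header = <$fh>;
    die "Empty input\n" unless defined $header;
    chomp $header;
    my (undef, @columns) = split /,/, $header;    # drop the name column
    my (%stats, @testcases);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($name, @values) = split /,/, $line;
        push @testcases, $name unless exists $stats{$name};
        @{ $stats{$name} }{@columns} = @values;   # hash slice
    }
    close $fh;
    return (\%stats, \@testcases, \@columns);
}

# Build one output line per testcase, following the recorded orders
# instead of calling keys on the hashes.
sub format_ordered {
    my ($stats, $testcases, $columns) = @_;
    my @lines;
    for my $name (@$testcases) {
        my @pairs;
        for my $column (@$columns) {
            push @pairs, "$column = $stats->{$name}{$column}";
        }
        push @lines, $name . ': ' . join(' ', @pairs);
    }
    return @lines;
}

my ($stats, $testcases, $columns) = read_ordered(<<'CSV');
TestcaseName,Stage1Mem,Stage1Time,Stage2Mem,Stage2Time
Test1,44,45,43,45
Test2,7,2334,45,34
CSV
print "$_\n" for format_ordered($stats, $testcases, $columns);
```

The trade-off is two extra arrays to carry around, in exchange for ordinary hashes and no dependency on Tie::IxHash.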
Thanks for the tips by tsk1979 (Scribe) on Mar 03, 2008 at 08:38 UTC
I was actually going to incorporate strict and warnings; this was just a quick hack. I also liked the suggestion of using perldsc; it's a good document. I am really new to complex data structures, and thanks for all the help rendered.
Re: CSV file reading and comparison by goibhniu (Hermit) on Feb 28, 2008 at 17:14 UTC
I've found perldsc to be immensely helpful for the syntax of HoH, etc.

#my sig used to say 'I humbly seek wisdom. '. Now it says:
use strict;
use warnings;
I humbly seek wisdom.