naChoZ has asked for the wisdom of the Perl Monks concerning the following question:
I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.
Here's a trimmed down example of what I'm doing:
Read a file of num to name mappings:
    num|name
    1|foo
    12|bar
    15|bar
    18|baz
    25|quux
    27|quux
    37|quuux
    48|quuuux
Read any number of files (granted, it's rarely more than a handful) that look like this:
    file1:
    acct|description|nums
    foo-001|foo one|1,12,27,37
    foo-002|foo two|1,15,25,37
    foo-003|foo three|1,18,25,37
    foo-004|foo four|12,18,25,37
    foo-005|foo five|12,15,25,27,37
    foo-006|foo six|1,12,25,27,37,99

    file2:
    acct|description|nums
    foo-001|foo one|1,12,15,27,37
    foo-002|foo two|1,15,25,37
    foo-003|foo three|1,18,25
    foo-004|foo four|12,18,25,37
    foo-005|foo five|12,15,25,27,37
    foo-006|foo six|1,12,15,25,27,37,99

    file3:
    acct|description|nums
    foo-001|foo one|1,12,15,18,27,37
    foo-002|foo two|1,15,25,37
    foo-003|foo three|1,18,25
    foo-004|foo four|12,18,25,37
    foo-005|foo five|12,15,25,27,37
    foo-006|foo six|1,12,15,25,27,37,99
I iterate and increment values in a hashref as I go so it ends up looking like:
    $hash => {
        file1 => {
            foo    => 4,
            bar    => 6,
            baz    => 4,
            quux   => 8,
            quuux  => 6,
            quuuux => 0,
        },
        file2 => {
            foo    => 4,
            bar    => 8,
            baz    => 4,
            quux   => 8,
            quuux  => 5,
            quuuux => 0,
        },
        file3 => {
            foo    => 4,
            bar    => 8,
            baz    => 5,
            quux   => 8,
            quuux  => 5,
            quuuux => 0,
        },
    }

And then I simply produce a report like this to make it easy to spot the differences:
    name    file1  file2  file3
    foo     4      4      4
    bar     6      8      8
    baz     4      4      5
    quux    8      8      8
    quuux   6      5      5
    quuuux  empty  empty  empty
What I'd like to do is something like this:
    name    file1  file2  %change  file3  %change
    foo     4      4      0%       4      0%
    bar     6      8      133%     8      0%
    baz     4      4      0%       5      125%
    quux    8      8      0%       8      0%
    quuux   6      5      -83%     5      0%
    quuuux  empty  empty           empty

My code is very straightforward; the only difference is that I'm handling files with 20+ columns, and each file is 300,000+ lines.
I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)
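As an aside on the counting step described above: Perl autovivifies nested hash entries, and `++` on an undefined value yields 1 without a warning, so the separate "define it, otherwise increment" branch is probably not needed. A minimal sketch, assuming a hypothetical two-row input and a shortened mapping (neither is the real data):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real mapping and input rows.
my %nums_to_names = ( 1 => 'foo', 12 => 'bar', 15 => 'bar', 18 => 'baz' );
my @rows = ( 'foo-001|foo one|1,12,99', 'foo-002|foo two|1,15' );

my $hash = {};
my $file = 'file1';    # hypothetical file label

for my $row (@rows) {
    my ( $acct, $desc, $nums ) = split /\|/, $row;
    for my $num ( split /,/, $nums ) {
        my $name = $nums_to_names{$num};
        next unless defined $name;    # 99 has no mapping, so it is skipped
        $hash->{$file}{$name}++;      # autovivifies, then increments
    }
}

printf "%s => %d\n", $_, $hash->{$file}{$_}
    for sort keys %{ $hash->{$file} };
# prints:
# bar => 2
# foo => 2
```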
It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.
I only see a couple of possible paths, but I can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?
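The first option (walking %$hash once it is fully built) could be sketched roughly as follows. This is only an illustration under some assumptions: `pct_change` is a hypothetical helper, the counts are copied from the small example above, and the formula is the conventional (new - old) / old, which gives 33% for 6 to 8 rather than the 133% in the mock-up. If the ratio form is what's wanted, new / old * 100 would replace it.

```perl
use strict;
use warnings;

# Counts copied from the small example report.
my $hash = {
    file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0 },
    file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0 },
    file3 => { foo => 4, bar => 8, baz => 5, quux => 8, quuux => 5, quuuux => 0 },
};

# Hypothetical helper: percentage change from $old to $new.
# Returns undef when there is no usable baseline (old is 0 or missing).
sub pct_change {
    my ( $old, $new ) = @_;
    return unless $old && defined $new;
    return sprintf '%.0f%%', ( $new - $old ) / $old * 100;
}

my @files = sort keys %$hash;
my @names = qw(foo bar baz quux quuux quuuux);

my @report;
for my $name (@names) {
    my @row = ($name);
    for my $i ( 0 .. $#files ) {
        my $count = $hash->{ $files[$i] }{$name};
        push @row, $count ? $count : 'empty';
        next if $i == 0;    # first column has nothing to compare against
        my $prev = $hash->{ $files[ $i - 1 ] }{$name};
        push @row, pct_change( $prev, $count ) // '';    # needs Perl 5.10+
    }
    push @report, \@row;
}

print join( "\t", @$_ ), "\n" for @report;
```

Because the comparison only looks at the previous column, this pass costs one lookup per cell of the report and doesn't touch the 300,000-line input files again.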
Any advice appreciated.
Just to include the obligatory code sample, it goes something like this:

    use File::Basename qw(basename);

    my $hash    = {};
    my @heading = ('name');
    my @report;

    for my $fn ( sort @filenames ) {
        open my $fh, '<', $fn
            or die "Error opening file ${fn}: $!\n";

        # Read in each file and populate the hash
        #
        while (<$fh>) {
            chomp;
            s%\r%%;
            my @line    = split /\|/;
            my @curnums = split( ',', $line[2] );

            for my $curnum (@curnums) {
                next unless defined $nums_to_names->{$curnum};

                if (   !defined $hash->{$fn}
                    or !defined $hash->{$fn}->{$curnum} )
                {
                    $hash->{$fn}->{$curnum} = 1;
                }
                else {
                    $hash->{$fn}->{$curnum}++;
                }
            }
        }
        close $fh;
    }

    # Iterate the mapping of numbers to names
    #
    for my $curnum ( sort keys %$nums_to_names ) {
        my @report_line;

        # Skip it unless it's defined and has a value
        #
        my $name = defined $nums_to_names->{$curnum}
                && $nums_to_names->{$curnum}
                 ? $nums_to_names->{$curnum}
                 : next
                 ;

        push @report_line, $name;

        # For the current mapping number, pluck the corresponding counts
        # related to each file
        #
        for my $curfilename ( sort keys %$hash ) {
            my $count = defined $hash->{$curfilename}->{$curnum}
                     && $hash->{$curfilename}->{$curnum}
                      ? commify( $hash->{$curfilename}->{$curnum} )
                      : 'empty'
                      ;

            push @report_line, $count;
        }

        push @report, \@report_line;
    }

    push @heading, basename($_) for sort @filenames;

Then I iterate @heading and @report and print them out cell by cell.
Any tips on how I might add the percentage change between columns?
--
Andy
Replies are listed 'Best First'.
Re: Calculating percentage of change between columns
by choroba (Cardinal) on May 29, 2013 at 23:39 UTC
by naChoZ (Curate) on May 30, 2013 at 17:56 UTC