Calculating percentage of change between columns

naChoZ has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name
1|foo
12|bar
15|bar
18|baz
25|quux
27|quux
37|quuux
48|quuuux
[download]

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1:
acct|description|nums
foo-001|foo one|1,12,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25,37
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,25,27,37,99

file2:
acct|description|nums
foo-001|foo one|1,12,15,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99

file3:
acct|description|nums
foo-001|foo one|1,12,15,18,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99
[download]

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => {
    file1 => {
                foo    => 4,
                bar    => 6,
                baz    => 4,
                quux   => 8,
                quuux  => 6,
                quuuux => 0,
             },

    file2 => {
                foo    => 4,
                bar    => 8,
                baz    => 4,
                quux   => 8,
                quuux  => 5,
                quuuux => 0,
             },

    file3 => {
                foo    => 4,
                bar    => 8,
                baz    => 5,
                quux   => 8,
                quuux  => 5,
                quuuux => 0,
             },
}
[download]

And then simply I produce a report like this to make it easy to spot the differences:

name    file1   file2   file3
foo     4       4       4
bar     6       8       8
baz     4       4       5
quux    8       8       8
quuux   6       5       5
quuuux  empty   empty   empty
[download]

What I'd like to do is something like this:

name    file1   file2   %change  file3   %change
foo     4       4       0%       4       0%
bar     6       8       133%     8       0%
baz     4       4       0%       5       125%
quux    8       8       0%       8       0%
quuux   6       5       -83%     5       0%
quuuux  empty   empty            empty
[download]

My code is very straightforward, the only difference being I'm handling a file with 20+ columns and the each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I'm can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?

Any advice appreciated.

Just so to include the obligatory code sample it goes something like this:

my $hash = {};
my @heading = ( 'name' );
my @report;
for my $fn ( sort @filenames ) {

    open my $fh, '<', $fn
        or die "Error opening file ${fn}: $!\n";

    # Read in each filename and populate the hash
    #
    while (<$fh>) {
        chomp;
        s%\r%%;
        my @line = split /\|/;
        my @curnums = split( ',', $line[2] );

        for my $curnum ( @curnums ) {
            next unless $defined $nums_to_names->{$curnum};
            if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$c
+urnum} ) {
                $hash->{$fn}->{$curnum} = 1;
            } else {
                $hash->{$fn}->{$curnum}++;
            }
        }
    }

    # Iterate the mapping of numbers to names
    #
    for my $curnum ( sort keys %$nums_to_names ) {

        my @report_line;

        # Skip it unless it's defined and has a value
        #
        my $name =
            defined $nums_to_names->{$curnum}
            &&      $nums_to_names->{$curnum}
                  ? $nums_to_names->{$curnum}
                  : next
                  ;

        push @report_line, $name;


        # For the current mapping number, pluck the corresponding coun
+ts 
        # related to each file
        #
        for my $curfilename ( sort keys %$hash ) {

             my $count =
                        defined $hash->{$curfilename}->{$curnum}
                        &&      $hash->{$curfilename}->{$curnum}
                      ? commify($hash->{$curfilename}->{$curnum})
                      : 'empty'
                      ;

            push @report_line, $count;

        }

        push @report, \@report_line;

    }

}

push @heading, basename($_) for sort @filenames;
[download]

Then I iterate @heading and @report and print them out cell by cell.

Any tips on how I might add the percentage change between columns?

--
Andy

Comment on Calculating percentage of change between columns Select or Download Code

Replies are listed 'Best First'.
Re: Calculating percentage of change between columns by choroba (Cardinal) on May 29, 2013 at 23:39 UTC
Note that your sample output is wrong. If going from 4 to 4 changes 0%, than going from 6 to 8 should change 33%, not 133%. Also, you cannot express the change going anywhere from 0. You might find the following code useful. I have modified some values to show some other interesting results. #!/usr/bin/perl use warnings; use strict; my %hash = ( file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 8, bar => 0, baz => 5, quux => 8, quuux => 5, quuuux => 3, }, ); my %columns = map { $_ => 1 } map keys %{ $hash{$_} }, keys %hash; my @files = qw/file1 file2 file3/; for my $column (keys %columns) { print $column, "\t", $hash{$files[0]}{$column}; for my $i (1 .. $#files) { my ($this, $previous) = map $hash{ $files[$_] }{$column}, $i, +$i - 1; print "\t", $this; my $change = '-'; if ($previous) { $change = sprintf "%d%%", 100 * ($this - $previous) / $pre +vious; } print "\t$change"; } print "\n"; } [download] Output: `bar 6 8 33% 0 -100% baz 4 4 0% 5 25% quuux 6 5 -16% 5 0% quux 8 8 0% 8 0% foo 4 4 0% 8 100% quuuux 0 0 - 3 -` [download] لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply] [d/l] [select]
Re^2: Calculating percentage of change between columns by naChoZ (Curate) on May 30, 2013 at 17:56 UTC
Perfect, thanks for that. The iterating of the array index and then using it in the map like you did with $this and $previous instead of the foreach style I was using was just what I needed. -- Andy	[reply]