in reply to Re: Best way to store/sum multiple-field records?
in thread Best way to store/sum multiple-field records?

Actually, I just noticed the "double up" of the names in the output - would that best be counteracted by another "defined" check along the lines of:

if (! defined $sum{$key}{reason}) { push @{$sum{$key}{reason}}, $reason; }

Re^3: Best way to store/sum multiple-field records?
by GrandFather (Saint) on Dec 23, 2014 at 01:37 UTC

    Did you try it? Did it work?

    How do you expect "Jones,Tom/Jones, Tom" to be handled?

    Aside from the formatting issue, I'd be inclined to use a nested hash instead of an array if you want to suppress duplicates.
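To illustrate the nested-hash idea, here is a minimal sketch rather than a definitive implementation. The `summarise` helper and the `reasons` key are names invented for this example; they are not part of the code posted in the thread. Each distinct reason becomes a hash key, so repeats collapse without any explicit "seen it already" check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): sums the value field per key and
# records each distinct reason as a hash key, so repeats merge automatically.
sub summarise {
    my @lines = @_;
    my %sum;
    for my $line (@lines) {
        my ($key, $value, $reason) = split /\|/, $line;
        # Skip malformed records, mirroring the check used in the thread.
        next if !defined $reason || $value !~ /^\d+$/;
        $sum{$key}{value} += $value;
        $sum{$key}{reasons}{$reason}++;    # hash key, so duplicates merge
    }
    return \%sum;
}

my $sum = summarise(
    'USERID1|2215|Jones,Tom|',
    'USERID1|1000|Jones, Tom|',
    'USERID2|2500|Francis, Pope|',
    'USERID2|1500|Francis, Pope|',
);

for my $key (sort keys %$sum) {
    # Join the distinct reasons for output; sorting keeps the order stable.
    my $reasons = join '/', sort keys %{ $sum->{$key}{reasons} };
    print "$key|$sum->{$key}{value}|$reasons\n";
}
```

Note that "Jones,Tom" and "Jones, Tom" are different strings, so both are kept for USERID1. Merging them would need some normalisation of the name first, which is a separate decision.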

    Perl is the programming world's equivalent of English

      I've had a look over both your solution and toolic's, and decided to stick with yours, as I found the output of your version a lot easier to deal with. I'm not sure how to get the output from toolic's version into a nice pipe-delimited format. I had a poke around on Google and here, and couldn't find anything that looked like a reasonable solution.

      So, after having a bit more of a play, this is what I've ended up with. Could you take a look, and see if you can see anything wrong with it, please?

      #!/usr/bin/perl
      use strict;
      use warnings;

      my %sum;

      while (<DATA>) {
          chomp;
          my ($key, $value, $reason) = split(/\|/);
          if (! defined $reason || $value !~ /^\d+$/) {
              warn qq<dropped line: "$_"\n>;
              next;
          }
          $sum{$key}{value} += $value;
          if (! defined $sum{$key}{reason}) {
              $sum{$key}{reason} = $reason;
          }
      }

      local $" = '/';
      for my $key (keys %sum) {
          print "$key|$sum{$key}{value}|$sum{$key}{reason}\n";
      }

      __DATA__
      USERID1|2215|Jones,Tom|
      USERID1|1000|Jones, Tom|
      USERID3|1495|Dole, Bob|
      USERID2|2500|Francis, Pope|
      USERID2|1500|Francis, Pope|
      USERID4|0045|Doe, John|
      USERID5|1225|Doe, Jane|
      USERID4|4995|Doe, John|
      USERID4|9995|Doe, John|
      USERID6|1095|Darwin, Anita|
      USERID7|1495|Dawson, Gary|
      USERID6|1250|Darwin, Anita|

      Prints:
      USERID5|1225|Doe, Jane
      USERID3|1495|Dole, Bob
      USERID7|1495|Dawson, Gary
      USERID1|3215|Jones,Tom
      USERID4|15035|Doe, John
      USERID2|4000|Francis, Pope
      USERID6|2345|Darwin, Anita

        Assuming you are using Perl 5.10 or later, which provides the defined-or assignment operator, you can replace the duplicate skipping code with:

        $sum{$key}{reason} //= $reason;
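As a small sketch of what that operator does (the `%defined_or` and `%logical_or` hashes are made up for this illustration): `//=` assigns only when the left-hand side is undefined, so the first reason seen for a key wins. It differs from `||=`, which tests truth rather than definedness and would overwrite a defined but false value such as 0 or the empty string.

```perl
#!/usr/bin/perl
use 5.010;    # the defined-or operators need Perl 5.10 or later
use strict;
use warnings;

my %sum;

# First-wins assignment: the slot is only filled while it is still undef.
for my $reason ('Jones,Tom', 'Jones, Tom') {
    $sum{USERID1}{reason} //= $reason;
}
print "$sum{USERID1}{reason}\n";    # prints "Jones,Tom"

# //= checks definedness, ||= checks truth, so they disagree on a stored 0.
my %defined_or = (count => 0);
my %logical_or = (count => 0);
$defined_or{count} //= 10;    # 0 is defined, so it is left alone
$logical_or{count} ||= 10;    # 0 is false, so it is replaced by 10
print "$defined_or{count} $logical_or{count}\n";    # prints "0 10"
```

For the name field in this thread either operator would probably behave the same, since a name is unlikely to be "0" or empty, but `//=` states the intent ("only if not yet set") more precisely.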

        Note too that you should not use an editor that inserts tabs. They really screw up indenting when you use a different editor or render the file in almost any way except with the original editor.

        Also, take note of the indentation style of the code you are working with and try to match it. Code gets really nasty to work with if it contains a mixture of different indentation styles. You should think about what you consider to be important elements of code formatting and develop a style you can use consistently. Unless you have a really good reason to go some other way, the majority of the Perl world seems to use K&R, inherited no doubt from the *nix C world, so you should base your style on that.

        Perl Tidy is an excellent Perl "pretty printer". I strongly recommend that you install and use it, especially to clean code up before pushing it into your revision control system, or before publishing it anywhere (like on PerlMonks).


      I did, and it did. The main reason I asked was in case there was something wrong with doing it that way. Just because it seems to work doesn't mean it's "right", so I wanted to make sure.

      As for your question about the handling of the different formatting of the names, that shouldn't be an issue: the source of the data is a database where the name will be consistent for each user ID.