semio has asked for the wisdom of the Perl Monks concerning the following question:

fellow monks,

I have a flat file which is pipe delimited. Each line has two values. I'm attempting to show the unique values of the first field and, where a value appears more than once, add up the corresponding values from the second field. The following code is what I have so far:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @count;
my ($field1, $field2);
while (<DATA>) {
    chomp;    # drop the trailing newline so it does not end up in $field2
    ($field1, $field2) = split /\|/;
    push @count, {'field1' => $field1, 'field2' => $field2};
}
print Data::Dumper->Dump([\@count]);

__DATA__
10|10
20|20
30|30
10|100
15|15
50|50
15|150

I believe what I need is for Data::Dumper to output something like this:

$VAR1 = {
          '10' => [ '10', '100' ],
          '20' => [ '20' ],
          '30' => [ '30' ],
          '15' => [ '15', '150' ],
          '50' => [ '50' ]
        };

With that structure, I should be able to loop through the keys and add up the field 2 values for each one to get the sums I'm looking for. This is new territory for me, so any suggestions are appreciated. Cheers.
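A minimal sketch of the loop-and-add step, assuming the data has been read into a hash of arrays like the one shown above. The input lines are hardcoded here for illustration instead of coming from the file, %values_for and %sum_for are hypothetical names, and sum comes from the core List::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sample input standing in for the pipe-delimited file.
my @lines = ("10|10", "20|20", "30|30", "10|100", "15|15", "50|50", "15|150");

# Build the hash of arrays: each first-field value maps to an array
# reference holding every second-field value seen for it.
my %values_for;
for my $line (@lines) {
    my ($key, $value) = split /\|/, $line;
    push @{ $values_for{$key} }, $value;
}

# Loop through the keys (numerically sorted) and add up each key's values.
my %sum_for;
for my $key (sort { $a <=> $b } keys %values_for) {
    $sum_for{$key} = sum( @{ $values_for{$key} } );
    print "$key => $sum_for{$key}\n";
}
```

This prints one line per unique first-field value, for example 10 => 110 and 15 => 165.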

Re: working with a hash
by johngg (Canon) on Jul 02, 2007 at 21:54 UTC
    You could use a hash rather than an array to more easily get your unique left-column items.

    use strict;
    use warnings;
    use Data::Dumper;

    my %uniques = ();
    while ( <DATA> ) {
        chomp;
        my ($key, $value) = split m{\|};
        $uniques{$key} += $value;
    }
    print Data::Dumper->Dump([\%uniques], [q{*uniques}]);

    __DATA__
    10|10
    20|20
    30|30
    10|100
    15|15
    50|50
    15|150

    This produces (hash key order may vary between runs):

    %uniques = (
                 '50' => 50,
                 '30' => 30,
                 '10' => 110,
                 '15' => 165,
                 '20' => 20
               );
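A small variation on the += approach, reading hardcoded sample lines instead of DATA. It runs with warnings made fatal to show that += on a hash element that does not exist yet raises no "uninitialized value" warning (perlsyn lists ++, --, +=, -= and .= as exempt), and it sorts the keys numerically because Perl does not guarantee hash order. This is a sketch, not johngg's exact code.

```perl
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';   # any warning would abort the script

# Hypothetical sample input standing in for the DATA section.
my @lines = ("10|10", "20|20", "30|30", "10|100", "15|15", "50|50", "15|150");

my %uniques;
for my $line (@lines) {
    my ($key, $value) = split m{\|}, $line;
    # The first time a key is seen, $uniques{$key} is undef; += treats it
    # as 0 and, unlike most operators, does not warn about it.
    $uniques{$key} += $value;
}

# Sort the keys numerically so the report is in a stable order.
for my $key (sort { $a <=> $b } keys %uniques) {
    printf "%s|%d\n", $key, $uniques{$key};
}
```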

    I hope this is helpful.

    Cheers,

    JohnGG

Re: working with a hash
by FunkyMonk (Bishop) on Jul 02, 2007 at 21:11 UTC
    How's about:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my ($field1, $field2);
    my %count;
    while (<DATA>) {
        chomp;
        ($field1, $field2) = split /\|/;
        push @{ $count{$field1} }, $field2;
    }
    print Dumper \%count;

    __DATA__
    10|10
    20|20
    30|30
    10|100
    15|15
    50|50
    15|150
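With the hash of arrays built this way, the length of each array tells you how often a first-field value occurred. A sketch that picks out the duplicated keys, assuming hardcoded sample lines in place of DATA; @duplicated is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input standing in for the DATA section.
my @lines = ("10|10", "20|20", "30|30", "10|100", "15|15", "50|50", "15|150");

my %count;
for my $line (@lines) {
    my ($field1, $field2) = split /\|/, $line;
    push @{ $count{$field1} }, $field2;
}

# An array in scalar (numeric) context gives its element count, so a key
# that appeared more than once has more than one element.
my @duplicated = grep { @{ $count{$_} } > 1 } sort { $a <=> $b } keys %count;

for my $key (@duplicated) {
    printf "%s appeared %d times: %s\n",
        $key, scalar @{ $count{$key} }, join(', ', @{ $count{$key} });
}
```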

Re: working with a hash
by naikonta (Curate) on Jul 03, 2007 at 04:01 UTC
    The problem is that you can't have two keys with the same name, because hash keys must be unique. So when you reach the second "10", its value overwrites the first. The technique you need is to collect the values for each key in an array reference. If you want every value to have the same format, you don't have to check whether a key already exists: as shown in the previous replies, just treat the hash value as an array reference, dereference it, and push the new value. You may find perlreftut, perlref, and perldsc useful resources.
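A short sketch contrasting the two behaviours on a hypothetical three-line input: plain assignment keeps only the last value for a repeated key, while pushing onto an array reference keeps them all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("10|10", "20|20", "10|100");

# Plain assignment: a repeated key silently replaces the earlier value.
my %last_only;
for my $line (@lines) {
    my ($key, $value) = split /\|/, $line;
    $last_only{$key} = $value;
}

# Array-reference values: every value for a repeated key is kept.
my %all_values;
for my $line (@lines) {
    my ($key, $value) = split /\|/, $line;
    push @{ $all_values{$key} }, $value;
}

print "plain:     10 => $last_only{10}\n";           # 100; the 10 was lost
print "array ref: 10 => @{ $all_values{10} }\n";     # 10 100
```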

    I'd also suggest lexicalizing the field variables inside the while loop, since you don't need them outside of it.

    In more verbose form:

    my %count;
    while (<DATA>) {
        chomp;
        my ($field1, $field2) = split /\|/;

        # If %count has no key $field1 yet, $count{$field1} is undefined.
        # Treat it as an array reference anyway: when push dereferences it
        # with @{$count{$field1}}, Perl autovivifies a new empty array, as
        # if you had first written
        #
        #     $count{$field1} = [];
        #
        # push() needs a real array, so dereference the reference first,
        # then push the new value onto it.
        push @{$count{$field1}}, $field2;
    }
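The step where $count{$field1} becomes an array reference before push is Perl's autovivification. A minimal sketch that checks it directly with exists and ref, using the hardcoded key 10 for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %count;
my $existed_before = exists $count{10} ? 1 : 0;   # 0: no such key yet

# push needs an array; because $count{10} is undefined, Perl autovivifies
# it into a new, empty array reference before pushing onto it.
push @{ $count{10} }, 10;
push @{ $count{10} }, 100;

my $kind = ref $count{10};                        # 'ARRAY'
print "existed before: $existed_before\n";
print "ref type after: $kind\n";
print "values: @{ $count{10} }\n";
```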

    However, if you want keys that appear only once to keep a plain scalar value instead of an array reference, you have to check whether the key already exists:

    my %count;
    while (<DATA>) {
        chomp;
        my ($field1, $field2) = split /\|/;
        if (exists $count{$field1}) {
            # force the value to be an array ref if it isn't one already
            $count{$field1} = [$count{$field1}] unless ref $count{$field1};
            push @{$count{$field1}}, $field2;
        }
        else {
            $count{$field1} = $field2;    # normal assignment
        }
    }
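One consequence of this mixed layout is that any code reading %count must check ref on every value before using it. A sketch of summing it back, with hardcoded sample lines standing in for the real input and %total as a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("10|10", "20|20", "30|30", "10|100", "15|15", "50|50", "15|150");

# Build the mixed structure: a plain scalar for keys seen once,
# an array reference once a key repeats.
my %count;
for my $line (@lines) {
    my ($field1, $field2) = split /\|/, $line;
    if (exists $count{$field1}) {
        $count{$field1} = [ $count{$field1} ] unless ref $count{$field1};
        push @{ $count{$field1} }, $field2;
    }
    else {
        $count{$field1} = $field2;
    }
}

# Every reader now has to check which form each value has.
my %total;
for my $key (keys %count) {
    my $v = $count{$key};
    if (ref $v eq 'ARRAY') {
        my $sum = 0;
        $sum += $_ for @$v;
        $total{$key} = $sum;
    }
    else {
        $total{$key} = $v;
    }
}

for my $key (sort { $a <=> $b } keys %total) {
    print "$key => $total{$key}\n";
}
```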

    One more thing: for simple dumping, I would rather use the straightforward Dumper() function:

    print Dumper(\%count);
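Because Perl does not guarantee hash order, Dumper output can list keys in a different order from run to run. Setting the package variable $Data::Dumper::Sortkeys to a true value makes the dump sort the keys. A small sketch with a hardcoded hash:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %count = (50 => [50], 10 => [10, 100], 15 => [15, 150]);

# Sort hash keys in the dump so the output is stable between runs
# (with Sortkeys = 1 the keys are compared as strings).
$Data::Dumper::Sortkeys = 1;
my $dump = Dumper(\%count);
print $dump;
```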

    Open source software? Share and enjoy. Make profit from it if you can. Yet, share and enjoy!