Appending values to existing hash key

jaypal has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Appending values to existing hash key by GrandFather (Saint) on Mar 21, 2014 at 03:08 UTC
A number of minor things, most of which are personal preference: I tend to use statement modifiers where there would only otherwise be one small statement in a controlled block. remove the trailing white space from the captured strings and require at least one of each character type in the regex to avoid some nasty edge cases Avoid intermediate variables where the calculation is trivial (@val for example) Use double quote interpolation instead of concatenation `#!/usr/bin/perl use strict; use warnings; use Data::Dumper; my %hash; while (<DATA>) { (my $key, my $value) = /(\S+\s+\S+)\s+(\S+\s+\S+)/; $value = join ":", split " ", $value; $hash{$key} .= ", " if exists $hash{$key}; $hash{$key} .= $value; } print "$_ $hash{$_}\n" for sort keys %hash; __DATA__ id1 name1 cat1 catname1 id1 name1 cat2 catname2 id2 name2 cat3 catname3 id3 name3 cat1 catname1 id3 name3 cat4 catname4` [download] Prints: `id1 name1 cat1:catname1, cat2:catname2 id2 name2 cat3:catname3 id3 name3 cat1:catname1, cat4:catname4` [download] If the code changes take longer than the time saved, it's fast enough already.	[reply] [d/l] [select]
Re^2: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 03:34 UTC
Thank you so much for replying so fast and hitting all my pain points. This is exactly what I was looking for `$value = join ":", split " ", $value; # This is awesome $hash{$key} .= ", " if exists $hash{$key}; # I was thinking more i +n lines of a ternary op but this is short and concise too $hash{$key} .= $value; } print "$_ $hash{$_}\n" for sort keys %hash; # I definitely like this a +pproach` [download] Thank you once again!	[reply] [d/l]
Re: Appending values to existing hash key by NetWallah (Canon) on Mar 21, 2014 at 03:18 UTC
Arguably more idomatic for the split/join: `$val=~s/\s+/:/g;` [download] But this modifies the $val. For the second question of appending to hash values, I'd suggest a better structure would be a hash of arrays" `push @{ $hash{$key} }, $v;` [download] That way, you do not need to check for existance - it is auto-vivified. The difference is - for output, you will have to "join" with $sep: `... output for each $key .... join $sep,@{ $hash{$key} }` [download] What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against? -Larry Wall, 1992	[reply] [d/l] [select]
Re^2: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 03:31 UTC
Thank you for replying so fast. Yes, doing a hash of arrays did occur to me. Something along the lines - `chomp; (my $k, my @v) = /(\S\s\S\s)(\S\s\S\s)/; push @{$h{$k}} , shift @v;` [download] But dropped the idea because I had to iterate twice: `foreach my $key (sort keys %h) { print "$key"; foreach my $a (@{$h{$key}}) { print "$a "; } print "\n"; }` [download] But I see how in a bigger data set this should be the way to go. Checking for existence in a larger hash could be counter productive.	[reply] [d/l] [select]
Re^3: Appending values to existing hash key by 2teez (Vicar) on Mar 21, 2014 at 04:11 UTC
Hi jaypal, i totally agreed with ++NetWallah and really you can use a map for the nested for loop like so: `use warnings; use strict; my %hash; while (<DATA>) { my @data = split; push @{ $hash{ $data[0] }{ $data[1] } } => "$data[2]: $data[3]"; } for my $key ( sort keys %hash ) { print $key, ' ', join( ', ' => map { $_, @{ $hash{$key}{$_} } } keys %{ $hash{$ke +y} } ), $/; } __DATA__ id1 name1 cat1 catname1 id1 name1 cat2 catname2 id2 name2 cat3 catname3 id3 name3 cat1 catname1 id3 name3 cat4 catname4` [download] UPDATE: Code updated as indicated by NetWallah. If you tell me, I'll forget. If you show me, I'll remember. if you involve me, I'll understand. --- Author unknown to me	[reply] [d/l]
Re^4: Appending values to existing hash key by NetWallah (Canon) on Mar 21, 2014 at 04:47 UTC
Re^5: Appending values to existing hash key by 2teez (Vicar) on Mar 21, 2014 at 06:56 UTC
Re^4: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 04:31 UTC
Re^3: Appending values to existing hash key by Laurent_R (Canon) on Mar 21, 2014 at 07:24 UTC
But I see how in a bigger data set this should be the way to go. Checking for existence in a larger hash could be counter productive. You probably don't have to worry about that, searches in hashes are fast and usually do not depend on the size of the hashes.	[reply]
Re^3: Appending values to existing hash key by hazylife (Monk) on Mar 21, 2014 at 10:41 UTC
Laurent is right, have a look at these two posts/subthreads to get a good idea of how fast Perl's hashes are: Re: Determining uniqueness in a string. (giant hash lookup...) Re^2: Determining uniqueness in a string. (...versus a tiny Inline::C function)	[reply]
Re: Appending values to existing hash key by wind (Priest) on Mar 21, 2014 at 02:10 UTC
One potential alternative, ensuring keys stay in the same order: `#!/usr/bin/perl use strict; use warnings; my @keys; my %vals; while (<DATA>) { chomp; my @data = split; my $key = join ' ', splice @data, 0, 2; push @keys, $key if !$vals{$key}; push @{$vals{$key}}, join ':', @data; } for (@keys) { print "$_ " . join(', ', @{$vals{$_}}) . "\n"; } __DATA__ id1 name1 cat1 catname1 id1 name1 cat2 catname2 id2 name2 cat3 catname3 id3 name3 cat1 catname1 id3 name3 cat4 catname4` [download] Outputs: `id1 name1 cat1:catname1, cat2:catname2 id2 name2 cat3:catname3 id3 name3 cat1:catname1, cat4:catname4` [download] - Miller	[reply] [d/l] [select]
Re^2: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 03:41 UTC
Thanks for the prompt reply. I have to agree, your approach is very clever. Thank you for taking out time to write this alternative solution (a very good one indeed!).	[reply]
Re: Appending values to existing hash key by graff (Chancellor) on Mar 21, 2014 at 03:52 UTC
`my @val = split(" ",$val); my $v = join (":", @val);` [download] Assuming that you've already made sure that $val contains only the fields that need to be delimited by colon instead of whitespace, you could just do this: `$val =~ s/\s+/:/g` [download] As for the other snippet, you could use a hash of arrays instead of a hash of concatenated strings, then join the array elements for each hash key after the whole set has been loaded. Here are the relevant loops (with a few other changes): `my %hash; while (<DATA>) { s/\s+$//; # remove all trailing whitespace (not just newline) s/(\S+\s+\S+\s+)//; # remove (and capture) first two fields my $key = $1; s/\s+/:/g; # join remaining fields with ":" push @{$hash{$key}}, $_; } foreach my $k (sort keys %hash) { print $k.join( ', ', @{$hash{$k}} )."\n"; }` [download] There's nothing wrong with the way you originally did it - it works, it's clear, it's not wasteful, and that's all good.	[reply] [d/l] [select]
Re^2: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 04:34 UTC
Thank you so much for all the explanation on each code snippet. Appreciate your help!	[reply]
Re: Appending values to existing hash key by kcott (Archbishop) on Mar 21, 2014 at 05:50 UTC
G'day jaypal, Welcome to the monastery. Probably the only part of this code that I would seriously question is the regex: Instead of matching zero or more times throughout (i.e. with '``'), matching one or more times throughout (i.e. with '`+`') may be more appropriate. Would '`\w+`' be a better choice than '`\S+`': both exhibit the same behaviour with the sample data shown but the former might be more useful in finding bad data* (in a real application). The final '`\s`' is questionable. Is that something you really want to capture? Also, note that leaving that out of the regex means that you don't need the chomp on the preceding line. Your sample data could also have a little more description. The code suggests that spaces are significant in some places and not in others: Spaces between `idN` & `nameN` and `nameN` & `catN` are significant for the hash key. Is this correct? Should records starting with '`id1 name1`' and '`id1 name1`' use different hash keys? Spaces between `catN` & `catnameN` and after `catnameN` are not significant for the output. Do you need to capture these? [This is slightly off-topic for your specific questions; however, if you're parsing fixed-width data, consider using unpack (there's a tutorial for this: perlpacktut).]* Your two lines `my @val = split(" ",$val); my $v = join (":", @val);` [download] could be written as a single line not requiring the intermediary `@val`: `my $v = join ':', split ' ', $val;` [download] If you're not already familiar with this, take a look at the split documentation to see the difference between the "`split /PATTERN/`"s `' '`, `/ /` and `/\s+/`. In the script below, I've modified the regex so that an explicit `split` isn't required and the `join` is performed as part of the assignment to `$val`. This gets rid of both of those two lines as well as both of the `@val` and `$v` variables. Your code `if (exists $hash{$key}) { $hash{$key} .= $sep.$v; } else { $hash{$key} = $v; }` [download] looks fine as it is. It's easy to both read and maintain. You can often replace this construct by using a ternary operator (aka "Conditional Operator"); however, I think the following looks clunky and is inferior to what you've already written (with respect to readability and maintainability): `$hash{$key} = exists $hash{$key} ? $hash{$key} . $sep . $v : $v;` [download] or even `$hash{$key} = exists $hash{$key} ? "$hash{$key}$sep$v" : $v;` [download] In the script below, I haven't needed that construct because I've allowed `$hash{$key}` to autovivify (with `push @{$data{$key}}, $val;`) and performed the concatenation as part of the final print. [Side Note: print takes a list. There's no need to explicitly concatenate its arguments. In the specific case of your code here, you could have written `print $k.$hash{$k}."\n";` as just `print "$k$hash{$k}\n";`; although, that's probably less readable than `print $k, $hash{$k}, "\n";`.] Here's the script I've referred to a couple of times above. `#!/usr/bin/env perl -l use strict; use warnings; my %data; my $re = qr{^(\w+\s+\w+)\s+(\w+)\s+(\w+)}; while (<DATA>) { my ($key, $val) = map { $_->[0], join ':', @$_[1,2] } [/$re/]; push @{$data{$key}}, $val; } print "$_ ", join ', ', @{$data{$_}} for sort keys %data; __DATA__ id1 name1 cat1 catname1 id1 name1 cat2 catname2 id2 name2 cat3 catname3 id3 name3 cat1 catname1 id3 name3 cat4 catname4` [download] Output: `id1 name1 cat1:catname1, cat2:catname2 id2 name2 cat3:catname3 id3 name3 cat1:catname1, cat4:catname4` [download] -- Ken	[reply] [d/l] [select]
Re^2: Appending values to existing hash key by jaypal (Beadle) on Mar 21, 2014 at 14:13 UTC
Thank you so much Ken. Your explanation was great and helped a great deal on learning the good and bad pracitices. I am starting to follow how perl takes a natural language construct by saying `print "$_ ", join ', ', @{$data{$_}} for sort keys %data;` [download] to do what I was very verbose with. map function is something I am still struggling with. Hopefully with little more practice, I can wrap the concept around my head. Thanks again, your answer was very enlightening.	[reply] [d/l]
Re: Appending values to existing hash key by Lennotoecom (Pilgrim) on Mar 21, 2014 at 06:56 UTC
maybe similar to something posted above, anyway `while (@t = split /\s+/, <DATA>){ push (@{$h{$t[0]}{$t[1]}}, "$t[2]:$t[3]"); } foreach $k (sort keys %h){ print "$k ", join(' ', map {$_, @{$h{$k}{$_}}} sort keys %{$h{$k}} +),"\n"; } __DATA__ id1 name1 cat1 catname1 id1 name1 cat2 catname2 id2 name2 cat3 catname3 id3 name3 cat1 catname1 id3 name3 cat4 catname4` [download]	[reply] [d/l]
Re: Appending values to existing hash key by locked_user sundialsvc4 (Abbot) on Mar 21, 2014 at 14:05 UTC
As others have said, I approach these problems by looking for a representative data-structure, first; a way to get the data into that structure, second; a way to produce the desired output from that structure, third. Perl’s “auto-vivification” features are already designed to make the construction of such data-structures very easy. If you try to go too-directly from input to output, soon enough someone will want you to add another output, or to do something else with the data between the input and output stages.