comment on

Welcome to the monastery.

Probably the only part of this code that I would seriously question is the regex:

Instead of matching zero or more times throughout (i.e. with '*'), matching one or more times throughout (i.e. with '+') may be more appropriate.
Would '\w+' be a better choice than '\S+': both exhibit the same behaviour with the sample data shown but the former might be more useful in finding bad data (in a real application).
The final '\s*' is questionable. Is that something you really want to capture? Also, note that leaving that out of the regex means that you don't need the chomp on the preceding line.

Your sample data could also have a little more description. The code suggests that spaces are significant in some places and not in others:

Spaces between idN & nameN and nameN & catN are significant for the hash key. Is this correct? Should records starting with 'id1 name1' and 'id1 name1' use different hash keys?
Spaces between catN & catnameN and after catnameN are not significant for the output. Do you need to capture these?

[This is slightly off-topic for your specific questions; however, if you're parsing fixed-width data, consider using unpack (there's a tutorial for this: perlpacktut).]

Your two lines

my @val = split(" ",$val);
my $v = join (":", @val);
[download]

could be written as a single line not requiring the intermediary @val:

my $v = join ':', split ' ', $val;
[download]

If you're not already familiar with this, take a look at the split documentation to see the difference between the "split /PATTERN/"s ' ', / / and /\s+/.

In the script below, I've modified the regex so that an explicit split isn't required and the join is performed as part of the assignment to $val. This gets rid of both of those two lines as well as both of the @val and $v variables.

Your code

if (exists $hash{$key}) {
    $hash{$key} .= $sep.$v;
}
else {
    $hash{$key} = $v;
}
[download]

looks fine as it is. It's easy to both read and maintain. You can often replace this construct by using a ternary operator (aka "Conditional Operator"); however, I think the following looks clunky and is inferior to what you've already written (with respect to readability and maintainability):

$hash{$key} = exists $hash{$key} ? $hash{$key} . $sep . $v : $v;
[download]

or even

$hash{$key} = exists $hash{$key} ? "$hash{$key}$sep$v" : $v;
[download]

In the script below, I haven't needed that construct because I've allowed $hash{$key} to autovivify (with push @{$data{$key}}, $val;) and performed the concatenation as part of the final print.

[Side Note: print takes a list. There's no need to explicitly concatenate its arguments. In the specific case of your code here, you could have written print $k.$hash{$k}."\n"; as just print "$k$hash{$k}\n";; although, that's probably less readable than print $k, $hash{$k}, "\n";.]

Here's the script I've referred to a couple of times above.

#!/usr/bin/env perl -l

use strict;
use warnings;

my %data;
my $re = qr{^(\w+\s+\w+)\s+(\w+)\s+(\w+)};

while (<DATA>) {
    my ($key, $val) = map { $_->[0], join ':', @$_[1,2] } [/$re/];
    push @{$data{$key}}, $val;
}

print "$_ ", join ', ', @{$data{$_}} for sort keys %data;

__DATA__
id1    name1 cat1  catname1
id1    name1 cat2  catname2
id2    name2 cat3  catname3
id3    name3 cat1  catname1
id3    name3 cat4  catname4
[download]

Output:

id1    name1 cat1:catname1, cat2:catname2
id2    name2 cat3:catname3
id3    name3 cat1:catname1, cat4:catname4
[download]

-- Ken

In reply to Re: Appending values to existing hash key by kcott
in thread Appending values to existing hash key by jaypal

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.