G'day jaypal,
I'll first show how I might have coded this. Then I'll walk through it, line by line, and explain the differences between our versions (answering your questions along the way).
Here's my script. The __DATA__ and output are the same as yours.
#!/usr/bin/env perl -l

use strict;
use warnings;

my (%data, %codes_found);
my $sep = '^';

while (<DATA>) {
    next if $. == 1;
    chomp;
    my ($name, $code, $count) = split /\Q$sep/;
    ++$codes_found{$code};
    $data{$name}{$code} = $count;
}

my @codes = sort keys %codes_found;

print join $sep => 'Name', @codes;

for my $name (sort keys %data) {
    print join $sep => $name, map { $data{$name}{$_} || '' } @codes;
}
Here's the walkthrough.
In your shebang line, use whatever identifies the perl you want. The part I wanted to highlight was the -l switch: this saves you having to add "\n" to all your print statements. It's not always what you want but is very often useful. See "perlrun: Command Switches".
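As a rough illustration of the output side of -l: perlrun says that -l without an octal argument sets $\ (the output record separator) to the current value of $/. The sketch below imitates that by setting $\ by hand and printing to an in-memory filehandle; the names $out and $fh are made up for the example.

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '<', \my $unused or die "Cannot open in-memory file: $!";
close $fh;

# Print to an in-memory filehandle so the result can be inspected.
open $fh, '>', \$out or die "Cannot open in-memory file: $!";
{
    # Roughly what -l arranges for output: every print gets "\n" appended.
    local $\ = "\n";
    print {$fh} 'first';
    print {$fh} 'second';
}
close $fh;

print $out;
```

With -l on the shebang or command line you get this effect for the whole script without touching $\ yourself.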
use strict; and use warnings; are the same as in your script. (Except in extremely rare cases, e.g. to demonstrate the effects of their absence, I use these in all my scripts.)
The declarations are not too dissimilar to what you have. I didn't need an @flds (see below). You can declare multiple my variables in a list. My %data is used for the same function as your $map. Dereferencing has some minimal overhead and requires a small amount of extra code (i.e. '->'): unless $hashref is really what you want, I find %hash is generally a better choice.
For the while (<DATA>) loop: unless you really need a $line variable, I usually find the default $_ suffices.
If you intend to skip entire lines without performing any processing on them (here, next if $. == 1 skips the first line), do the next before any processing, including chomp.
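A minimal sketch of that ordering, reading from an in-memory filehandle instead of DATA. The sample lines (including the header text) are made up for the example.

```perl
use strict;
use warnings;

# Made-up input: one header line followed by two records.
my $input = "Name^Code^Count\nalice^A1^3\nbob^B2^5\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @records;
while (<$fh>) {
    next if $. == 1;    # skip the first line before doing any work on it
    chomp;              # only lines we keep are chomped
    push @records, $_;
}
close $fh;

print "$_\n" for @records;
```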
If you're going to split your lines in a while loop such as this, a fresh my @fields variable (on each iteration) is a better choice. While it makes little difference in a small script like this, getting into the habit of making your variables available in the smallest scope possible means you'll avoid the problems associated with global variables: if the script were changed, the additional complexity in the logic could result in a hard-to-track-down bug where you were perhaps operating on values from a previous iteration.
As we're returning just three, well-defined values, my ($name, $code, $count) = ... ticks the limited scope box and makes the following lines more readable and maintainable.
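To make the "values from a previous iteration" problem concrete, here is a contrived sketch. The records are made up (the second one deliberately lacks a count), and $stale, @with_outer and @with_inner are names invented for the example.

```perl
use strict;
use warnings;

my $sep   = '^';
my @lines = ('alice^A1^3', 'bob^B2');    # second record has no count

# Variable declared outside the loop and only assigned conditionally:
# its value survives from one iteration to the next.
my $stale;
my @with_outer;
for my $line (@lines) {
    my @fields = split /\Q$sep/, $line;
    $stale = $fields[2] if @fields > 2;
    push @with_outer, defined $stale ? $stale : 'undef';
}

# Fresh lexicals on every iteration: nothing can leak across records.
my @with_inner;
for my $line (@lines) {
    my ($name, $code, $count) = split /\Q$sep/, $line;
    push @with_inner, defined $count ? $count : 'undef';
}

print "outer scope: @with_outer\n";    # bob silently inherits alice's 3
print "inner scope: @with_inner\n";
```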
Also note, I'm using the $sep variable already defined rather than hard-coding a value here: if the separator changed or you wanted to abstract this for multiple data sources with different separators, you only need to change one value. In a more complex situation, you may need a $in_sep and a $out_sep; however, that still equates to changing values in one place rather than having to search your entire script for hard-coded values and make multiple changes. See quotemeta if you're unfamiliar with \Q.
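A short sketch of why \Q matters with this particular separator. The sample line is made up; @unquoted and @quoted are names invented for the example.

```perl
use strict;
use warnings;

my $sep  = '^';
my $line = 'alice^A1^3';    # made-up sample record

# Without \Q, the caret is a regex anchor, not a literal character,
# so the line is not broken up at the carets.
my @unquoted = split /$sep/, $line;

# With \Q, the caret is escaped and matched literally.
my @quoted = split /\Q$sep/, $line;

print scalar(@unquoted), " field(s) without \\Q\n";
print scalar(@quoted),   " field(s) with \\Q\n";
```

The same applies to other separators that are regex metacharacters, such as '|' or '.'.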
You asked about ++$codes_found{$code}. It's a standard and well-known idiom: its use is quite appropriate here.
In a simple statement such as this, the use of the prefix or postfix forms doesn't make any difference. In a more complex statement, they could easily make a difference to the logic, so make sure you understand both forms of autoincrement and autodecrement. See "perlop: Auto-increment and Auto-decrement".
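A minimal sketch of where the two forms do differ, namely when the value of the expression is used. The variable names are invented for the example.

```perl
use strict;
use warnings;

my $i = 5;
my $j = 5;

my $from_prefix  = ++$i;    # increments first, then yields the new value
my $from_postfix = $j++;    # yields the old value, then increments

print "prefix:  got $from_prefix, variable is now $i\n";
print "postfix: got $from_postfix, variable is now $j\n";
```

Both variables end up at 6; only the value handed back by the expression differs.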
Also consider the readability (of the keys) of $codes_found{$code} vs. $codes{$flds[1]}. Furthermore, if the input data format changed, the former (with $code) would probably still work as written, while the latter (with $flds[1]) may well need modification.
While $data{$name}{$code} = $count does the same as your $map->{$flds[0]}->{$flds[1]} = $flds[2], consider the same readability and maintainability points that I raised above.
I don't know the purpose of the or next on that line of your script. In this particular instance, it's a no-op (i.e. $flds[2] is always TRUE, so next is never called) so you were lucky; in another instance, that no-op could well become a bug!
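To show how a trailing or next could turn into a bug, here is a contrived sketch with made-up counts, one of which is 0. The names %seen and @processed are invented for the example.

```perl
use strict;
use warnings;

my %seen;
my @processed;

for my $count (3, 0, 7) {
    # Parsed as ($seen{$count} = $count) or next: the assignment always
    # happens, but a false value (0, '', undef) then triggers next.
    $seen{$count} = $count or next;

    push @processed, $count;    # never reached for the 0 record
}

print "stored keys: ", join(',', sort { $a <=> $b } keys %seen), "\n";
print "processed:   @processed\n";
```

The 0 record is stored but everything after the assignment is skipped for it, which is easy to miss when reading the code.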
By the way, after the first $hashref->{$key} or $arrayref->[$index], you don't need to keep repeating the '->' to drill down into a complex data structure. For instance, $hashref->{$key_outer}{$key_inner} would be fine; this works for any complexity, e.g. $arrayref->[$i]{$key}[$j] is also fine.
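A quick sketch confirming that both spellings reach the same element. The data structure is made up for the example.

```perl
use strict;
use warnings;

# Made-up nested structure: hash of hash of array.
my $hashref = { outer => { inner => [10, 20, 30] } };

my $with_arrows    = $hashref->{outer}->{inner}->[1];
my $without_arrows = $hashref->{outer}{inner}[1];

print "with arrows:    $with_arrows\n";
print "without arrows: $without_arrows\n";
```

Only the first arrow, the one that dereferences $hashref itself, is required.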
You need the sorted codes twice (once for the header line and once for each row); just generate the list once, in @codes.
A join is a simpler solution than a for loop to print that one line.
The for my $name (sort keys %data) loop is no different in logic from your script. $name is more meaningful and exactly mirrors its use earlier. Compare that single name to your $flds[0] and $k1 and consider the same readability and maintenance issues already raised.
By the way, for and foreach are synonymous. I go with the laziness virtue on this one and save myself four keystrokes each time I want a foreach loop.
You asked a couple of questions about the final print statement.
As we already have @codes, an additional for is not needed and only a single print statement is required. While there are other ways to do this, that probably answers your "How can I write this idiomatically?" question.
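Here is that join and map combination run against a small, made-up %data so the blank-filling is visible. The rows are collected in @rows (a name invented for the example) rather than printed directly.

```perl
use strict;
use warnings;

my $sep = '^';

# Made-up data: alice has no entry for code B2.
my %data = (
    alice => { A1 => 3 },
    bob   => { A1 => 1, B2 => 5 },
);
my @codes = qw(A1 B2);

my @rows;
for my $name (sort keys %data) {
    # One field per code, with '' substituted where there is no count.
    push @rows, join $sep => $name, map { $data{$name}{$_} || '' } @codes;
}

print "$_\n" for @rows;
```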
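You're quite correct about autovivification. The exists function is often a good way to avoid this. In this case, $data{$name}{$_} || '' causes no autovivification: $name comes from keys %data, so $data{$name} already exists, and merely reading the inner key does not create it.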
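A small sketch of the three cases: a safe read, a read that autovivifies, and a read guarded with exists. The data and the names bob and carol are made up for the example.

```perl
use strict;
use warnings;

my %data = ( alice => { A1 => 3 } );

# Top-level key exists: reading a missing inner key creates nothing.
my $safe       = $data{alice}{B2} || '';
my $after_safe = join ',', sort keys %data;
my $inner_keys = join ',', sort keys %{ $data{alice} };

# Top-level key missing: looking through it autovivifies $data{bob}.
my $vivified     = $data{bob}{B2} || '';
my $after_vivify = join ',', sort keys %data;

# Checking the top level with exists first avoids creating anything.
my $guarded     = exists $data{carol} ? ( $data{carol}{B2} || '' ) : '';
my $after_guard = join ',', sort keys %data;

print "after safe read:    $after_safe (inner keys: $inner_keys)\n";
print "after unguarded:    $after_vivify\n";
print "after guarded read: $after_guard\n";
```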
-- Ken