G'day jaypal,

I'll first show how I might have coded this. Then I'll walk through it, line by line, and explain the differences between our versions (answering your questions along the way).

Here's my script. The __DATA__ and output are the same as yours.

#!/usr/bin/env perl -l

use strict;
use warnings;

my (%data, %codes_found);
my $sep = '^';

while (<DATA>) {
    next if $. == 1;
    chomp;
    my ($name, $code, $count) = split /\Q$sep/;
    ++$codes_found{$code};
    $data{$name}{$code} = $count;
}

my @codes = sort keys %codes_found;

print join $sep => 'Name', @codes;

for my $name (sort keys %data) {
    print join $sep => $name, map { $data{$name}{$_} || '' } @codes;
}

Here's the walkthrough.

#!/usr/bin/env perl -l

In your shebang line, use whatever identifies the perl you want. The part I wanted to highlight was the -l switch: this saves you having to add "\n" to all your print statements. It's not always what you want but is very often useful. See "perlrun: Command Switches".
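
As a rough illustration of the output side of -l: it sets the output record separator $\ to the current value of $/ (a newline by default), so every print gets a newline appended. The sketch below sets $\ by hand and prints to an in-memory filehandle; both of those are assumptions made purely so the result can be inspected, and with -l on the shebang line you'd get the same effect on STDOUT without touching $\ yourself.

```perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";

{
    # Roughly what -l arranges for output: append "\n" after every print.
    local $\ = "\n";
    print {$out} 'first';
    print {$out} 'second';
}

close $out;

# $buffer now holds "first\nsecond\n".
```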

use strict;
use warnings;

Same as your script. (Except in extremely rare cases, e.g. to demonstrate the effects of their absence, I use these in all my scripts.)

my (%data, %codes_found);
my $sep = '^';

Not too dissimilar to what you have. I didn't need an @flds (see below). You can declare multiple my variables in a list. My %data serves the same purpose as your $map. Dereferencing has some minimal overhead and requires a small amount of extra code (i.e. '->'): unless $hashref is really what you want, I find %hash is generally a better choice.
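
To make the %hash vs. $hashref comparison concrete, here's a minimal sketch. The keys and the count_keys() subroutine are hypothetical, invented for the example; they aren't from either of our scripts.

```perl
use strict;
use warnings;

# Plain hash: no dereferencing needed.
my %hash = (alpha => 1);
$hash{beta} = 2;
my @hash_keys = sort keys %hash;

# Hash reference: the same operations need '->' and a '%$...' dereference.
my $hashref = { alpha => 1 };
$hashref->{beta} = 2;
my @ref_keys = sort keys %$hashref;

# A reference earns its keep when the structure has to be passed around,
# e.g. handed to a (hypothetical) subroutine without copying it.
sub count_keys {
    my ($href) = @_;
    return scalar keys %$href;
}

my $n = count_keys(\%hash);    # take a reference only where one is needed
```

Both forms hold the same data; the plain hash simply has less syntax in the way.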

while (<DATA>) {

Unless you really need a $line variable, I usually find the default $_ suffices.

next if $. == 1;
chomp;

If you intend to skip entire lines without processing them, do the next before any other work, including chomp.

my ($name, $code, $count) = split /\Q$sep/;

If you're going to split your lines in a while loop such as this, a fresh my @fields variable (on each iteration) is a better choice than an array declared outside the loop. While it makes little difference in a small script like this, getting into the habit of giving your variables the smallest scope possible means you'll avoid the problems associated with global variables: if the script were later changed, the added logic could introduce a hard-to-track-down bug where you were operating on values left over from a previous iteration.

As we're returning just three, well-defined values, my ($name, $code, $count) = ... ticks the limited scope box and makes the following lines more readable and maintainable.
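
Here's a small sketch of the kind of stale-value bug I have in mind. The input lines and variable names are made up for the illustration; the point is only the difference between declaring a variable outside the loop and declaring it afresh on each iteration.

```perl
use strict;
use warnings;

my @lines = ('Tom^A^1', 'malformed line', 'Jim^B^2');    # made-up input

# Wide scope: $code is declared once, outside the loop.
my $code;
my @wide;
for my $line (@lines) {
    $code = $1 if $line =~ /\^(\w+)\^/;
    push @wide, $code;    # the second pass silently reuses 'A'
}

# Narrow scope: a fresh variable on every iteration.
my @narrow;
for my $line (@lines) {
    my ($code_here) = $line =~ /\^(\w+)\^/;
    push @narrow, defined $code_here ? $code_here : '(none)';
}

# @wide   is ('A', 'A', 'B')
# @narrow is ('A', '(none)', 'B')
```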

Also note, I'm using the $sep variable already defined rather than hard-coding a value here: if the separator changed or you wanted to abstract this for multiple data sources with different separators, you only need to change one value. In a more complex situation, you may need a $in_sep and a $out_sep; however, that still equates to changing values in one place rather than having to search your entire script for hard-coded values and make multiple changes. See quotemeta if you're unfamiliar with \Q.
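
The \Q matters here because '^' is a regex metacharacter. A quick sketch (the sample record is made up) of what happens with and without it:

```perl
use strict;
use warnings;

my $sep  = '^';
my $line = 'Tom^A^1';    # made-up record

# Unquoted: the pattern becomes /^/, an anchor, so nothing is split off.
my @unquoted = split /$sep/, $line;

# Quoted: \Q escapes the caret, so it matches a literal '^'.
my @quoted = split /\Q$sep/, $line;

# quotemeta() does the same escaping as \Q.
my $escaped = quotemeta $sep;    # '\^'

# @unquoted is ('Tom^A^1'); @quoted is ('Tom', 'A', '1')
```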

++$codes_found{$code};

You asked about this. It's a standard and well-known idiom: its use is quite appropriate here.
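
In isolation, the idiom looks something like this (the list of codes is made up): the hash keys give you each distinct value once, and the hash values tell you how often each was seen.

```perl
use strict;
use warnings;

my @codes_seen = qw(B A C A B A);    # made-up input
my %codes_found;
++$codes_found{$_} for @codes_seen;

my @unique  = sort keys %codes_found;    # ('A', 'B', 'C')
my $times_a = $codes_found{A};           # 3
```

Note that incrementing a hash element that doesn't exist yet is fine, even under warnings: the element is created and ends up as 1.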

In a simple statement such as this, the use of the prefix or postfix forms doesn't make any difference. In a more complex statement, they could easily make a difference to the logic, so make sure you understand both forms of autoincrement and autodecrement. See "perlop: Auto-increment and Auto-decrement".
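
A tiny sketch of where the two forms differ, i.e. when the value of the expression is actually used (variable names are just for illustration):

```perl
use strict;
use warnings;

my $i = 5;
my $postfix = $i++;    # takes the old value (5), then $i becomes 6
my $prefix  = ++$i;    # $i becomes 7 first, then that value is taken

# $postfix is 5, $prefix is 7, $i is 7
```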

Also consider the readability (of the keys) of $codes_found{$code} vs. $codes{$flds[1]}. Furthermore, if the input data format changed, the former (with $code) would probably still work as written, while the latter (with $flds[1]) may well need modification.

$data{$name}{$code} = $count;

While this does the same as your $map->{$flds[0]}->{$flds[1]} = $flds[2], consider the same readability and maintainability points that I raised above.

I don't know the purpose of the or next on that line of your script. In this particular instance, it's a no-op (i.e. $flds[2] is always TRUE, so next is never called) so you were lucky; in another instance, that no-op could well become a bug!
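
To show how that could bite, here's a sketch with made-up records, one of which has a count of 0. I've used a plain %map and an extra push after the assignment purely for illustration; neither is taken from your script.

```perl
use strict;
use warnings;

my @records = (['Tom', 'A', 1], ['Tom', 'B', 0], ['Jim', 'A', 2]);    # made-up data
my (%map, @fully_processed);

for my $rec (@records) {
    my @flds = @$rec;

    # 'or' binds more loosely than '=', so this parses as
    # (assignment) or next: next runs whenever the assigned value is false.
    $map{$flds[0]}{$flds[1]} = $flds[2] or next;

    push @fully_processed, "$flds[0]/$flds[1]";
}

# The 0 is still stored in %map, but everything after the assignment
# was skipped for that record:
# @fully_processed is ('Tom/A', 'Jim/A')
```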

By the way, after the first $hashref->{$key} or $arrayref->[$index], you don't need to keep repeating the '->' to drill down into a complex data structure. For instance, $hashref->{$key_outer}{$key_inner} would be fine; this works for any complexity, e.g. $arrayref->[$i]{$key}[$j] is also fine.
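
A short sketch of the equivalence, using throwaway data structures invented for the example:

```perl
use strict;
use warnings;

my $hashref  = { outer => { inner => 'found' } };
my $arrayref = [ { list => [ 'zero', 'one' ] } ];

my $long_hash  = $hashref->{outer}->{inner};
my $short_hash = $hashref->{outer}{inner};      # same element

my $long_array  = $arrayref->[0]->{list}->[1];
my $short_array = $arrayref->[0]{list}[1];      # same element

# The first arrow is still required: $hashref{outer} would refer to
# a different variable, the hash %hashref.
```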

my @codes = sort keys %codes_found;

You need the codes in that order twice; just generate the list once.

print join $sep => 'Name', @codes;

A join is a simpler solution than a for loop to print that one line.
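
For comparison, here's a sketch of the two approaches side by side, building strings rather than printing so the results can be compared. The loop is my guess at the general shape of a for-based version, not a copy of yours, and the codes are made up.

```perl
use strict;
use warnings;

my $sep   = '^';
my @codes = qw(A B C);    # stands in for the sorted codes

# Loop version: build the line piece by piece.
my $looped = 'Name';
for my $code (@codes) {
    $looped .= $sep . $code;
}

# join version: one expression; '=>' here is just a fancy comma.
my $joined = join $sep => 'Name', @codes;

# Both are 'Name^A^B^C'
```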

for my $name (sort keys %data) {

No difference in logic from your script. $name is more meaningful and exactly mirrors its use earlier. Compare that single name to your $flds[0] and $k1 and consider the same readability and maintenance issues already raised.

By the way, for and foreach are synonymous. I go with the laziness virtue on this one and save myself four keystrokes each time I want a foreach loop.

print join $sep => $name, map { $data{$name}{$_} || '' } @codes;

You asked a couple of questions about this part.

As we already have @codes, an additional for is not needed and only a single print statement is required. While there are other ways to do this, that probably answers your "How can I write this idiomatically." question.

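You're quite correct about autovivification. The exists function is often a good way to avoid this. In this case, $data{$name}{$_} || '' causes no autovivification: $name comes from keys %data, so $data{$name} already exists, and merely reading the innermost key does not create it.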
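
Here's a sketch of when reading a nested hash does and doesn't autovivify. The names and values are made up for the example. The last two lines are a separate point: || treats a stored 0 as false, so if a count of 0 is possible in your data you might prefer the defined-or operator // (Perl 5.10 or later).

```perl
use strict;
use warnings;

my %data = (Tom => { A => 1, B => 0 });    # made-up data

# Reading a missing inner key under an existing outer key creates nothing.
my $missing   = $data{Tom}{C} || '';
my $tom_has_c = exists $data{Tom}{C} ? 1 : 0;    # 0

# Reading through a missing outer key does autovivify that outer key.
my $ghost       = $data{Ann}{A} || '';
my $ann_created = exists $data{Ann} ? 1 : 0;     # 1

# Guarding with exists on the outer key first avoids that.
my $safe        = exists $data{Bob} ? $data{Bob}{A} : undef;
my $bob_created = exists $data{Bob} ? 1 : 0;     # 0

# Separate point: || also replaces a stored 0 with ''; // keeps it.
my $with_or  = $data{Tom}{B} || '';    # ''
my $with_dor = $data{Tom}{B} // '';    # 0
```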

-- Ken


In reply to Re: Print something when key does not exist by kcott
in thread Print something when key does not exist by jaypal
