slife has asked for the wisdom of the Perl Monks concerning the following question:

Brethren,

I've recently been assisting a colleague with some legacy code at work, and I came across the attached function, which I think is attempt to rewrite a C idiom into Perl.

By way of context, this function takes as arguments the name of a template file and a reference to a hash. The content of the template file is a set of pseudo-HTML template definitions, which are parsed out by this function and stored in the hashref.

Templates follow the pattern:

<TEMPLATE NAME="TEMPLATE1"> # Processing instructions here ... </TEMPLATE> <TEMPLATE NAME="TEMPLATE2"> # Instructions for some other action ... </TEMPLATE> # etc

The exact form of the processing instructions is irrelevant, as they are simply copied into the hash (as we shall see).

The subroutine is shown below (with error checking and some irrelevant detail removed for clarity):

sub read_template { my ($templfile, $hashref) = @_; my ($line, $dest, $stuff); my $fh = new FileHandle("<$templfile"); # This looks like a C idiom. # $dest is initially set to point to $stuff (which is # undef). # # If a template start is found, the hash key is set with # the template name, and # $dest is re-pointed at the value corresponding to this # hash key. # # Subsequent lines are appended to $dest (and hence to # the hash value), # until a closing tag is found, at which point $dest is # re-set to $stuff again. # # The hash is a reference, so we are also modifying the # caller at the same time. Nothing is explicitly # returned. # $dest = \$stuff; while ($line = <$fh>) { if ($line =~ /^\s*<\s*TEMPLATE\s+NAME="(.*)"\s*>/i) { $hashref->{$1} = ''; $dest = \$hashref->{$1}; } elsif ($line =~ /^\s*<\s*\/\s*TEMPLATE\s*>/i) { $dest = \$stuff; } else { ${ $dest } .= $line; } } $fh->close; }

My questions are these:

  1. Is my interpretation of the working of the function (in the comments in the code fragment above) correct?
  2. Is there a more "Perlish" way of expressing this idiom (preferably without the nastiness of modifying the caller from within the function)?

Replies are listed 'Best First'.
Re: C idioms in Perl
by jdporter (Paladin) on Sep 05, 2007 at 21:22 UTC

    Well, that's not too bad. The only obvious things that might make it more perlish (imho) would be to use the implicit argument ($_) rather than $line, and to process the file in chunks other than lines. I also wouldn't use FileHandle; it's not necessary in modern Perls.

    Here's a first clean-up, which uses a hash index rather than a reference to a hash value:

    local $ARGV = ($templfile); my $i = ''; while (<>) { if ( /^\s*<\s*TEMPLATE\s+NAME="(.*)"\s*>/i ) { $i = $1; } elsif ( /^\s*<\s*\/\s*TEMPLATE\s*>/i ) { $i = ''; } else { $hashref->{$i} .= $_; } } # cleanup: delete $hashref->{''};
    Here's another approach which is pushing the idiomaticity:
    while (<>) { no warnings 'uninitialized'; /^\s*<\s*TEMPLATE\s+NAME="(.*?)"\s*>/i .. /^\s*<\s*\/\s*TEMPLATE\s*>/i and $h{$1} .= $_; } # cleanup delete $hashref->{''}; s/^\s*<\s*TEMPLATE\s+NAME="(.*?)"\s*>\s*//i for values %$hashref;

    Beyond that, you could do things like

    local $/; local $_ = <>; %$hashref = /\s*<\s*TEMPLATE\s+NAME="(.*?)"\s*>(.*?)\s*<\s*\/\s*TEMPLA +TE\s*>/sig;
    but then you have to worry about the size of the file in memory.

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
Re: C idioms in Perl
by Anno (Deacon) on Sep 05, 2007 at 23:18 UTC
    I agree that the use of a scalar reference to keep track of a hash value looks like a something a C programmer would do. In general, scalar references are rarely used in Perl (they do have their place). In this case, knowing the hash(ref) and the key is good enough. Declaring all the variables near the beginning of a sub is another C-ism. Most of the time it is possible, and preferrable, to declare variables where they are first used.

    The fact that the input format uses a fixed string "</TEMPLATE>" as a terminator allows for reading one record per input-operation. That way there is no need for line-assemblage to build up the hash value. Each round in the read loop processes one template.

    In the code I'm using the DATA filehandle instead of opening the given file.

    sub read_template { my ($templfile, $hashref) = @_; my $fh = \ *DATA; # by way of open(...) local $/ = '</TEMPLATE>'; # set the record terminator while ( <$fh> ) { last if /^\s*$/; # ignore trailing newline(s) chomp; # remove trailing terminator my ( $name) = /^\s*<\s*TEMPLATE\s+NAME="(.*)"\s*>\s*/i or die "data error in $templfile, chunk $."; $hashref->{ $name} = substr( $_, $+[ 0]); } } __DATA__ <TEMPLATE NAME="TEMPLATE1"> # Processing instructions here ... </TEMPLATE> <TEMPLATE NAME="TEMPLATE2"> # Instructions for some other action ... </TEMPLATE>
    Note the slight modification in the name-extracting regex. I have made it match trailing whitespace, so that the index $+[0] (for @- and @+ see perlvar) points to the first valid character of the template text.

    Anno

Re: C idioms in Perl
by dwm042 (Priest) on Sep 05, 2007 at 22:19 UTC
    I think jdporter has done a terrific job of analyzing this piece of code, but the one thing I see is that there is no need to load data you don't need. Just check to see that the data are between the template tags and refuse to load otherwise.

    my $hash_key = undef; while (<>) { if ( /^\s*<\s*TEMPLATE\s+NAME="(.*)"\s*>/i ) { $hash_key = $1; $hashref->{$hash_key} = ''; } elsif ( /^\s*<\s*\/\s*TEMPLATE\s*>/i ) { $hash_key = undef; } elsif (defined($hash_key)) { $hashref->{$hash_key} .= $_; } }