Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I use perl to generate html pages and content within them, with a templating system that I have been doggedly improving over the course of events. Recent versions have included cyrillic captions, for example, and I learned a lot in simply achieving the capability.

Previous versions have used Text::Template to make text substitutions, but I wanted to try to achieve it with a handwritten version that has an appropriate control and regex, with all the flexibility of using hashes. Let me post the script before continuing...

#!/usr/bin/perl -w use strict; use 5.010; use lib "template_stuff"; use football1; use football2; use utils1; use Cwd; use File::Basename; use Net::FTP; use Path::Class; use File::Slurp; use File::Spec; # initializations that must precede main data structure my $ts = "template_stuff"; my $images = "aimages"; my $captions = "captions"; my $ruscaptions = "ruscaptions"; my $bom = "bom"; my $current = cwd; my $rd1 = dir($current); my @a = $rd1->dir_list(); my $srd1 = $rd1->stringify; my $title = $rd1->basename; say "title is $title"; # get headline from stdin say "Give me a headline: "; my $headline = <STDIN>; my $rd2 = dir(@a,$ts,$images); my $to_images = $rd2->stringify; my $rd3 = dir(@a,$ts,$captions); my $to_captions = $rd3->stringify; my $rd4 = dir(@a,$ts,$ruscaptions); my $rus_captions = $rd4->stringify; my $rd5 = dir(@a,$ts,$bom); my $bom_dir = $rd5->stringify; # page params my %vars = ( title => $title, headline => $headline, place => 'Oakland', css_file => "${title}1.css", header => file($ts,"hc_input2.txt"), footer => file($ts,"footer_center2.txt"), css_local => file($ts,"${title}1.css"), body => file($ts,"rebus4.tmpl"), print_script => "1", code_tmpl=> file($ts,"code1.tmpl"), oitop=> file($ts,"oitop.txt"), oibottom=> file($ts,"oibottom.txt"), to_images => $to_images, eng_captions => $to_captions, rus_captions => $rus_captions, bottom => file($ts,"bottom1.txt"), words => file($bom_dir, "words1.txt"), subs => file($bom_dir, "substitutions1.txt"), source => file($bom_dir, "nephi1.txt"), chapter => 'Lemuel', ); #create html page my $rvars = \%vars; my $rftp = get_ftp_object(); my $html_file = get_html_filename($rftp); my $fh = create_html_file ($html_file); my $remote_dir = $html_file; $remote_dir =~ s/\.html$//; say "remote_dir is $remote_dir"; $vars{remote_dir}= $remote_dir; # create header my $rhdr = write_header($rvars); print $fh $$rhdr; # print content to file #say "arg1 is $vars{'words'}"; #say "arg2 is $vars{'subs'}"; my $word_hash_ref = hashify_words($vars{'words'},$vars{'subs'}); say "in main"; my %hash = %$word_hash_ref; #my $re = keys_to_regex(%hash); #print "match\n" if $name =~ /^$re$/; # main control my $check = join '|', keys %hash; open(my $hh, "<:encoding(UTF-8)", $vars{'source'}) || die "can't open UTF-8 encoded filename: $!"; while(<$hh>){ chomp; say "default is $_"; $_ =~ s/($check)/$hash{$1}/gi; say "$_"; } __END__
$ perl mango1.pl title is mango Give me a headline: perlmonks old num is 0 remote_dir is mango1 file1 is /home/fred/Desktop/root3/pages/mango/template_stuff/bom/words +1.txt file2 is /home/fred/Desktop/root3/pages/mango/template_stuff/bom/subst +itutions1.txt keys are Nephi Lehi Jews Zedekiah Jerusalem Judah values are Rory Calhoun Mr. Burns the Homeland the Pentagon Washington + the USA ----- subroutine says this is your hash: key: Zedekiah , value: the Pentagon key: Judah, value: the USA key: Jerusalem , value: Washington key: Lehi , value: Mr. Burns key: Jews , value: the Homeland key: Nephi , value: Rory Calhoun

From about here on up, so good so far. Without boring you with the rest of the output, it's simply very patchy in weird ways. I think I need to take a step back to the hashify_words subroutine, which is pretty hot off the press for me and needs some protections against bad things happening, for example creating an extra hash entry from extraneous whitespace. Right now that looks like this:

sub hashify_words { use strict; use warnings; use 5.010; use File::Slurp; use List::MoreUtils qw( zip ); my ($file1, $file2) = @_; say "file1 is $file1"; say "file2 is $file2"; my @file1lines = read_file($file1); my @file2lines = read_file($file2); chomp(@file1lines); chomp(@file2lines); say "keys are @file1lines"; say "values are @file2lines"; my %hash = zip @file1lines, @file2lines; say "-----"; print_hash(\%hash); return \%hash; }

Again, the actual output is spotty, with some words not being spaced right. Happy to hear any constructive comment, SD

Replies are listed 'Best First'.
Re: hash substitution in regex
by AppleFritter (Vicar) on Aug 16, 2014 at 21:32 UTC

    Just a general comment:

    I use perl to generate html pages and content within them, with a templating system that I have been doggedly improving over the course of events.

    I'm just curious: why? To improve your Perl, or because you actually need a templating system? If it's the former, that's a worthwhile and commendable endeavor, but if it's the latter, I'd recommend against reinventing the wheel; there's various powerful and/or mature templating systems available, or so I'm told.

      There's a lot to say here. The fact is, I'm still learning, and part of the learning process is re-working what you have. There isn't anything broken about this routine, for example, but writing the header to an html page is not the fancy part of this evolving toolset. As I'm now working through the Alpaca book, I want to try fancier things with language than what I can find here. Furthermore, I don't understand something until I've re-written it a few times....

      sub write_header { use strict; use Text::Template; my $rvars = shift; my %vars = %$rvars; # get time my $now_string = localtime; $vars{"date"} = $now_string; my $header = $vars{"header"}; my $template2 = Text::Template->new(SOURCE => $header) or die "Couldn't construct template: $!"; my $result2 = $template2->fill_in(HASH => \%vars); return \$result2; }

      I also wrote a utility script that clones the template and imports data, so that I'm not betting the whole house every time a few lines change here or there. Always curious to hear what works for other people.

Re: hash substitution in regex
by Anonymous Monk on Aug 16, 2014 at 20:55 UTC

    If your only question / problem is whitespace in @file1lines and @file2lines, have a look at the following substitution and grep:

    my @x = (" foo "," bar","quz ","x y"," a b c "," "," ",""); s/^\s+|\s+$//g for @x; # trim @x = grep {$_} @x; # drop empty entries print "<$_>\n" for @x; __END__ <foo> <bar> <quz> <x y> <a b c>
      @x = grep {$_} @x;      # drop empty entries

      This will also drop the entry  '0' (with/without leading/trailing whitespace). Better to grep on length:

      c:\@Work\Perl>perl -wMstrict -le "my @x = ( ' foo ', ' bar', 'quz ', 'x y', ' a b c ', ' ', ' ', '', '0', ' 0', '0 ', ' 0 ', ); ;; s/^\s+|\s+$//g for @x; @x = grep length, @x; printf qq{<$_> } for @x; " <foo> <bar> <quz> <x y> <a b c> <0> <0> <0> <0>

        Alright, great, thanks for responses...I think I've got something that works now. I split hashify_word into 2 routines and like it better. The list gets read and trimmed correctly, and that seeemed to solve the problem that I had with the regex.

        sub get_list{ use strict; use warnings; use 5.010; use File::Slurp; my $file = shift; my @lines = read_file($file); chomp(@lines); s/^\s+|\s+$//g for @lines; @lines = grep length, @lines; return @lines; } sub zip_lists { use strict; use warnings; use 5.010; use List::MoreUtils qw( zip ); my ($file1, $file2) = @_; my @file1lines = get_list($file1); my @file2lines = get_list($file2); say "keys are @file1lines"; say "values are @file2lines"; my %hash = zip @file1lines, @file2lines; return \%hash; }

        My filesizes are not large for the purpose at hand, but I wonder what changes one would consider if the lists were long and you were worried about performance. Caller looks like this now:

        my $word_hash_ref = zip_lists($vars{'words'},$vars{'subs'}); say "in main"; my %hash = %$word_hash_ref; # main control my $check = join '|', keys %hash; binmode STDOUT, ":utf8"; open(my $hh, "<:encoding(UTF-8)", $vars{'source'}) || die "can't open UTF-8 encoded filename: $!"; while(<$hh>){ chomp; $_ =~ s/($check)/$hash{$1}/gi; say "$_"; }

        Again, I don't want to bore you with the actual output, but I think this works now, and again thanks....