hash substitution in regex

Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I use perl to generate html pages and content within them, with a templating system that I have been doggedly improving over the course of events. Recent versions have included cyrillic captions, for example, and I learned a lot in simply achieving the capability.

Previous versions have used Text::Template to make text substitutions, but I wanted to try to achieve it with a handwritten version that has an appropriate control and regex, with all the flexibility of using hashes. Let me post the script before continuing...

#!/usr/bin/perl -w
use strict;
use 5.010;
use lib "template_stuff";
use football1;
use football2;
use utils1;
use Cwd;
use File::Basename; 
use Net::FTP;
use Path::Class;
use File::Slurp;
use File::Spec;

# initializations that must precede main data structure
my $ts = "template_stuff";
my $images = "aimages";
my $captions = "captions";
my $ruscaptions = "ruscaptions";
my $bom = "bom";
my $current = cwd; 
my $rd1 = dir($current);
my @a = $rd1->dir_list();
my $srd1 = $rd1->stringify;
my $title = $rd1->basename;
say "title is $title";
# get headline from stdin
say "Give me a headline:  ";
my $headline = <STDIN>;
my $rd2 = dir(@a,$ts,$images);
my $to_images = $rd2->stringify;
my $rd3 = dir(@a,$ts,$captions);
my $to_captions = $rd3->stringify;
my $rd4 = dir(@a,$ts,$ruscaptions);
my $rus_captions = $rd4->stringify;
my $rd5 = dir(@a,$ts,$bom);
my $bom_dir = $rd5->stringify;

# page params
my %vars = (
   title => $title,
   headline => $headline,
   place => 'Oakland',
   css_file => "${title}1.css",
   header => file($ts,"hc_input2.txt"),
   footer => file($ts,"footer_center2.txt"),
   css_local => file($ts,"${title}1.css"),
   body => file($ts,"rebus4.tmpl"),
   print_script => "1",
   code_tmpl=> file($ts,"code1.tmpl"), 
   oitop=> file($ts,"oitop.txt"),  
   oibottom=> file($ts,"oibottom.txt"), 
   to_images => $to_images,
   eng_captions => $to_captions,
   rus_captions =>  $rus_captions,
   bottom => file($ts,"bottom1.txt"),
   words => file($bom_dir, "words1.txt"),
   subs  => file($bom_dir, "substitutions1.txt"),
   source => file($bom_dir, "nephi1.txt"),
   chapter => 'Lemuel',
        );
#create html page
my $rvars = \%vars;
my $rftp = get_ftp_object();
my $html_file = get_html_filename($rftp);
my $fh = create_html_file ($html_file);
my $remote_dir = $html_file;
$remote_dir =~ s/\.html$//;
say "remote_dir is $remote_dir";
$vars{remote_dir}= $remote_dir;
# create header
my $rhdr = write_header($rvars);
print $fh $$rhdr;
# print content to file
#say "arg1 is $vars{'words'}";
#say "arg2 is $vars{'subs'}";
my $word_hash_ref = hashify_words($vars{'words'},$vars{'subs'});
say "in main";
my %hash = %$word_hash_ref;
#my $re = keys_to_regex(%hash);
#print "match\n" if $name =~ /^$re$/;
# main control
my $check = join '|', keys %hash;

open(my $hh, "<:encoding(UTF-8)", $vars{'source'})
    || die "can't open UTF-8 encoded filename: $!";
while(<$hh>){
chomp;
say "default is $_";
$_ =~ s/($check)/$hash{$1}/gi;
say "$_";
}
__END__
[download]

$ perl mango1.pl 
title is mango
Give me a headline:  
perlmonks
old num is 0
remote_dir is mango1
file1 is /home/fred/Desktop/root3/pages/mango/template_stuff/bom/words
+1.txt
file2 is /home/fred/Desktop/root3/pages/mango/template_stuff/bom/subst
+itutions1.txt
keys are Nephi  Lehi  Jews  Zedekiah  Jerusalem  Judah
values are Rory Calhoun Mr. Burns the Homeland the Pentagon Washington
+ the USA
-----
subroutine says this is your hash: 
key: Zedekiah , value: the Pentagon
key: Judah, value: the USA
key: Jerusalem , value: Washington
key: Lehi , value: Mr. Burns
key: Jews , value: the Homeland
key: Nephi , value: Rory Calhoun
[download]

From about here on up, so good so far. Without boring you with the rest of the output, it's simply very patchy in weird ways. I think I need to take a step back to the hashify_words subroutine, which is pretty hot off the press for me and needs some protections against bad things happening, for example creating an extra hash entry from extraneous whitespace. Right now that looks like this:

sub hashify_words  {
use strict;
use warnings;
use 5.010;
use File::Slurp;
use List::MoreUtils qw( zip );

my ($file1, $file2) = @_;
say "file1 is $file1";
say "file2 is $file2";

my @file1lines = read_file($file1);
my @file2lines = read_file($file2);
chomp(@file1lines); 
chomp(@file2lines); 
say "keys are @file1lines";
say "values are @file2lines";

my %hash = zip @file1lines, @file2lines;
say "-----";
print_hash(\%hash);
return \%hash;
}
[download]

Again, the actual output is spotty, with some words not being spaced right. Happy to hear any constructive comment, SD

Comment on hash substitution in regex Select or Download Code

Replies are listed 'Best First'.
Re: hash substitution in regex by AppleFritter (Vicar) on Aug 16, 2014 at 21:32 UTC
Just a general comment: I use perl to generate html pages and content within them, with a templating system that I have been doggedly improving over the course of events. I'm just curious: why? To improve your Perl, or because you actually need a templating system? If it's the former, that's a worthwhile and commendable endeavor, but if it's the latter, I'd recommend against reinventing the wheel; there's various powerful and/or mature templating systems available, or so I'm told.	[reply]
Re^2: hash substitution in regex by Aldebaran (Curate) on Aug 18, 2014 at 08:10 UTC
There's a lot to say here. The fact is, I'm still learning, and part of the learning process is re-working what you have. There isn't anything broken about this routine, for example, but writing the header to an html page is not the fancy part of this evolving toolset. As I'm now working through the Alpaca book, I want to try fancier things with language than what I can find here. Furthermore, I don't understand something until I've re-written it a few times.... `sub write_header { use strict; use Text::Template; my $rvars = shift; my %vars = %$rvars; # get time my $now_string = localtime; $vars{"date"} = $now_string; my $header = $vars{"header"}; my $template2 = Text::Template->new(SOURCE => $header) or die "Couldn't construct template: $!"; my $result2 = $template2->fill_in(HASH => \%vars); return \$result2; }` [download] I also wrote a utility script that clones the template and imports data, so that I'm not betting the whole house every time a few lines change here or there. Always curious to hear what works for other people.	[reply] [d/l]
Re: hash substitution in regex by Anonymous Monk on Aug 16, 2014 at 20:55 UTC
If your only question / problem is whitespace in `@file1lines` and `@file2lines`, have a look at the following substitution and grep: `my @x = (" foo "," bar","quz ","x y"," a b c "," "," ",""); s/^\s+\|\s+$//g for @x; # trim @x = grep {$_} @x; # drop empty entries print "<$_>\n" for @x; __END__ <foo> <bar> <quz> <x y> <a b c>` [download]	[reply] [d/l] [select]
Re^2: hash substitution in regex by AnomalousMonk (Archbishop) on Aug 16, 2014 at 22:21 UTC
`@x = grep {$_} @x; # drop empty entries` This will also drop the entry `'0'` (with/without leading/trailing whitespace). Better to grep on length: `c:\@Work\Perl>perl -wMstrict -le "my @x = ( ' foo ', ' bar', 'quz ', 'x y', ' a b c ', ' ', ' ', '', '0', ' 0', '0 ', ' 0 ', ); ;; s/^\s+\|\s+$//g for @x; @x = grep length, @x; printf qq{<$_> } for @x; " <foo> <bar> <quz> <x y> <a b c> <0> <0> <0> <0>` [download]	[reply] [d/l] [select]
Re^3: hash substitution in regex by Aldebaran (Curate) on Aug 18, 2014 at 04:29 UTC
Alright, great, thanks for responses...I think I've got something that works now. I split hashify_word into 2 routines and like it better. The list gets read and trimmed correctly, and that seeemed to solve the problem that I had with the regex. `sub get_list{ use strict; use warnings; use 5.010; use File::Slurp; my $file = shift; my @lines = read_file($file); chomp(@lines); s/^\s+\|\s+$//g for @lines; @lines = grep length, @lines; return @lines; } sub zip_lists { use strict; use warnings; use 5.010; use List::MoreUtils qw( zip ); my ($file1, $file2) = @_; my @file1lines = get_list($file1); my @file2lines = get_list($file2); say "keys are @file1lines"; say "values are @file2lines"; my %hash = zip @file1lines, @file2lines; return \%hash; }` [download] My filesizes are not large for the purpose at hand, but I wonder what changes one would consider if the lists were long and you were worried about performance. Caller looks like this now: `my $word_hash_ref = zip_lists($vars{'words'},$vars{'subs'}); say "in main"; my %hash = %$word_hash_ref; # main control my $check = join '\|', keys %hash; binmode STDOUT, ":utf8"; open(my $hh, "<:encoding(UTF-8)", $vars{'source'}) \|\| die "can't open UTF-8 encoded filename: $!"; while(<$hh>){ chomp; $_ =~ s/($check)/$hash{$1}/gi; say "$_"; }` [download] Again, I don't want to bore you with the actual output, but I think this works now, and again thanks....	[reply] [d/l] [select]
Re^4: hash substitution in regex by Anonymous Monk on Aug 18, 2014 at 10:13 UTC
Re^5: hash substitution in regex by Aldebaran (Curate) on Aug 21, 2014 at 19:38 UTC