in reply to hash substitution in regex

If your only question / problem is whitespace in @file1lines and @file2lines, have a look at the following substitution and grep:

my @x = (" foo "," bar","quz ","x y"," a b c "," "," ","");
s/^\s+|\s+$//g for @x;    # trim
@x = grep {$_} @x;        # drop empty entries
print "<$_>\n" for @x;
__END__
<foo>
<bar>
<quz>
<x y>
<a b c>

Re^2: hash substitution in regex
by AnomalousMonk (Archbishop) on Aug 16, 2014 at 22:21 UTC
    @x = grep {$_} @x;      # drop empty entries

    This will also drop the entry  '0' (with/without leading/trailing whitespace). Better to grep on length:

    c:\@Work\Perl>perl -wMstrict -le
    "my @x = ( ' foo ', ' bar', 'quz ', 'x y', ' a b c ',
               ' ', ' ', '', '0', ' 0', '0 ', ' 0 ', );
    ;;
    s/^\s+|\s+$//g for @x;
    @x = grep length, @x;
    printf qq{<$_> } for @x;
    "
    <foo> <bar> <quz> <x y> <a b c> <0> <0> <0> <0>

      Alright, great, thanks for the responses...I think I've got something that works now. I split hashify_word into two routines and like it better. The list gets read and trimmed correctly, and that seemed to solve the problem I had with the regex.

      sub get_list {
          use strict;
          use warnings;
          use 5.010;
          use File::Slurp;

          my $file  = shift;
          my @lines = read_file($file);
          chomp(@lines);
          s/^\s+|\s+$//g for @lines;
          @lines = grep length, @lines;
          return @lines;
      }

      sub zip_lists {
          use strict;
          use warnings;
          use 5.010;
          use List::MoreUtils qw( zip );

          my ($file1, $file2) = @_;
          my @file1lines = get_list($file1);
          my @file2lines = get_list($file2);
          say "keys are @file1lines";
          say "values are @file2lines";
          my %hash = zip @file1lines, @file2lines;
          return \%hash;
      }

      My file sizes are not large for the purpose at hand, but I wonder what changes one would consider if the lists were long and you were worried about performance. The caller looks like this now:

      my $word_hash_ref = zip_lists($vars{'words'}, $vars{'subs'});
      say "in main";
      my %hash = %$word_hash_ref;

      # main control
      my $check = join '|', keys %hash;
      binmode STDOUT, ":utf8";
      open(my $hh, "<:encoding(UTF-8)", $vars{'source'})
          || die "can't open UTF-8 encoded filename: $!";
      while (<$hh>) {
          chomp;
          $_ =~ s/($check)/$hash{$1}/gi;
          say "$_";
      }

      I don't want to bore you with the actual output, but I think this works now. Thanks again....

        I wonder what changes one would consider if the lists were long and you were worried about performance.

        Hashes are relatively fast; if you're worried about loading the whole thing into memory, a database (such as DBD::SQLite) or maybe a DBM file comes to mind.

        But first, how large are the files? How much memory and CPU usage is "too much"? If you're not approaching those limits don't worry just yet :-)
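        If you do end up needing it, here's a bare-bones sketch of the DBM idea using the core SDBM_File module; the filename and the sample key/value pair are just placeholders, and for larger records DB_File or the DBD::SQLite route mentioned above would be sturdier:

        use strict;
        use warnings;
        use 5.010;
        use Fcntl;
        use SDBM_File;

        # Tie the hash to an on-disk DBM file so the key/value pairs
        # don't all have to sit in memory at once.  'word_subs' is
        # just a placeholder filename.
        tie my %hash, 'SDBM_File', 'word_subs', O_RDWR | O_CREAT, 0666
            or die "can't tie DBM file: $!";

        $hash{'colour'} = 'color';   # populate once; later runs can re-tie and reuse
        say $hash{'colour'};         # each lookup reads from disk

        untie %hash;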

        (By the way, while it doesn't really hurt anything except readability, you don't need to declare use strict; use warnings; etc. at the top of every sub if you've already declared it at the top of the file.)
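        For instance, a minimal sketch of that layout, reusing get_list from above:

        #!/usr/bin/perl
        # Pragmas and modules loaded once at file scope cover every sub below.
        use strict;
        use warnings;
        use 5.010;
        use File::Slurp;
        use List::MoreUtils qw( zip );

        sub get_list {
            my $file  = shift;
            my @lines = read_file($file);
            chomp @lines;
            s/^\s+|\s+$//g for @lines;
            return grep length, @lines;
        }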