in reply to hash substitution in regex

If your only question / problem is whitespace in @file1lines and @file2lines, have a look at the following substitution and grep:

my @x = (" foo "," bar","quz ","x y"," a b c "," "," ","");
s/^\s+|\s+$//g for @x;    # trim
@x = grep {$_} @x;        # drop empty entries
print "<$_>\n" for @x;
__END__
<foo>
<bar>
<quz>
<x y>
<a b c>

Re^2: hash substitution in regex
by AnomalousMonk (Archbishop) on Aug 16, 2014 at 22:21 UTC
    @x = grep {$_} @x;      # drop empty entries

    This will also drop the entry  '0' (with/without leading/trailing whitespace). Better to grep on length:

    c:\@Work\Perl>perl -wMstrict -le
    "my @x = ( ' foo ', ' bar', 'quz ', 'x y', ' a b c ',
               ' ', ' ', '', '0', ' 0', '0 ', ' 0 ', );
    ;;
    s/^\s+|\s+$//g for @x;
    @x = grep length, @x;
    printf qq{<$_> } for @x;
    "
    <foo> <bar> <quz> <x y> <a b c> <0> <0> <0> <0>

      Alright, great, thanks for the responses...I think I've got something that works now. I split hashify_word into two routines and like it better. The list gets read and trimmed correctly, and that seemed to solve the problem I had with the regex.

      sub get_list {
          use strict;
          use warnings;
          use 5.010;
          use File::Slurp;

          my $file  = shift;
          my @lines = read_file($file);
          chomp(@lines);
          s/^\s+|\s+$//g for @lines;
          @lines = grep length, @lines;
          return @lines;
      }

      sub zip_lists {
          use strict;
          use warnings;
          use 5.010;
          use List::MoreUtils qw( zip );

          my ($file1, $file2) = @_;
          my @file1lines = get_list($file1);
          my @file2lines = get_list($file2);
          say "keys are @file1lines";
          say "values are @file2lines";
          my %hash = zip @file1lines, @file2lines;
          return \%hash;
      }

      My file sizes are not large for the purpose at hand, but I wonder what changes one would consider if the lists were long and you were worried about performance. The caller looks like this now:

      my $word_hash_ref = zip_lists($vars{'words'}, $vars{'subs'});
      say "in main";
      my %hash = %$word_hash_ref;

      # main control
      my $check = join '|', keys %hash;
      binmode STDOUT, ":utf8";
      open(my $hh, "<:encoding(UTF-8)", $vars{'source'})
          || die "can't open UTF-8 encoded filename: $!";
      while (<$hh>) {
          chomp;
          $_ =~ s/($check)/$hash{$1}/gi;
          say "$_";
      }

      I don't want to bore you with the actual output, but I think this works now. Thanks again....

        I wonder what changes one would consider if the lists were long and you were worried about performance.

        Hashes are relatively fast; if you're worried about loading the whole thing into memory, a database (such as DBD::SQLite) or maybe a DBM file comes to mind.

        But first, how large are the files? How much memory and CPU usage is "too much"? If you're not approaching those limits don't worry just yet :-)
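        If you do end up needing it, here's a bare-bones sketch of the DBM idea using the core SDBM_File module; the filename and the sample key/value pair are just placeholders, and for larger records DB_File or the DBD::SQLite route mentioned above would be sturdier:

        use strict;
        use warnings;
        use 5.010;
        use Fcntl;
        use SDBM_File;

        # Tie the hash to an on-disk DBM file so the key/value pairs
        # don't all have to sit in memory at once.  'word_subs' is
        # just a placeholder filename.
        tie my %hash, 'SDBM_File', 'word_subs', O_RDWR | O_CREAT, 0666
            or die "can't tie DBM file: $!";

        $hash{'colour'} = 'color';   # populate once; later runs can re-tie and reuse
        say $hash{'colour'};         # each lookup reads from disk

        untie %hash;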

        (By the way, while it doesn't really hurt anything except readability, you don't need to declare use strict; use warnings; etc. at the top of every sub if you've already declared it at the top of the file.)
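        For instance, a minimal sketch of that layout, reusing get_list from above:

        #!/usr/bin/perl
        # Pragmas and modules loaded once at file scope cover every sub below.
        use strict;
        use warnings;
        use 5.010;
        use File::Slurp;
        use List::MoreUtils qw( zip );

        sub get_list {
            my $file  = shift;
            my @lines = read_file($file);
            chomp @lines;
            s/^\s+|\s+$//g for @lines;
            return grep length, @lines;
        }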