large hash of regex substitution strings

scodes has asked for the wisdom of the Perl Monks concerning the following question:

Hey all.
Here is a small sample from a series of regex substitution strings:

s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g;
s/^\s+\(Text [rR].* (\d+:\d+ .*$)/\.T2 "$1/g;

Here is also a hash that is heading in the right direction ... I think ;-)

%chartran = (
agrave  => "s/\\\xE0/\\\[agrave]/og",
aacute  => "s/\\\xE1/\\\[aacute]/og",
acirc   => "s/\\\xE2/\\\[acirc]/og",
auml    => "s/\\\xE4/\\\[auml]/og",
Agrave  => "s/\\\xC0/\\\[Agrave]/og",
Aacute  => "s/\\\xC1/\\\[Aacute]/og",
);

I cycle through this hash as follows. I know there is a simpler way to do this, but here is what I have so far:

while (<VRUN>)         {
        foreach $testchar ( keys %chartran )    {
                if (eval ( "$chartran{$testchar}" ))    {
                    ... write out results ....
                }
         }
}

So I know I could do all this much more simply. I looked at qr// and like the idea, and I think the solution involves an eval or the /ee modifier, but how, exactly? :)


Replies are listed 'Best First'.
Re: large hash of regex substitution strings by ikegami (Patriarch) on Oct 06, 2007 at 01:11 UTC
(Please put your code in `<c>...</c>` tags. They handle escaping the necessary characters and place the line breaks for you.)

There's a lot of needless work here: Perl code and regexes are being parsed and compiled over and over again. Also, there's no reason to use `/o` anymore. It does nothing more than complicate things.

my %chartran = (
    "\xE1" => 'aacute',
    "\xE2" => 'acirc',
    "\xE4" => 'auml',
    "\xC0" => 'Agrave',
    "\xC1" => 'Aacute',
);

my $re = '[' . (join '', keys %chartran) . ']';
$re = qr/$re/;

while (<VRUN>) {
    s/($re)/$chartran{$1}/g;
    print;
}

Of course, you could simply use the `encode_entities` function from HTML::Entities (a CPAN module, part of the HTML-Parser distribution, not a core module).

use HTML::Entities qw( encode_entities );

while (<VRUN>) {
    print encode_entities($_);
}
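For readers who want the `\[name]` output form used in the original question rather than bare names, the same lookup-table idea can be sketched as follows. This is only an illustration under assumptions: the `translate` helper name is hypothetical, the `\xE0` entry is taken from the question's hash, and the character class is built from hex escapes so that no key can be misread inside `[...]`.

```perl
use strict;
use warnings;

# Character => name table. The \[name] output form is copied from the
# replacement strings in the original question.
my %chartran = (
    "\xE0" => 'agrave',
    "\xE1" => 'aacute',
    "\xE2" => 'acirc',
    "\xE4" => 'auml',
    "\xC0" => 'Agrave',
    "\xC1" => 'Aacute',
);

# Build the character class once, as \xNN escapes, and compile it once.
my $class = join '', map { sprintf '\\x%02X', ord } keys %chartran;
my $re    = qr/[$class]/;

# translate() is a hypothetical helper name used only in this sketch.
sub translate {
    my ($line) = @_;
    $line =~ s/($re)/\\[$chartran{$1}]/g;   # one pass, no eval
    return $line;
}

print translate("voil\xE0\n");   # prints voil\[agrave]
```

Characters that are not in the table pass through unchanged, so the table can grow without touching the loop.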
Re^2: large hash of regex substitution strings by throop (Chaplain) on Oct 06, 2007 at 04:18 UTC
Hmmmph. I hadn't looked at HTML::Entities before. I'm already used to using CGI (or CGI::Pretty) and its `escapeHTML` function, which seems to do pretty much the same thing: take a string and substitute escaped HTML for the nonstandard characters. Is there an advantage to using HTML::Entities? Or is it just that it's a smaller standalone module?

throop
Re^3: large hash of regex substitution strings by ikegami (Patriarch) on Oct 06, 2007 at 04:54 UTC
I never looked at CGI's `escapeHTML`, so I took a peek.

`escapeHTML`/`unescapeHTML` only convert a few characters. That means you can't place Unicode characters in an iso-latin-1 document, only iso-latin-1 characters. It also means that all but a few entities won't be understood. For example, `unescapeHTML` is unable to unescape `&eacute;`, even if it maps to a character in the specified character set.

HTML::Entities is familiar with all entities. HTML::Entities can numerically encode any range of characters. HTML::Entities can decode any range of characters.

`escapeHTML` has some workarounds for browser issues and for `&quot;` being accidentally omitted from HTML 3.2.
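To make "numerically encode" concrete, here is a plain-Perl sketch of what a numeric character reference is. It is not how either module is implemented; `numeric_encode` is a hypothetical helper, and the choice to leave printable ASCII, tab and newline alone is an assumption made for the example.

```perl
use strict;
use warnings;

# numeric_encode() is a hypothetical helper, not part of CGI or HTML::Entities.
# Every character outside printable ASCII (tab and newline are kept) becomes
# a decimal numeric character reference such as &#233;.
sub numeric_encode {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E\n\t])/sprintf '&#%d;', ord $1/ge;
    return $text;
}

print numeric_encode("caf\xE9 \x{263A}\n");   # prints caf&#233; &#9786;
```

Because the reference carries the code point itself, the output stays valid in an iso-latin-1 document even for characters that the document's character set cannot represent directly.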
Re^2: large hash of regex substitution strings by scodes (Initiate) on Oct 07, 2007 at 04:47 UTC
Thanks. What about the following example?

s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g;

I ask because I have about 50 regexes to work with. I could build this out like this:

%search = (
    1 => "s/^\s+\(Text [aA].* (\d+:\d+ .*$)/",
    2 => .....
);
%replace = (
    1 => "/\.T1 \"$1/",
    2 => .....
);

Now I'd like to do something like this, and I know qr// fits into the equation, I just don't know how .... yet :)

while (<VRUN>) {
    s/$search/$replace/g;
}

Thanks for taking a further look at this. Thanks again.
Re^3: large hash of regex substitution strings by ikegami (Patriarch) on Oct 19, 2007 at 20:42 UTC
Is there a pattern between the different operations? If not, you might be stuck with

my @ops = (
    sub { s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g; },
    ...
);

while (<$fh>) {
    foreach my $op (@ops) {
        $op->();
    }
}

The reason it can't be simplified much is the `$1` in the replace expression.

Often, when reaching this point, it's time to look into a templating system. It's hard to tell if that's the case here since I'm only getting a very small picture of what you are doing.
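A self-contained version of the list-of-subs approach, using the two rules from the original question, might look like the sketch below. The `apply_ops` wrapper is a hypothetical name added so the rules can be tried on a single string; in the real loop the closures would act on `$_` directly.

```perl
use strict;
use warnings;

# Each rule is a closure that edits $_ in place. Both substitutions are
# copied from the original question; they are compiled once, with the file.
my @ops = (
    sub { s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g },
    sub { s/^\s+\(Text [rR].* (\d+:\d+ .*$)/\.T2 "$1/g },
);

# apply_ops() is a hypothetical helper name for this sketch.
sub apply_ops {
    local $_ = shift;              # give the closures a $_ to work on
    for my $op (@ops) {            # named loop variable, so $_ stays the line
        $op->();
    }
    return $_;
}

# Sample input invented for the example.
print apply_ops('  (Text A foo 12:30 Morning service)'), "\n";
```

Note the named loop variable: writing `$_->() for @ops` would alias `$_` to each code reference and the substitutions would no longer see the line.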
Re^3: large hash of regex substitution strings by Anonymous Monk on Oct 07, 2007 at 23:58 UTC
Further to the above:

%search = (
    R1 => "/^\s+\(Text [aA].* (\d+:\d+ .*$)/",
    R2 => .....
);
%replace = (
    R1 => "/\.T1 \"$1/",
    R2 => .....
);

while (<VRUN>) {
    foreach $rule ( keys %search ) {
        s/$search{$rule}/$replace{$rule}/g;
    }
}

My only concern is the substring match in R1/S1. Can I use qr// to make this more efficient, and if so, what is the correct syntax? Would qr// be required on both sides of the s///? Do I need to use an eval or an /ee modifier to get the substitution to happen?

Thanks again, all you p'gurus, for your help on this :)
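One possible way to answer those questions, offered as a sketch rather than the definitive layout: `qr//` belongs only on the pattern side, and the replacement can be a code reference called under a single `/e`, so no string eval or `/ee` is needed. The rule list, the `apply_rules` name and the choice to pass the capture as an argument are all assumptions for illustration; the patterns are the ones from the thread.

```perl
use strict;
use warnings;

# Each rule pairs a precompiled pattern with a closure that builds the
# replacement text from the captured substring.
my @rules = (
    [ qr/^\s+\(Text [aA].* (\d+:\d+ .*$)/, sub { my ($t) = @_; qq{.T1 "$t} } ],
    [ qr/^\s+\(Text [rR].* (\d+:\d+ .*$)/, sub { my ($t) = @_; qq{.T2 "$t} } ],
);

# apply_rules() is a hypothetical helper name for this sketch.
sub apply_rules {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $make) = @$rule;
        # /e runs the replacement as code after the match, so $1 is set.
        $line =~ s/$re/$make->($1)/e;
    }
    return $line;
}

# Sample input invented for the example.
print apply_rules('  (Text R bar 9:05 Evening)'), "\n";
```

Storing the replacement as a double-quoted string, as in the hashes above, interpolates `$1` (and `$)`) when the hash is built rather than when the match happens, which is why a closure is used here. An array is used instead of a hash so the rules run in a fixed order.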