Re: large hash of regex substitution strings

(Please put your code in <c>...</c> tags. They handle escaping the necessary characters and place the line breaks for you.)

There's a lot of needless work here: the Perl code and the regexes are being parsed and compiled over and over again. Also, there's no reason to use /o anymore. qr// already compiles a pattern once, so /o does nothing more than complicate things.

my %chartran = (
   "\xE1" => 'aacute',
   "\xE2" => 'acirc',
   "\xE4" => 'auml',
   "\xC0" => 'Agrave',
   "\xC1" => 'Aacute',
);

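# Build one character class from the hash keys and compile it once.
# quotemeta keeps keys such as ']' or '^' from breaking the class.
my $re = '[' . join('', map { quotemeta } keys %chartran) . ']';
$re = qr/$re/;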

while (<VRUN>) {
    s/($re)/$chartran{$1}/g;
    print;
}

Of course, you could simply use HTML::Entities's encode_entities function. (HTML::Entities is not a core module; it ships with the HTML-Parser distribution on CPAN.)

use HTML::Entities qw( encode_entities );

while (<VRUN>) {
    print encode_entities($_);
}

Replies are listed 'Best First'.
Re^2: large hash of regex substitution strings by throop (Chaplain) on Oct 06, 2007 at 04:18 UTC
Hmmmph. I hadn't looked at HTML::Entities before. I'm already used to using CGI (or CGI::Pretty) and its `escapeHTML` function, which seems to do pretty much the same thing: take a string and substitute escaped HTML for the nonstandard characters. Is there an advantage to using HTML::Entities? Or is it just that it's a smaller standalone module? throop
Re^3: large hash of regex substitution strings by ikegami (Patriarch) on Oct 06, 2007 at 04:54 UTC
I never looked at CGI's `escapeHTML`, so I took a peek. `escapeHTML`/`unescapeHTML` only convert a few characters. That means you can't place Unicode characters in an iso-latin-1 document, only iso-latin-1 characters. It also means that all but a few entities won't be understood when decoding. For example, `unescapeHTML` is unable to unescape `&eacute;`, even if it maps to a character in the specified character set. HTML::Entities, by contrast, is familiar with all entities, can numerically encode any range of characters, and can decode any range of characters. `escapeHTML` does have some workarounds for browser issues and for `&quot;` being accidentally omitted from HTML 3.2.
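
To make the HTML::Entities side of that comparison concrete, here is a minimal sketch of the calls mentioned above. It assumes HTML::Entities is installed, and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw( encode_entities encode_entities_numeric decode_entities );

# Hypothetical sample: a Latin-1 e-acute followed by markup characters.
my $text = "caf\xE9 <b>";

# Named entities for everything outside the default safe set.
my $named = encode_entities($text);

# Numeric character references instead of named entities.
my $numeric = encode_entities_numeric($text);

# Restrict encoding to an explicit set of characters (here only < > &).
my $markup_only = encode_entities($text, '<>&');

# decode_entities understands named and numeric forms alike.
my $round_trip = decode_entities($named);

print "$named\n$numeric\n$markup_only\n";
```

The second argument to `encode_entities` is what lets you choose which range of characters gets encoded; leaving it out falls back to the module's default set.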
Re^2: large hash of regex substitution strings by scodes (Initiate) on Oct 07, 2007 at 04:47 UTC
Thanks. What about the following example?

    s/^\s+\(Text [aA].* (\d+:\d+ .$)/\.T1 "$1/g;

I ask because I have about 50 regexes to work with. I could build this out like this:

    %search (
        1 => "s/^\s+\(Text [aA]. (\d+:\d+ .*$)/",
        2 => .....
    );
    $replace (
        1 => "/\.T1 \"$1/",
        2 => .....
    );

Now I'd like to do something like this, and I know qr// fits into the equation, I just don't know how... yet :)

    while <VRUN> {
        s/$search/$replace/g;
    }

Thanks for taking a further look at this. Thanks again.
Re^3: large hash of regex substitution strings by ikegami (Patriarch) on Oct 19, 2007 at 20:42 UTC
Is there a pattern between the different operations? If not, you might be stuck with

    my @ops = (
        sub { s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g; },
        ...
    );

    while (<$fh>) {
        foreach my $op (@ops) {
            $op->();
        }
    }

The reason it can't be simplified much is the `$1` in the replace expression. Often, when reaching this point, it's time to look into a templating system. It's hard to tell if that's the case here since I'm only getting a very small picture of what you are doing.
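
If the operations do share the shape "pattern plus a replacement built from the captures", one possible middle ground is a table of precompiled patterns paired with small callbacks. This is only a sketch: `@rules` and `apply_rules` are names invented here, the second rule is a made-up placeholder, and the `$` anchor has been moved outside the capture group for readability.

```perl
use strict;
use warnings;

# Each rule pairs a precompiled pattern with a callback that builds the
# replacement from the first capture. Only the first rule comes from the
# thread; the second is a hypothetical placeholder.
my @rules = (
    [ qr/^\s+\(Text [aA].* (\d+:\d+ .*)$/, sub { qq{.T1 "$_[0]} } ],
    [ qr/^\s+\(Note (\w+)\)$/,             sub { ".N1 $_[0]" }    ],
);

sub apply_rules {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $build) = @$rule;
        # /e runs the callback; the capture is passed in explicitly.
        $line =~ s/$re/$build->($1)/e;
    }
    return $line;
}

print apply_rules("   (Text a foo 12:30 bar"), "\n";
```

Both patterns are anchored at the start of the line, so /g is left off here. Rules with more than one capture would need the callback to receive `$2`, `$3` and so on as well.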
Re^3: large hash of regex substitution strings by Anonymous Monk on Oct 07, 2007 at 23:58 UTC
Further to the above:

    %search (
        R1 => "/^\s+\(Text [aA].* (\d+:\d+ .*$)/",
        R2 => .....
    );
    $replace (
        R1 => "/\.T1 \"$1/",
        R2 => .....
    );

    while <VRUN> {
        foreach $rule ( keys %search ) {
            s/$search{$rule}/$replace{$rule}/g;
        }
    }

My only concern is the substring match in R1/S1. Can I use qr// to make this more efficient, and if so, what is the correct syntax? Would qr// be required on both sides of the s//? Do I need to use an eval or an /ee modifier to get the substitution to happen? Thanks again all you p'gurus for your help on this :)
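
On the qr// question, a hedged sketch under the assumption that the replacements really must be stored as strings: qr// applies only to the pattern side of s///, while the replacement side is an ordinary string (or, with /e, an expression). For a `$1` stored inside a string to be expanded at substitution time, /ee is one option. The hash names follow the post, `rewrite` and the sample line are made up, and because /ee evaluates the stored string as Perl code, this is only reasonable for a rule table you control.

```perl
use strict;
use warnings;

# Patterns are compiled once with qr//.
my %search = (
    R1 => qr/^\s+\(Text [aA].* (\d+:\d+ .*)$/,
);

# Single quotes keep $1 literal until the substitution runs.
my %replace = (
    R1 => 'qq{.T1 "$1}',
);

sub rewrite {
    my ($line) = @_;
    for my $rule (sort keys %search) {
        # First /e yields the stored string, second /e evaluates it as Perl.
        $line =~ s/$search{$rule}/$replace{$rule}/ee;
    }
    return $line;
}

print rewrite("   (Text A foo 09:15 baz"), "\n";
```

Sorting the keys makes the order in which rules are applied predictable, which matters as soon as one rule's output can match another rule's pattern.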