in reply to large hash of regex substitution strings

(Please put your code in <c>...</c> tags. They'll handle escaping the necessary characters and they'll place the line breaks for you.)

There's a lot of needless work here. Perl code and regexes are being parsed and compiled over and over again. Also, there's no reason to use /o anymore; it does nothing more than complicate things.

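    my %chartran = (
        "\xE1" => 'aacute',
        "\xE2" => 'acirc',
        "\xE4" => 'auml',
        "\xC0" => 'Agrave',
        "\xC1" => 'Aacute',
    );

    # Build a single character class from the keys and compile it once.
    my $re = '[' . (join '', keys %chartran) . ']';
    $re = qr/$re/;

    while (<VRUN>) {
        # One pass per line; the hash lookup picks the replacement.
        s/($re)/$chartran{$1}/g;
        print;
    }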

Of course, you could simply use the encode_entities function from HTML::Entities (part of the HTML-Parser distribution on CPAN, not a core module).

    use HTML::Entities qw( encode_entities );

    while (<VRUN>) {
        print encode_entities($_);
    }

Re^2: large hash of regex substitution strings
by throop (Chaplain) on Oct 06, 2007 at 04:18 UTC
    Hmmmph. I hadn't looked at HTML::Entities before. I'm already used to using CGI (or CGI::Pretty) and its escapeHTML function, which seems to do pretty much the same thing: take a string and substitute escaped HTML for the nonstandard characters. Is there an advantage to using HTML::Entities? Or is it just that it's a smaller standalone module?

    throop

      I never looked at CGI's escapeHTML, so I took a peek.

      escapeHTML/unescapeHTML only convert a few characters.
      That means you can't place Unicode characters in an iso-latin-1 document, only iso-latin-1 characters.
      That means any but a few entities won't be understood. For example, it's unable to unescape &eacute;, even if it maps to a character in the specified character set.

      HTML::Entities is familiar with all entities.
      HTML::Entities can numerically encode any range of characters.
      HTML::Entities can decode any range of characters.
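
      A minimal sketch of the three points above, assuming HTML::Entities is installed from CPAN. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw( encode_entities encode_entities_numeric decode_entities );

# Hypothetical sample: one Latin-1 character plus some markup.
my $text = "caf\xE9 <b>";

# Default set: control chars, high-bit chars, and < & > ' "
my $named = encode_entities($text);

# The optional second argument restricts which characters get encoded.
my $markup_only = encode_entities($text, '<>&');

# Numeric (hex) references instead of named entities.
my $numeric = encode_entities_numeric($text);

# Named, decimal, and hex forms all decode to the same character.
my $decoded = decode_entities('&eacute;&#233;&#xE9;');

print "$named\n$numeric\n";
```

      The second argument to encode_entities is treated as the contents of a character class, so ranges work there too.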

      escapeHTML has some workarounds for browser issues and for &quot; being accidentally omitted from HTML 3.2.

Re^2: large hash of regex substitution strings
by scodes (Initiate) on Oct 07, 2007 at 04:47 UTC
    Thanks. What about the following example ?
    s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g;
    I ask as I have about 50 regexes to work with.
    I could build this out like this:
        %search ( 1 => "s/^\s+\(Text [aA].* (\d+:\d+ .*$)/", 2 => ..... );
        $replace ( 1 => "/\.T1 \"$1/", 2 => ..... );

    Now I'd like to do something like this, and I know qr// fits into the equation, I just don't know how .... yet :)

        while <VRUN> { s/$search/$replace/g; }

    Thanks for taking a further look at this. Thanks again.

      Is there a pattern between the different operations? If not, you might be stuck with

          my @ops = (
              sub { s/^\s+\(Text [aA].* (\d+:\d+ .*$)/\.T1 "$1/g; },
              ...
          );

          while (<$fh>) {
              foreach my $op (@ops) {
                  $op->();
              }
          }

      The reason it can't be simplified much is the $1 in the replace expression. Often, when reaching this point, it's time to look into a templating system. It's hard to tell if that's the case here since I'm only getting a very small picture of what you are doing.
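
      To make the $1 problem concrete, here is a small sketch. The sample line and the variable names are invented for illustration. When the replacement text lives in a variable, s/// interpolates that variable once and does not look inside its value again, so the $1 stays literal.

```perl
use strict;
use warnings;

my $pattern     = qr/^\s+\(Text [aA].* (\d+:\d+ .*$)/;
my $replacement = '.T1 "$1';    # single quotes: $1 is just two characters here

# Hypothetical input line.
my $line = "   (Text A foo 12:34 bar";

# Replacement taken from a variable: the capture is never substituted.
(my $naive = $line) =~ s/$pattern/$replacement/;

# Replacement written inline: $1 is interpolated at match time.
(my $wanted = $line) =~ s/$pattern/.T1 "$1/;

print "$naive\n$wanted\n";
```

      The first line printed still contains a literal $1; only the inline form picks up the capture.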

      Further to the above:
          %search ( R1 => "/^\s+\(Text [aA].* (\d+:\d+ .*$)/", R2 => ..... );
          $replace ( R1 => "/\.T1 \"$1/", R2 => ..... );

          while <VRUN> {
              foreach $rule ( keys %search ) {
                  s/$search{$rule}/$replace{$rule}/g;
              }
          }

      My only concern is the substring match in R1/S1. Can I use qr// to make this more efficient, and if so, what is the correct syntax? Would qr// be required on both sides of the s///? Do I need to use an eval or an /ee modifier to get the substitution to happen? Thanks again all you p'gurus for your help on this :)
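
      One possible shape for this, offered as a sketch and not as the only answer: qr// applies only to the pattern side, and the replacement side can be a code ref called under /e, which avoids both eval and /ee. The @rules table, the apply_rules helper, and the second rule are hypothetical; only the first pattern comes from the post above.

```perl
use strict;
use warnings;

# Each rule pairs a precompiled pattern with a code ref that builds the
# replacement from the first capture. The second rule is made up purely
# to show the shape of the table.
my @rules = (
    [ qr/^\s+\(Text [aA].* (\d+:\d+ .*$)/, sub { my ($m) = @_; qq{.T1 "$m} } ],
    [ qr/^\s+\(Note (\d+)\)/,              sub { my ($m) = @_; ".N1 $m"    } ],
);

# Hypothetical helper: run every rule over one line and return the result.
sub apply_rules {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $build) = @$rule;
        # /e evaluates the replacement as code, so $1 is passed in at match time.
        $line =~ s/$re/$build->($1)/ge;
    }
    return $line;
}

my $out = apply_rules("   (Text A foo 12:34 bar");
print "$out\n";
```

      An array keeps the rules in a fixed order, which iterating over keys %search does not. Storing replacement strings and using /ee would also work, but that evaluates strings as Perl code on every match, so the code refs are the more conservative choice here.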