in reply to character encoding ambiguities when performing regexps with html entities

You don't have to escape the \ as \\ if you
don't put it into "..." quotes. In '...' quotes only \'
needs escaping. Just remove the escapes and your code
will work fine. A working example:
use strict;
use warnings;

my $text = q' \Start We have $\alpha$-helical to a $\beta$-sheet proteins and stuff. The $\beta$-sheet structures relate to $\pi$ by several \degrees. ';

# TeX snippets and the HTML entities that replace them
my %allowed_text_code = (
    '$\alpha$' => '&#945;',
    '$\beta$'  => '&#946;',
    '$\gamma$' => '&#947;',
    '$\delta$' => '&#948;',
    '$\theta$' => '&#952;',
    '$\pi$'    => '&#960;',
    '\degrees' => '&#176;',
);

# \Q...\E quotes the key so that $ and \ are matched literally
foreach my $tex_key (keys %allowed_text_code) {
    $text =~ s/\Q$tex_key\E/$allowed_text_code{$tex_key}/g;
}

print $text, "\n";
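
To illustrate the point about single quotes, here is a minimal sketch (the variable names are my own, not from the OP's code). It checks that a single-quoted key is the same string with or without the doubled backslash, and that \Q...\E is what keeps $ and \ literal inside the pattern:

```perl
use strict;
use warnings;

# In single quotes only \\ and \' are escape sequences; any other
# backslash is kept literally, so both spellings give the same string.
my $unescaped = '$\alpha$';
my $escaped   = '$\\alpha$';

print "same string\n" if $unescaped eq $escaped;
print length($unescaped), "\n";    # 8 characters: $ \ a l p h a $

# \Q...\E (quotemeta) makes the regex engine treat $ and \ as literals.
my $entity = '&#945;';
my $text   = 'an $\alpha$-helix';
(my $html = $text) =~ s/\Q$unescaped\E/$entity/g;
print "$html\n";                   # an &#945;-helix
```

So removing the escapes from the hash keys is harmless, but by itself it does not change what the substitution matches.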

This will not work if your TeX source doesn't
look the way I expected ;-)

Regards
mwa

Replies are listed 'Best First'.
Re^2: character encoding ambiguities when performing regexps with html entities
by ikegami (Patriarch) on Sep 24, 2007 at 18:05 UTC

    You don't have to escape the \\ if you don't put it into ".."'quotes (except \'),

    But there's no harm in doing so. In fact, I always escape \ in single quotes to avoid accidentally doing

    $path = '\\server\share'; # XXX WRONG

    Your change makes no difference whatsoever.
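
A small sketch of that pitfall, assuming the intended value is a UNC-style path that starts with two backslashes (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Inside single quotes \\ collapses to ONE backslash, so the leading
# double backslash of the path is silently lost.
my $wrong = '\\server\share';

# Doubling every literal backslash gives the intended string.
my $right = '\\\\server\\share';

print "$wrong\n";    # \server\share
print "$right\n";    # \\server\share
```

Escaping every backslash as a habit costs nothing and avoids having to remember which positions are significant.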

      ikegami: $path = '\\server\share'; # XXX WRONG

      Correct. I should have written that \\ and \' *can*
      be escaped in single quotes.
      BTW, I'd guess the OP's problem comes entirely
      from a UTF-x to ISO-y-z (or some other) conversion,
      but he didn't hint at this or give any input data.

      Regards
      mwa