in reply to unknown character in between text

If you want to replace the offending character with its HTML entity without using modules, something like this should work in many cases (or swap in the commented-out line to replace the character with a space instead):

for (my $x=0;$x<length($string);$x++) {
    if (ord(substr($string,$x,1))>127) {
        substr($string,$x,1)='&#'.ord(substr($string,$x,1)).';';
        # substr($string,$x,1)=' '; # or use this line instead
    }
}
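For anyone who wants to try the loop above on a known input, here is a minimal sketch that wraps it in a sub. The name entity_encode_high and the sample string are made up for illustration. It assumes $string holds one element per character (Latin-1 bytes or an already decoded string); if the data is raw UTF-8 bytes, each byte of a multi-byte character would get its own entity, which is probably one of the cases where this does not do what you want.

```perl
use strict;
use warnings;

# Hypothetical helper name; the body is the same loop as above.
sub entity_encode_high {
    my ($string) = @_;
    for (my $x = 0; $x < length($string); $x++) {
        my $code = ord(substr($string, $x, 1));
        if ($code > 127) {
            # 4-argument-free lvalue substr: replace one character in place.
            # The string grows, but length() is re-checked on every pass.
            substr($string, $x, 1) = '&#' . $code . ';';
        }
    }
    return $string;
}

# Sample input built with chr() so the source file stays pure ASCII.
my $sample = "caf" . chr(233) . " na" . chr(239) . "ve";
print entity_encode_high($sample), "\n";   # prints: caf&#233; na&#239;ve
```

Plain ASCII input comes back unchanged, so it should be safe to run over text that only occasionally contains a stray high character.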

... or, if you just want to know what a character is meant to be, you could print each character's numeric code next to the character itself, something like this:

for (my $x=0;$x<length($string);$x++) {
    print ord(substr($string,$x,1)),"\t",substr($string,$x,1),"\n";
}

Hope that helps, although the modules and tools mentioned above are useful too. (I'm sure some guru could condense the code above into a single line.)

Re^2: unknown character in between text
by Anonymous Monk on Sep 17, 2011 at 13:00 UTC

    eeew :p

    s/([^\000-\200])/'&#'.ord($1).';'/ge
    s/([^\000-\200])/sprintf '&#x%X;', ord $1/ge

      :-) That looks tidier! I haven't tested it, but I get the gist ... I knew someone would be able to condense it into a handful of bytes! But why \200 as the upper bound of the character class? I'm sure there must be a good reason, but I'm more used to seeing \000-\177.
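On the \200 versus \177 question, a quick way to see the difference: the escapes are octal, so \177 is 127 and \200 is 128. The sketch below only counts how many characters each class would replace in a made-up three-character test string; the variable names are for illustration, and it does not try to guess why \200 was chosen.

```perl
use strict;
use warnings;

# Test string: "A" (65), chr(128) and chr(233), built with chr()
# so the source file stays pure ASCII.
my $input = "A" . chr(128) . chr(233);

# Class from the reply: everything outside \000-\200 (0..128).
my $upto_200  = $input;
my $count_200 = ($upto_200 =~ s/([^\000-\200])/'&#'.ord($1).';'/ge) || 0;

# The more familiar ASCII-only class: everything outside \000-\177 (0..127).
my $upto_177  = $input;
my $count_177 = ($upto_177 =~ s/([^\000-\177])/'&#'.ord($1).';'/ge) || 0;

print "\\200 class replaced $count_200 character(s)\n";   # 1: chr(128) is left alone
print "\\177 class replaced $count_177 character(s)\n";   # 2: chr(128) is replaced too
```

So the only character the two classes disagree on is chr(128): with \000-\200 it passes through untouched, while with \000-\177 it becomes &#128; and the result matches the ord(...) > 127 test in the original loop.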