Re: unknown character in between text

If you want to replace the offending character with its HTML numeric entity without using any modules, something like this should work in many cases (or use the commented line instead to replace the character with a space):

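# Walk the string and replace every character above 127 (i.e. non-ASCII)
for (my $x = 0; $x < length($string); $x++)
{
   my $ord = ord(substr($string, $x, 1));
   if ($ord > 127)
   {
      # numeric HTML entity, e.g. &#233;
      substr($string, $x, 1) = '&#' . $ord . ';';

      # substr($string, $x, 1) = ' ';  # or use this line instead
   }
}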
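One caveat: the loop works on whatever is in $string, so if the text was read as raw UTF-8 bytes, a single character such as é (U+00E9) is two bytes and comes out as two entities (&#195;&#169;). A minimal sketch, assuming the input really is UTF-8 encoded, decodes it first with the core Encode module; the sub name to_entities and the sample string are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn every non-ASCII character into a decimal
# HTML numeric entity. Assumes $bytes holds UTF-8 encoded text.
sub to_entities {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> Perl characters
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

my $raw = "caf\xC3\xA9";                   # "café" as UTF-8 bytes
print to_entities($raw), "\n";             # prints caf&#233;
```

Without the decode step the same input would give caf&#195;&#169;, which browsers render as two wrong characters.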

Or, if you just want to find out which character it is, you could print each character's code next to the character itself:

for (my $x=0;$x<length($string);$x++)
{
  print ord(substr($string,$x,1)),"\t",substr($string,$x,1),"\n";
}
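When the character doesn't display at all, its code point and official Unicode name are usually more telling than the glyph. A minimal sketch using the core charnames module; the helper name describe_chars and the sample string are made up for illustration, and it assumes the string holds decoded characters rather than raw UTF-8 bytes:

```perl
use strict;
use warnings;
use charnames ();    # provides charnames::viacode

# Hypothetical helper: one line per character with its code point
# (U+ hex notation) and its Unicode name.
sub describe_chars {
    my ($string) = @_;
    return map {
        my $cp = ord $_;
        sprintf "U+%04X\t%s", $cp, charnames::viacode($cp) // 'unknown';
    } split //, $string;
}

# Sample input: "a", a no-break space, "b", a right single quotation mark
print "$_\n" for describe_chars("a\x{00A0}b\x{2019}");
```

For the sample this prints lines such as U+00A0 NO-BREAK SPACE and U+2019 RIGHT SINGLE QUOTATION MARK, which tells you exactly what slipped into the text.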

Hope that helps, though the modules and tools mentioned above are useful approaches too (and I'm sure some guru could condense the code above into a single line).

Re^2: unknown character in between text by Anonymous Monk on Sep 17, 2011 at 13:00 UTC
eeew :p

`s/([^\000-\200])/'&#'.ord($1).';'/ge`

`s/([^\000-\200])/sprintf '&#x%X;', ord $1/ge`
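To see what the two one-liners produce, here is a small check, assuming $text already holds decoded characters (the sample string is made up). The first substitution emits decimal entities, the second hexadecimal ones; both are valid HTML numeric character references:

```perl
use strict;
use warnings;

my $text = "caf\x{E9} \x{2013} na\x{EF}ve";    # é, en dash, ï

# Decimal entities: each character outside \000-\200 becomes &#NNN;
(my $dec = $text) =~ s/([^\000-\200])/'&#'.ord($1).';'/ge;

# Hexadecimal entities: same match, formatted as &#xHH; via sprintf
(my $hex = $text) =~ s/([^\000-\200])/sprintf '&#x%X;', ord $1/ge;

print "$dec\n";    # caf&#233; &#8211; na&#239;ve
print "$hex\n";    # caf&#xE9; &#x2013; na&#xEF;ve
```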
Re^3: unknown character in between text by DanielSpaniel (Scribe) on Sep 17, 2011 at 13:05 UTC
:-) That looks tidier! Haven't tested it, but I get the gist... I knew someone would be able to condense it into a handful of bytes! But why 200 in the character class? I'm sure there must be a good reason, but I'm more used to seeing \000-\177.
Re^4: unknown character in between text by Anonymous Monk on Sep 17, 2011 at 13:54 UTC
Because octal is octal :)

$ perl -le " printf qq{%3s %3o\n}, $_, $_ for @ARGV " 0 1 8 16 32 64 128
  0   0
  1   1
  8  10
 16  20
 32  40
 64 100
128 200

On octal, see perlrebackslash, oct, sprintf and perldata.
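Since \200 is octal for 128, the class [^\000-\200] also leaves character 128 untouched, whereas [^\000-\177] excludes exactly the ASCII codes 0 to 127. A quick sketch makes the difference visible:

```perl
use strict;
use warnings;

# Octal escapes in a character class: \177 is 127, \200 is 128.
printf "%d %d\n", oct('177'), oct('200');    # 127 128

my $c128 = chr 128;

# [^\000-\200] excludes 0..128, so chr(128) is NOT matched (left as-is)
print "200: ", ($c128 =~ /[^\000-\200]/ ? "match" : "no match"), "\n";

# [^\000-\177] excludes only 0..127 (ASCII), so chr(128) IS matched
print "177: ", ($c128 =~ /[^\000-\177]/ ? "match" : "no match"), "\n";
```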
Re^5: unknown character in between text by Anonymous Monk on Sep 17, 2011 at 13:56 UTC
Re^6: unknown character in between text by DanielSpaniel (Scribe) on Sep 17, 2011 at 14:20 UTC