The &nbsp; entities are being replaced with the appropriate Unicode character. You're seeing it as a pair of "random" characters because you're trying to view UTF-8 output as another character set.
The fix is to find out which character the &nbsp; has become, then replace that character with a space. Manually find the position of a non-breaking space in a string and display its character number using:
printf("nbsp is \\x{%04X}\n", ord(substr($string, $pos, 1)));
Then you'll know what to use instead of \x{1234} in
s/\x{1234}/ /g;
in order to replace non-breaking spaces with normal spaces.
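Putting the two steps together, here is a minimal sketch. The sample string is made up, and the sketch assumes that the first non-ASCII character in it is the non-breaking space and that the string has already been decoded into characters. If it still holds raw UTF-8 bytes, the non-breaking space shows up as two separate bytes instead.

```perl
use strict;
use warnings;

# Hypothetical sample: a decoded string containing U+00A0 (NO-BREAK SPACE).
my $string = "Price:\x{00A0}42";

# Find the first non-ASCII character (assumed here to be the nbsp)
# and report its character number.
my $report = '';
if ($string =~ /([^\x00-\x7F])/) {
    my $pos = $-[1];    # offset where the captured character starts
    $report = sprintf("nbsp is \\x{%04X}", ord(substr($string, $pos, 1)));
    print "$report\n";
}

# Use the reported number in place of \x{1234} in the substitution.
(my $clean = $string) =~ s/\x{00A0}/ /g;
print "$clean\n";
```

With this sample the first line printed is nbsp is \x{00A0}, which is the value that goes into the substitution.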
In reply to Re: Should I use; Html Parser, table extract, Extractor
by ikegami
in thread Should I use; Html Parser, table extract, Extractor
by a_non_moose