in reply to Should I use; Html Parser, table extract, Extractor
The &nbsp; entities are being replaced with the appropriate Unicode character. You're seeing each one as a pair of "random" characters because you're viewing UTF-8 output as if it were in another character set.
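To see why one character shows up as two, here is a small sketch using the core Encode module. It assumes the parser produced U+00A0 (NO-BREAK SPACE) and, for illustration only, that the viewer treats the bytes as ISO-8859-1. The sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical decoded text containing one non-breaking space (U+00A0).
my $text = "a\x{00A0}b";

# Encoded as UTF-8, that single character becomes two bytes: C2 A0.
my $bytes = encode('UTF-8', $text);
my @hex = map { sprintf '%02X', ord } split //, $bytes;
print "UTF-8 bytes: @hex\n";    # UTF-8 bytes: 61 C2 A0 62

# A viewer assuming ISO-8859-1 maps each byte to its own character,
# so the one nbsp is shown as two: U+00C2 ("A" with circumflex) and U+00A0.
my $misread = decode('ISO-8859-1', $bytes);
printf "misread length: %d, chars: %s\n",
    length $misread, join ' ', map { sprintf 'U+%04X', ord } split //, $misread;
```

A Windows-1252 viewer gives the same result for these two bytes.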
The fix is to find out which character the &nbsp; entities became, then replace that character with a space. To do this, manually find the position of a non-breaking space in a string and display its character number using:
printf("nbsp is \\x{%04X}\n", ord(substr($string, $pos, 1)));
Then you'll know what to use instead of \x{1234} in
s/\x{1234}/ /g;
in order to replace non-breaking spaces with normal spaces.
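Putting those steps together, a minimal sketch might look like this. It assumes $string already holds decoded Perl characters (as returned by the HTML parser), and, as a stand-in for locating the position by eye, it treats the first non-ASCII character as the suspect. The sample text is hypothetical; for a non-breaking space the printed number is 00A0.

```perl
use strict;
use warnings;

# Hypothetical decoded text where each &nbsp; became U+00A0.
my $string = "Price:\x{00A0}10\x{00A0}EUR";

# Step 1: find the position of the first non-ASCII character.
$string =~ /[^\x00-\x7F]/ or die "no non-ASCII character found\n";
my $pos = $-[0];    # offset where the match started

# Step 2: show its character number (same printf as above).
printf("nbsp is \\x{%04X}\n", ord(substr($string, $pos, 1)));   # nbsp is \x{00A0}

# Step 3: use that number in the substitution to get normal spaces.
(my $clean = $string) =~ s/\x{00A0}/ /g;
print "$clean\n";    # Price: 10 EUR
```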