I am using antiword to parse a Word document:
open ("ANTIWORD", "-|", "/usr/local/bin/antiword", "-mUTF-8.txt", "$filename") or die "Couldn't fork: $!\n";
Unfortunately, certain characters come up as "<E2><80><99>" in place of ' when I pipe the output to more. If I just print it, I get " '". I obviously don't want "<E2><80><99>" or " '" (with the extra space) to end up in my database, so I'd like to replace it with '. I tried the following:
$line =~ s/<E2><80><99>/'/g;
But it doesn't work. Also, if I redirect the output to a file, instead of "<E2><80><99>" I get "?~@~Y", but
$line =~ s/\?~@~Y/'/g;
doesn't work either... Any ideas as to what is causing it and how I can fix it?
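For what it's worth, the bytes E2 80 99 are the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK). The "<E2><80><99>" and "?~@~Y" forms are most likely just how the pager or viewer renders those three raw bytes, so a regex that looks for literal angle brackets or tildes never matches. Below is a minimal sketch, assuming each line arrives as undecoded UTF-8 bytes; the helper name normalize_apostrophes is made up for illustration and is not part of antiword or any module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes raw UTF-8 bytes and returns decoded text
# with U+2019 replaced by a plain ASCII apostrophe.
sub normalize_apostrophes {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # raw bytes -> Perl characters
    $text =~ s/\x{2019}/'/g;              # curly apostrophe -> '
    return $text;
}

# "don" followed by the three bytes E2 80 99, then "t"
my $raw = "don\xE2\x80\x99t";
print normalize_apostrophes($raw), "\n";
```

If you would rather stay at the byte level, s/\xE2\x80\x99/'/g on the undecoded line should do the same job. Alternatively, calling binmode(ANTIWORD, ':encoding(UTF-8)') right after the open lets Perl decode the stream for you, and then only the s/\x{2019}/'/g step is needed.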
In reply to weird character problems by MCS