MCS has asked for the wisdom of the Perl Monks concerning the following question:
I am using antiword to parse a word document:
open ("ANTIWORD", "-|", "/usr/local/bin/antiword", "-mUTF-8.txt", "$filename") or die "Couldn't fork: $!\n";
Unfortunately, certain characters come up as "<E2><80><99>" in place of ' when I pipe the output to more. If I just print it, I get " '". I obviously don't want "<E2><80><99>" or " '" (with the extra space) to end up in my database, so I'd like to replace it with '. I tried the following:
$line =~ s/<E2><80><99>/'/g;
But it doesn't work. Also, if I redirect the output to a file, I get "?~@~Y" instead of "<E2><80><99>", but
$line =~ s/\?~@~Y/'/g;
doesn't work either. Any ideas as to what is causing this and how I can fix it?
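
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the bytes E2 80 99 are the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK), and "<E2><80><99>" is probably just how the pager displays those three raw bytes. The data would then never contain the literal text "<E2><80><99>", which would explain why the substitutions above match nothing. The example below assumes that is the case. The sample string and the variable names `$bytes_fixed`, `$chars` and `$chars_fixed` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample line holding the raw UTF-8 bytes E2 80 99 (U+2019),
# standing in for a line read from the antiword pipe.
my $line = "don\xE2\x80\x99t panic";

# Option 1: treat the data as bytes and match the three bytes directly.
(my $bytes_fixed = $line) =~ s/\xE2\x80\x99/'/g;

# Option 2: decode the bytes to characters, match the code point,
# then encode back to UTF-8 bytes for storage.
my $chars = decode('UTF-8', $line);
$chars =~ s/\x{2019}/'/g;
my $chars_fixed = encode('UTF-8', $chars);

print "$bytes_fixed\n$chars_fixed\n";
```

Option 2 is probably the better fit if other typographic characters (curly double quotes, dashes) also need mapping, since the patterns can then be written in terms of code points instead of byte sequences.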
Replies are listed 'Best First'.

Re: weird character problems
by graff (Chancellor) on Jan 29, 2004 at 05:38 UTC
by PodMaster (Abbot) on Jan 29, 2004 at 09:45 UTC
by John M. Dlugosz (Monsignor) on Jan 29, 2004 at 21:05 UTC
by MCS (Monk) on Jan 29, 2004 at 18:58 UTC

Re: weird character problems
by Roger (Parson) on Jan 29, 2004 at 04:57 UTC
by MCS (Monk) on Jan 29, 2004 at 18:48 UTC