Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I receive Quark files from a print publication, and my job is to copy the text out of them and make HTML documents from it. I am using Perl to automate some of the process, including copying the text. The problem is that some of the copied text comes out strange. For instance, the file contains the words "King’s Court", but when I copy them using Perl they come out as "KingÆs Court". Why? It does not happen if I copy the text manually (using the Windows clipboard). Any help or tips would be appreciated.

Replies are listed 'Best First'.
Re: Fonts,fonts,fonts
by Cody Pendant (Prior) on Feb 27, 2003 at 04:45 UTC
    It's almost certainly a "smart quote" character used as an apostrophe, which is character code 213 in the Mac character set (MacRoman) and 146 on Windows (cp1252). Neither is ASCII proper, which stops at 127.

    You probably need some kind of regex loop to fix at least four characters: the opening and closing single and double quotes. If the file comes from a Mac, for instance, then 210 and 211 should be replaced with " and 212 and 213 with ', but this is the quick fix, not the proper one.
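
    A minimal sketch of that quick fix, assuming the text has been read as raw bytes. The sub names are made up for illustration, and the Windows codes other than 146 (145, 147, 148) come from the cp1252 table rather than from the post above.

```perl
use strict;
use warnings;

# Quick fix: map Mac (MacRoman) smart-quote bytes to plain ASCII quotes.
# fix_mac_quotes is a hypothetical name used only for this illustration.
sub fix_mac_quotes {
    my ($text) = @_;
    # 210/211 (0xD2/0xD3) are the double quotes, 212/213 (0xD4/0xD5) the single ones.
    $text =~ tr/\xD2\xD3\xD4\xD5/""''/;
    return $text;
}

# Same idea for Windows (cp1252): 147/148 are double quotes, 145/146 single.
sub fix_win_quotes {
    my ($text) = @_;
    $text =~ tr/\x93\x94\x91\x92/""''/;
    return $text;
}

print fix_mac_quotes("King\xD5s Court"), "\n";   # prints King's Court
```

    tr/// is used instead of a loop of s/// calls because each byte maps to exactly one replacement character. It throws the typographic quotes away, which is why this is only the quick fix.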
    --

    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D
Re: Fonts,fonts,fonts
by Jaap (Curate) on Feb 26, 2003 at 23:14 UTC
    This has to do with the encoding of the document you receive, and I always find playing with encodings a bit tricky.
    Try to figure out which encoding the Quark file is in (which platform does it come from?) and then convert it to an encoding Perl can handle (preferably Unicode, encoded as UTF-8).
    You might also want to read a little about Perl's Unicode support and I18N.
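
    One way to do that conversion, sketched with the Encode module that ships with Perl 5.8 and later. The helper name to_utf8 is hypothetical, and 'MacRoman' and 'cp1252' are only guesses at what the Quark file might use; the real source encoding has to be determined first.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# to_utf8 is a hypothetical helper: raw bytes in, UTF-8 bytes out.
# $source_encoding is an assumption the caller supplies, e.g. 'MacRoman' or 'cp1252'.
sub to_utf8 {
    my ($bytes, $source_encoding) = @_;
    my $chars = decode($source_encoding, $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);                 # characters -> UTF-8 bytes
}

# The same apostrophe is byte 213 on a Mac and byte 146 on Windows.
my $mac = to_utf8("King\xD5s Court", 'MacRoman');
my $win = to_utf8("King\x92s Court", 'cp1252');
print "same result\n" if $mac eq $win;
```

    Unlike a byte-for-byte substitution, this keeps the typographic apostrophe (U+2019), so the HTML output can carry it through as long as the page is declared as UTF-8.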
Re: Fonts,fonts,fonts
by CountZero (Bishop) on Feb 27, 2003 at 10:51 UTC

    Perhaps a job for Text::Iconv?
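
    A rough sketch of how that might look. Text::Iconv is a CPAN module, not part of core Perl, so it is loaded at run time here. The helper name iconv_to_utf8 is hypothetical, and the encoding name 'CP1252' is an assumption: which names are accepted depends on the iconv library installed on the system.

```perl
use strict;
use warnings;

# iconv_to_utf8 is a hypothetical helper for this illustration.
# Returns the converted string, or undef if Text::Iconv is missing,
# the encoding name is not supported, or the conversion fails.
sub iconv_to_utf8 {
    my ($bytes, $from) = @_;
    return undef unless eval { require Text::Iconv; 1 };
    # new() croaks on an unsupported conversion, hence the eval.
    my $converter = eval { Text::Iconv->new($from, 'UTF-8') } or return undef;
    return $converter->convert($bytes);
}

my $out = iconv_to_utf8("King\x92s Court", 'CP1252');
print defined $out ? "converted\n" : "Text::Iconv unavailable or conversion failed\n";
```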

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law