Good thoughts in this thread on the spelling conversion; I have not had good luck in the past with outsourcing this kind of thing to India. Fortunately, the idioms will not need conversion, just basic spelling (center / centre). These 1k documents are part of a larger collection of about 24k documents to be published; the rest use British spelling and punctuation, and the archive I'm working with requires that kind of basic consistency between the documents.
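
For the spelling side, something like the following is roughly what I have in mind. It is only a sketch: the word list is a tiny illustrative sample (a real one would be far longer), and the names %uk_for, us_to_uk_spelling and match_case are made up for this post. It only matches whole words, so inflected forms such as "centered" would need their own entries.

```perl
use strict;
use warnings;

# Illustrative sample only; a real US-to-UK word list would be much longer.
my %uk_for = (
    center => 'centre',
    color  => 'colour',
    honor  => 'honour',
);

# Carry the capitalisation of the original word over to the replacement.
sub match_case {
    my ($orig, $repl) = @_;
    return uc($repl)      if $orig eq uc($orig);            # CENTER -> CENTRE
    return ucfirst($repl) if $orig eq ucfirst(lc($orig));   # Center -> Centre
    return $repl;                                           # center -> centre
}

# Replace whole words found in %uk_for, ignoring case when matching.
sub us_to_uk_spelling {
    my ($text) = @_;
    my $alternation = join '|', map { quotemeta } keys %uk_for;
    $text =~ s{\b($alternation)\b}{ match_case($1, $uk_for{lc $1}) }gie;
    return $text;
}

print us_to_uk_spelling("The Center of the color wheel is in the center."), "\n";
```

Building one alternation from the hash keys keeps it to a single s///g pass per document instead of one pass per word.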

In terms of quotations, now that I've thought about it more, there seems to be only one case I need to watch out for: balanced single quotes inside double quotes, potentially complicated by an apostrophe somewhere inside the double quotes as well. Otherwise, single quotes in the docs stay as they are. In British usage, balanced single quotes inside balanced double quotes become balanced double quotes inside balanced single quotes. Of course, there may be a very few cases where the nesting is deeper. I've been thinking about whether there is a clever way to do this with a few s///g rather than checking a character stream....
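
Here is a first stab at the s///g idea, untested against the real documents. It assumes straight (not curly) quotes and only one level of nesting, and it treats a single quote with a word character on both sides as an apostrophe. The sub names swap_inner and us_to_uk_quotes are just placeholders. Leading or trailing apostrophes ('tis, the dogs' bowls) would be mistaken for quote marks, so those cases would still need a manual check.

```perl
use strict;
use warnings;

# Turn quote-like single quotes inside one double-quoted span into double quotes.
sub swap_inner {
    my ($inner) = @_;
    # Opening quote: not preceded by a word character, followed by non-space.
    $inner =~ s/(?<!\w)'(?=\S)/"/g;
    # Closing quote: preceded by non-space, not followed by a word character.
    $inner =~ s/(?<=\S)'(?!\w)/"/g;
    # An apostrophe such as the one in "don't" has word characters on both
    # sides, so neither substitution touches it.
    return $inner;
}

# Rewrite each balanced "..." span as '...' with its inner quotes swapped.
sub us_to_uk_quotes {
    my ($text) = @_;
    $text =~ s{"([^"]*)"}{ "'" . swap_inner($1) . "'" }ge;
    return $text;
}

print us_to_uk_quotes(q{She said, "I don't know what 'center' means."}), "\n";
```

Deeper nesting would defeat the [^"]* match, so for those few cases walking the character stream is probably still the safer route.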


In reply to Re^4: Perl to convert US to UK punctuation/spelling? by freewheel
in thread Perl to convert US to UK punctuation/spelling? by freewheel
