First, try exporting the files as text in a way that lets you tell Office which character set to use. If that fails, export as text in whatever Windows-specific character set Office defaults to, and use piconv to convert that text to a "normal" character set such as iso_8859_1, iso_8859_2, utf8, utf16, or whatever you like.
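piconv ships with Perl's Encode module, and the same conversion can be done from inside a script with Encode's decode and encode. The sketch below assumes the export came out as cp1250 (Central European Windows). The sub name cp1250_to_utf8 is hypothetical, chosen for illustration only.

```perl
#!/usr/bin/perl
# Sketch: cp1250 -> UTF-8 using the core Encode module (the module piconv
# is built on). Assumes the input really is cp1250.
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode cp1250 bytes into Perl characters,
# then encode those characters as UTF-8 bytes.
sub cp1250_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('cp1250', $bytes));
}

binmode STDOUT;    # we print bytes that are already encoded
print cp1250_to_utf8("\x8akoda"), "\n";    # the UTF-8 bytes for "Škoda"
```

For a whole file, `piconv -f cp1250 -t utf8 in.txt > out.txt` should do the same job without writing any code.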

If, however, you really want a quick-and-dirty solution that converts to ASCII, here are some substitutions. They assume your incoming data is cp1250. I've left out the characters that are the same in iso_8859_1 and cp1250, since I assume those don't bother you, so the list covers only the characters where cp1250 differs from iso_8859_1, plus the Windows-specific characters in the 0x80 to 0x9F range.

s/\x80/EUR/g; s/\x82/,/g; s/\x84/,,/g; s/\x85/.../g; s/\x86/\/\/\-/g; s/\x87/\/\/\=/g; s/\x89/\%0/g; s/\x8a/S\</g; s/\x8b/\</g; s/\x8c/S\'/g; s/\x8d/T\</g; s/\x8e/Z\</g; s/\x8f/Z\'/g; s/\x91/`/g; s/\x92/'/g; s/\x93/``/g; s/\x94/''/g; s/\x95/o/g; s/\x96/--/g; s/\x97/---/g; s/\x99/TM/g; s/\x9a/s\</g; s/\x9b/>/g; s/\x9c/s\'/g; s/\x9d/t\</g; s/\x9e/z\</g; s/\x9f/z\'/g; s/\xa1/\'\</g; s/\xa2/\'\(/g;
s/\xa3/L\/\//g; s/\xa5/A\;/g; s/\xaa/S\,/g; s/\xaf/Z\./g; s/\xb2/\'\;/g; s/\xb3/l\/\//g; s/\xb9/a\;/g; s/\xba/s\,/g; s/\xbc/L\</g; s/\xbd/\'\"/g; s/\xbe/l\</g; s/\xbf/z\./g; s/\xc0/R\'/g; s/\xc3/A\(/g; s/\xc5/L\'/g; s/\xc6/C\'/g; s/\xc8/C\</g; s/\xca/E\;/g; s/\xcc/E\</g; s/\xcf/D\</g; s/\xd0/D\/\//g; s/\xd1/N\'/g; s/\xd2/N\</g; s/\xd5/O\"/g; s/\xd8/R\</g; s/\xd9/U0/g; s/\xdb/U\"/g; s/\xde/T\,/g; s/\xe0/r\'/g; s/\xe3/a\(/g; s/\xe5/l\'/g; s/\xe6/c\'/g; s/\xe8/c\</g; s/\xea/e\;/g; s/\xec/e\</g; s/\xef/d\</g; s/\xf0/d\/\//g; s/\xf1/n\'/g; s/\xf2/n\</g; s/\xf5/o\"/g; s/\xf8/r\</g; s/\xf9/u0/g; s/\xfb/u\"/g; s/\xfe/t\,/g; s/\xff/\'\./g;
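Because every replacement above is plain ASCII, the same substitutions can also be applied in a single pass driven by a lookup table, instead of scanning the text once per s///g. This is a minimal sketch, assuming the text is read as raw bytes (binmode on the filehandle). The table is copied from the list above, and the sub name cp1250_to_ascii is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: the cp1250 -> ASCII substitutions as a hash lookup applied in
# one s///e pass. Assumes raw cp1250 bytes as input.
use strict;
use warnings;

# Pairs of "cp1250 byte in hex" and "ASCII replacement", copied from the
# substitution list. Bytes without an entry are left unchanged.
my %ascii;
{
    my @fields = split ' ', <<'TABLE';
80 EUR  82 ,    84 ,,   85 ...  86 //-  87 //=  89 %0
8a S<   8b <    8c S'   8d T<   8e Z<   8f Z'
91 `    92 '    93 ``   94 ''   95 o    96 --   97 ---  99 TM
9a s<   9b >    9c s'   9d t<   9e z<   9f z'
a1 '<   a2 '(   a3 L//  a5 A;   aa S,   af Z.
b2 ';   b3 l//  b9 a;   ba s,   bc L<   bd '"   be l<   bf z.
c0 R'   c3 A(   c5 L'   c6 C'   c8 C<   ca E;   cc E<   cf D<
d0 D//  d1 N'   d2 N<   d5 O"   d8 R<   d9 U0   db U"   de T,
e0 r'   e3 a(   e5 l'   e6 c'   e8 c<   ea e;   ec e<   ef d<
f0 d//  f1 n'   f2 n<   f5 o"   f8 r<   f9 u0   fb u"   fe t,   ff '.
TABLE
    # Consume the fields two at a time: hex code, then replacement.
    while (my ($hex, $asc) = splice @fields, 0, 2) {
        $ascii{ chr hex $hex } = $asc;
    }
}

# Hypothetical helper: replace every high-bit byte that has a table entry.
sub cp1250_to_ascii {
    my ($text) = @_;
    $text =~ s/([\x80-\xff])/exists $ascii{$1} ? $ascii{$1} : $1/ge;
    return $text;
}

print cp1250_to_ascii("\x8akoda"), "\n";    # prints "S<koda"
```

To filter a file, something like `binmode STDIN; print cp1250_to_ascii($_) while <STDIN>;` appended to the script should work, since none of the replacements span a newline.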

Update: if you want better ASCII equivalents, you may be able to generate them from the files in the Unicode directory of the source of Links (the text-mode web browser).



In reply to Re: Mass regsub on High-bit chars. by ambrus
in thread Mass regsub on High-bit chars. by abaxaba
