Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:

Is there Perl code that can successfully interpret an input file having special characters from multiple European languages? My input file is an Excel spreadsheet in which each column contains data from a different European language.

My input file contains several Western and Eastern European languages that use Latin characters ONLY; none of them use Cyrillic characters.

Example: What if the input file had the special characters listed below?

Polish ń (the PerlMonks site is not rendering this character; it should be an n with a tick mark that slants upward and to the right)
Czech Š
German ä
French ë é
Spanish ó í ñ

After reading my input file, I need to write out selected parts of it to an ASCII *.ini file.

Can anyone provide Perl code that correctly sets up the input and output file pointers described above?

The input file pointer would be associated with an Excel file containing special characters in several European languages that use Latin (not Cyrillic) characters.

The output file pointer would be associated with an *.ini file that must be in ASCII format.

Re: Interpreting input file having special characters from multiple European languages
by tangent (Parson) on Sep 15, 2015 at 21:39 UTC
    For reading in your Excel file, Spreadsheet::ParseExcel should handle all the characters properly. You can then use Text::Unidecode to convert your data to plain ASCII before printing to your output file - no special file handling required.
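The approach suggested above could be sketched roughly as follows. This is only a minimal illustration, not the poster's actual program: the helper names `ini_line` and `read_cells`, the `key=value` layout, and the sample strings are all assumptions made for the example. It relies on `unidecode()` from Text::Unidecode and the documented `parse`, `worksheets`, `row_range`, `col_range`, `get_cell` and `value` methods of Spreadsheet::ParseExcel, both of which must be installed from CPAN.

```perl
use strict;
use warnings;
use utf8;               # the string literals below contain non-ASCII characters
use Text::Unidecode;    # CPAN module; exports unidecode()

# Hypothetical helper: build one ASCII-only "key=value" line for the .ini file.
sub ini_line {
    my ($key, $value) = @_;
    return unidecode($key) . '=' . unidecode($value);
}

# Hypothetical helper: collect the value of every existing cell
# on the first worksheet of an Excel (.xls) file.
sub read_cells {
    my ($path) = @_;
    require Spreadsheet::ParseExcel;    # loaded only when a file is actually read
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($path)
        or die "Cannot parse $path: " . $parser->error() . "\n";
    my ($sheet) = $workbook->worksheets();
    my ($row_min, $row_max) = $sheet->row_range();
    my ($col_min, $col_max) = $sheet->col_range();
    my @values;
    for my $row ($row_min .. $row_max) {
        for my $col ($col_min .. $col_max) {
            my $cell = $sheet->get_cell($row, $col);
            push @values, $cell->value() if $cell;
        }
    }
    return @values;
}

# Demonstration on literal strings (made-up sample data, one per language).
my @samples = (
    [ polish  => 'Gdańsk'  ],
    [ czech   => 'Škoda'   ],
    [ german  => 'Mädchen' ],
    [ french  => 'Noël'    ],
    [ spanish => 'señor'   ],
);
my @lines;
push @lines, ini_line(@$_) for @samples;
print "$_\n" for @lines;    # output contains only ASCII characters
```

Because the strings are already plain ASCII after `unidecode()`, the output file can be opened with an ordinary `open my $out, '>', $file` and no encoding layer, which matches the "no special file handling required" remark above.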
      My Perl program for reading the multi-language Excel spreadsheet, and writing selected data to an ASCII *.ini file, uses the modules Spreadsheet::BasicRead and Spreadsheet::BasicReadNamedCol to read the Excel spreadsheet.

      The unmodified Text::Unicode module is not correctly translating Polish special characters when writing them into the ASCII *.ini file; the Polish special characters come out garbled.

      I am using Strawberry Perl, which I assume is very out of date compared with standard Perl. Could this be causing my problem?

      Is there an alternate solution for the problems I am having in my Perl program?
        I am using Strawberry Perl, which I assume is very out of date compared with standard Perl.

        The Strawberry Perl project is very good at keeping up with releases from the main Perl project. The incantation perl -v will tell you the specific version you are running.

        The unmodified Text::Unicode module...
        May be a typo, but my suggestion was to use Text::Unidecode not Text::Unicode - they do quite different things. Spreadsheet::BasicRead should be fine as it uses Spreadsheet::ParseExcel under the hood.
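One possible cause of garbled output, which the thread does not confirm, is passing raw UTF-8 bytes to `unidecode()` instead of a decoded character string. The sketch below only illustrates that difference; the byte sequence is a made-up stand-in for whatever the spreadsheet reader actually returns, and `decode` comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Text::Unidecode;

# Raw bytes: the UTF-8 encoding of U+0144 (n with acute), not yet decoded.
my $bytes = "\xC5\x84";

# Decoding turns the two bytes into a single Perl character.
my $chars = decode('UTF-8', $bytes);

my $from_bytes = unidecode($bytes);    # each byte is treated as its own character
my $from_chars = unidecode($chars);    # the single character is transliterated

print "from decoded characters: $from_chars\n";
print "raw bytes gave the same result? ",
    ($from_bytes eq $from_chars ? 'yes' : 'no'), "\n";
```

If the values coming out of the spreadsheet module turn out to be undecoded bytes, decoding them first (with whatever encoding they are really in) before calling `unidecode()` would be the thing to try.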