Hi Ikegami, many thanks for the reply. Could you please explain how to use the above code? I am very new to Perl. Which module do I need for the above code to run?

Re^3: convert a string(which contains the contents of a file) into UTF-8 encoding
by ikegami (Patriarch) on Sep 29, 2009 at 23:42 UTC
    No modules needed. Just pass the names of the input file and the output file on the command line; the script stores them in $qfn_in and $qfn_out.
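    #!/usr/bin/perl
    use strict;
    use warnings;

    # Expect exactly two arguments: the input file and the output file.
    @ARGV == 2 or die("usage: latin_to_utf8 infile outfile\n");
    my ($qfn_in, $qfn_out) = @ARGV;

    # Decode the input as iso-8859-1; encode the output as UTF-8.
    open(my $fh_in,  '<:encoding(iso-8859-1)', $qfn_in)
        or die("Can't open \"$qfn_in\": $!\n");
    open(my $fh_out, '>:encoding(UTF-8)', $qfn_out)
        or die("Can't create \"$qfn_out\": $!\n");

    # Copy line by line; the I/O layers do the conversion.
    print $fh_out $_ while <$fh_in>;

    close($fh_out) or die("Can't write \"$qfn_out\": $!\n");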
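    If the file contents are already sitting in a string (as the thread title describes), the same conversion can be done on the string with the core Encode module instead of I/O layers. This is a minimal sketch, assuming Perl 5.8 or later; $latin1_bytes is a hypothetical stand-in for the bytes you read from the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);   # core module since Perl 5.8

# Hypothetical input: raw bytes as read from an iso-8859-1 file.
# "\xE9" is e-acute and "\xA9" is the Copyright sign in iso-8859-1.
my $latin1_bytes = "Caf\xE9 \xA9 2009";

# Step 1: bytes -> Perl characters, interpreting the bytes as iso-8859-1.
my $chars = decode('iso-8859-1', $latin1_bytes);

# Step 2: Perl characters -> UTF-8 encoded bytes.
my $utf8_bytes = encode('UTF-8', $chars);

# Each of the two non-ASCII characters takes two bytes in UTF-8.
printf "%d bytes in, %d bytes out\n", length($latin1_bytes), length($utf8_bytes);
```

    To get the bytes into the string in the first place, read the file with the :raw layer so nothing is decoded on the way in.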
      Hi, the above code works fine in Perl 5.6, but in Perl 5.8 it doesn't convert the Copyright and Trademark symbols to UTF-8. Please advise. Regards, kamalakar

        While iso-latin-1 includes the Copyright symbol (©, U+00A9), it doesn't include the Trademark symbol (™, U+2122). Since the Trademark symbol can't be represented in iso-latin-1, it can't be converted from iso-latin-1 to UTF-8. Maybe you are using Microsoft's derivative of iso-latin-1, cp1252, which does encode ™ (as byte 0x99)?

        Update: I initially stated the Copyright symbol wasn't in iso-latin-1 either. Fixed.
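        One way to check which encoding your input really uses is to decode the same bytes both ways and compare the resulting code points. This sketch assumes the Trademark sign in your file is the single byte 0x99, as it would be in cp1252; the sample bytes and the codepoints() helper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);   # core module since Perl 5.8

# Hypothetical helper: list the code points of a character string.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# Sample bytes: 0xA9 is the Copyright sign in both encodings;
# 0x99 is the Trademark sign in cp1252 but a C1 control
# character (U+0099) in iso-8859-1.
my $bytes = "\xA9\x99";

my $as_latin1 = decode('iso-8859-1', $bytes);
my $as_cp1252 = decode('cp1252',     $bytes);

print "iso-8859-1: ", codepoints($as_latin1), "\n";   # U+00A9 U+0099
print "cp1252:     ", codepoints($as_cp1252), "\n";   # U+00A9 U+2122
```

        If your data matches the cp1252 line, changing the input layer in the script above to '<:encoding(cp1252)' should be enough; the UTF-8 output layer stays the same.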