kamalakar has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using the method below to convert a string (which contains the contents of a file) to UTF-8. However, some characters, such as the copyright symbol, the trademark symbol, and superscript characters, are not converted properly: they are displayed as Â. Could anyone please help me solve this problem?
sub convertoUtf{
    my $your_data = shift;
    my $encoding = 'ISO-8859-1';
    if ($encoding ne 'UTF-8'){
        my $ustr = Unicode::String->new();
        if ($encoding eq 'UTF-16'){
            $ustr->utf16( $your_data );
        }else{
            my $map = Unicode::Map->new( $encoding );
            if (! $map ){
                # Deal with the unsupported encoding somehow;
                # you probably want to have a message like this:
                # "Unsupported XML encoding: $encoding");
            }
            $ustr->utf16( $map->to_unicode( $your_data ) );
        }
        $your_data = $ustr->utf8();
    }
    return $your_data;
}
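One possible reading of the Â symptom (an assumption, since the post does not say how the output is viewed): the conversion may be producing correct UTF-8 that is then displayed or re-read as ISO-8859-1. The copyright sign is the single byte 0xA9 in ISO-8859-1 and the two bytes 0xC2 0xA9 in UTF-8; read as ISO-8859-1, the 0xC2 lead byte appears as Â. A minimal sketch with the core Encode module, using variable names chosen only for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Copyright sign as a single ISO-8859-1 byte.
my $latin1_bytes = "\xA9";

# Decode to characters, then encode those characters as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

# The UTF-8 form is two bytes.
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;

# Misreading those two bytes as ISO-8859-1 yields two characters:
# U+00C2 (A with circumflex) followed by U+00A9 (copyright sign).
my $misread = decode('ISO-8859-1', $utf8_bytes);
printf "Misread as:  %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $misread;
```

If this is what is happening, the data itself is fine and the fix belongs on the viewing side, for example by declaring the output as UTF-8 to whatever displays it.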

Replies are listed 'Best First'.
Re: convert a string(which contains the contents of a file) into UTF-8 encoding
by ikegami (Patriarch) on Sep 26, 2009 at 01:55 UTC
    open(my $fh_in, '<:encoding(iso-8859-1)', $qfn_in)
        or die("Can't open \"$qfn_in\": $!\n");
    open(my $fh_out, '>:encoding(UTF-8)', $qfn_out)
        or die("Can't create \"$qfn_out\": $!\n");
    print $fh_out $_ while <$fh_in>;
      Hi Ikegami, many thanks for the reply. Could you please explain how to use the above code? I am very new to Perl. Which module do I need for the above code to run?
        No modules needed. Just put the name of the input file in $qfn_in and the name of the output file in $qfn_out.
        #!/usr/bin/perl
        use strict;
        use warnings;

        @ARGV == 2
            or die("usage: latin_to_utf8 infile outfile\n");
        my ($qfn_in, $qfn_out) = @ARGV;

        open(my $fh_in, '<:encoding(iso-8859-1)', $qfn_in)
            or die("Can't open \"$qfn_in\": $!\n");
        open(my $fh_out, '>:encoding(UTF-8)', $qfn_out)
            or die("Can't create \"$qfn_out\": $!\n");
        print $fh_out $_ while <$fh_in>;
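For the case in the original question, where the file contents are already held in a string, the same conversion can be sketched in memory with the core Encode module. The sub name latin1_to_utf8 and its optional source-encoding argument are hypothetical and do not come from the thread; the ISO-8859-1 default mirrors the encoding used in the script above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes a string of raw bytes in the given source
# encoding (default ISO-8859-1) and returns the same text as UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes, $source_encoding) = @_;
    $source_encoding = 'ISO-8859-1' unless defined $source_encoding;

    # Step 1: raw bytes -> Perl characters.
    my $chars = decode($source_encoding, $bytes);

    # Step 2: Perl characters -> UTF-8 bytes.
    return encode('UTF-8', $chars);
}

# Copyright sign (0xA9) and superscript two (0xB2) as ISO-8859-1 bytes.
my $utf8 = latin1_to_utf8("\xA9 2009 x\xB2");

# The string already holds UTF-8 bytes, so write it without an encoding layer.
binmode STDOUT;
print $utf8, "\n";
```

One caveat about the trademark sign: ISO-8859-1 has no such character. If the file stores it as byte 0x99, the source is more likely Windows-1252, and passing 'cp1252' as the second argument would be the thing to try. That is a guess about the asker's data, not something the thread confirms.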
Re: convert a string(which contains the contents of a file) into UTF-8 encoding
by Anonymous Monk on Sep 26, 2009 at 00:34 UTC