in reply to help needed in file encoding

You could check whether the first few bytes contain a BOM.

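# $text must hold the raw, undecoded bytes from the start of the file.
# The 4-byte UTF-32 checks come first because the UTF-32LE BOM begins with the UTF-16LE BOM.
my $bom;
if    (substr($text, 0, 4) eq "\x00\x00\xFE\xFF") { $bom = 'UTF-32BE' }
elsif (substr($text, 0, 4) eq "\xFF\xFE\x00\x00") { $bom = 'UTF-32LE' }
elsif (substr($text, 0, 2) eq "\xFE\xFF")         { $bom = 'UTF-16BE' }
elsif (substr($text, 0, 2) eq "\xFF\xFE")         { $bom = 'UTF-16LE' }
elsif (substr($text, 0, 3) eq "\xEF\xBB\xBF")     { $bom = 'UTF-8' }
else {
    # No BOM found. Might not be UTF. Use another method to guess the encoding.
}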

It's not very reliable, since many files carry no BOM at all. The protocol that wraps your stream or data really must specify the encoding for things to work smoothly.
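When the encoding is known from outside the data, it can be declared rather than guessed. Here is a minimal sketch using the core Encode module, assuming the bytes are UTF-16LE; the sample bytes and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "Hi" encoded as UTF-16LE, with no BOM.
my $bytes = "H\x00i\x00";

# The encoding name comes from outside the data (a header, a config value, ...).
my $declared = 'UTF-16LE';

# Turn the raw bytes into a Perl character string.
my $text = decode($declared, $bytes);
print length($text), "\n";
```

For a file, the same declaration can go in the open mode, for example open(my $fh, '<:encoding(UTF-16LE)', $path).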

Re^2: help needed in file encoding
by duckyd (Hermit) on Mar 20, 2006 at 18:19 UTC
    Consider File::BOM if you need to do something like this.
      I tried utf8::is_utf8($string), where $string was read from a Big5-encoded file, and got 1 (true) as the output. How does that happen? Is the internal representation always UTF-8 on Windows?
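One possible explanation, assuming the file was read through a decoding layer or passed through Encode::decode: utf8::is_utf8 reports how Perl stores the string internally, not which encoding the file used. A small sketch of that behaviour, where the one-character sample is an assumption for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Build Big5 bytes for U+4E2D (a hypothetical one-character sample).
my $bytes = encode('Big5', "\x{4E2D}");

# Raw bytes: Perl's internal UTF8 flag is off.
print utf8::is_utf8($bytes) ? 1 : 0, "\n";

# Decoded characters: the flag is on, whatever the source encoding was.
my $chars = decode('Big5', $bytes);
print utf8::is_utf8($chars) ? 1 : 0, "\n";
```

If that is what happened here, the result would be the same on any platform, so it says nothing specific about Windows.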