in reply to How do you open/read a text , without knowing its encoding, and remove any BOM if its utf, what do you use?

To remove a BOM from a file, use String::BOM.
#!/usr/bin/perl -l
use strict;
use warnings;
use String::BOM qw(strip_bom_from_file);

my $file = '/path/to/file';
print strip_bom_from_file($file);
Prints 1 on success. On failure, the error is reported in $!.

Re^2: How do you open/read a text , without knowing its encoding, and remove any BOM if its utf, what do you use?
by Zzenmonk (Sexton) on Apr 23, 2013 at 07:45 UTC

    Hi,

    Encode::Guess does a fine job of detecting the encoding. Read its documentation on CPAN carefully. To detect the encoding you can use something like:

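    use strict;
    use warnings;
    use Encode::Guess;

    my $file = 'yourfile';

    # Read the whole file as raw bytes, with no decoding layer.
    open(my $in, '<:raw', $file) or die "Cannot open $file: $!";
    my $bigstring = do { local $/; <$in> };
    close $in;

    # guess() returns an encoding object on success, or an error string.
    my $enc = Encode::Guess->guess($bigstring);
    ref $enc or die "Cannot guess encoding: $enc";
    print "My file content encoding is: ", $enc->name, "\n";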

    Now you can decode your data and re-encode it in the encoding you want. You need a strategy for this: I recommend keeping everything in UTF-8 or UTF-16, depending on the case. If you run into BOM issues, String::BOM is a good solution.
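
    As a rough sketch of that decode/re-encode step, the snippet below uses only the core Encode and Encode::Guess modules. The helper name to_utf8 is made up for this illustration, and removing the BOM with a regex is just one option (String::BOM is the other one mentioned in this thread). It assumes the input is ASCII, UTF-8, or BOM-marked UTF-16/32, which is what Encode::Guess checks by default; other encodings would have to be passed to guess() as extra suspects.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;

# Hypothetical helper: guess the encoding of raw bytes, decode them,
# drop a leading BOM if one survived decoding, and return UTF-8 bytes.
sub to_utf8 {
    my ($bytes) = @_;

    # guess() returns an encoding object, or an error string on failure.
    my $enc = Encode::Guess->guess($bytes);
    ref $enc or die "Cannot guess encoding: $enc";

    # Turn the raw bytes into a Perl character string.
    my $text = $enc->decode($bytes);

    # A UTF-8 BOM decodes to U+FEFF at the start of the string; remove it.
    $text =~ s/\A\x{FEFF}//;

    return encode('UTF-8', $text);
}
```

    Call it on bytes read through a ':raw' handle, for example my $clean = to_utf8($bigstring);, and write the result back out through a ':raw' handle as well so nothing gets encoded twice.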

    The following might help further: http://perldoc.perl.org/perluniintro.html

    K

    The best medicine against depression is a cold beer!