Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I had corrupted a UTF-8 file (treated it as Windows-1252 in SciTE and hit save, then went back and treated it as UTF-8 again).

Now, reading the file as UTF-8, I get many warnings like

utf8 "\xE1" does not map to Unicode at junk line 8, <$in> line 13642.

How do I disable these warnings? This doesn't work :)  no warnings 'utf8';

It would be nice if I could programmatically restore this file, but it's not a priority; the corrupted chars were mostly punctuation.

Replies are listed 'Best First'.
Re: disable warnings: utf8 "\x8E" does not map to Unicode
by Anonymous Monk on Feb 10, 2012 at 14:46 UTC
    Undo the double encoding:
    piconv -f utf-8 -t windows-1252 < junk > junk.fixed
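For readers who would rather do this inside a script, here is a minimal sketch of the same round trip using Encode. It assumes the damage is exactly one pass of "UTF-8 bytes read as Windows-1252, then saved as UTF-8"; the helper name undo_double_encoding is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: reverse one round of "UTF-8 bytes read as
# Windows-1252 and saved again as UTF-8".
sub undo_double_encoding {
    my ($bytes) = @_;
    # Step 1: the damaged data is valid UTF-8, so decode it to characters.
    my $chars = decode('UTF-8', $bytes);
    # Step 2: encoding those characters as cp1252 yields the original bytes.
    return encode('cp1252', $chars);
}

# Simulate the damage on the UTF-8 bytes of U+00E9 (C3 A9).
my $original = "\xC3\xA9";
my $damaged  = encode('UTF-8', decode('cp1252', $original));
my $fixed    = undo_double_encoding($damaged);

print $fixed eq $original ? "restored\n" : "not restored\n";
```

If the file went through anything other than that single clean round trip, characters that have no cp1252 mapping are replaced with a substitution character by encode's default behaviour, so the result should be checked by eye.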

      Undo the double encoding

      That is the first thing I tried, among various permutations, none of which worked. It wasn't exactly double-encoded; the clipboard may have been involved. I even tried fix_latin, but nothing worked quite right other than manual intervention.

      Anyway, I'm more interested in silencing the warning; surely it has to be possible.

        When calling Encode::decode, pass Encode::FB_DEFAULT as the third parameter.

        See the section "Handling Malformed Data" in the Encode documentation.
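A minimal sketch of that suggestion, assuming the bytes have been read without any :encoding layer (the sample string is invented for illustration). With FB_DEFAULT, decode neither warns nor dies on a malformed byte and substitutes U+FFFD instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes for "bär" with the 'ä' stored as a single Latin-1 byte (0xE4),
# which is not valid UTF-8.
my $bytes = "b\xE4r";

# FB_DEFAULT as the third argument: no warning, no die; the malformed
# byte is replaced with U+FFFD (REPLACEMENT CHARACTER).
my $text = decode('UTF-8', $bytes, Encode::FB_DEFAULT);

# Show the resulting code points.
printf "U+%04X\n", ord($_) for split //, $text;
```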

Re: disable warnings: utf8 "\x8E" does not map to Unicode
by Eliya (Vicar) on Feb 10, 2012 at 18:03 UTC
    How do I disable these warnings?

    In theory, the following should work:

    use PerlIO::encoding; $PerlIO::encoding::fallback = Encode::FB_DEFAULT;

    However, when I try it, for example

    #!/usr/bin/perl
    use PerlIO::encoding;
    $PerlIO::encoding::fallback = Encode::FB_DEFAULT;
    open my $fh, "<:encoding(UTF-8)", $ARGV[0] or die $!;
    binmode STDOUT, ":encoding(UTF-8)";
    while (<$fh>) {
        print
    }

    with an input file such as  (where the 'ä' is Latin-1 encoded)

    foo bär

    although it does silence the warning, I do get an endless loop!

    foo
    b�r
    foo
    b�r
    foo
    b�r
    ...
    

    Not sure why...   (Encode::FB_QUIET, OTOH, doesn't have the latter problem).

    Looks like a bug to me.  Can others confirm the problem?

      Confirmed on 5.14.2.
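One possible way to sidestep the layer altogether is to read raw bytes and decode each line by hand. This is only a sketch: the in-memory filehandle stands in for the damaged file, and it has not been checked against the Perl version where the loop was seen.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for the damaged file: an in-memory handle over bytes where
# the 'ä' is a lone Latin-1 byte. A real script would open a path instead.
my $data = "foo\nb\xE4r\n";
open my $fh, '<:raw', \$data or die "open failed: $!";

my @lines;
while (my $raw = <$fh>) {
    # Decode by hand, so the :encoding layer is never involved.
    push @lines, decode('UTF-8', $raw, Encode::FB_DEFAULT);
}
close $fh;

binmode STDOUT, ':encoding(UTF-8)';
print @lines;
```

Splitting on "\n" before decoding is safe for UTF-8 input because the newline byte never occurs inside a multi-byte sequence.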
Re: disable warnings: utf8 "\x8E" does not map to Unicode
by ww (Archbishop) on Feb 10, 2012 at 14:51 UTC
    I can't speak to SciTE (questions about its UI are better directed to a relevant site or the Scintilla mailing list), but re your desire to restore the file: search the Monastery for threads discussing "encode," "decode" or even "code point."

    You can also probably find help in the Tutorials section (based on the inference that you asked your question here because you have some Perl background).

    Update: "inference" clause stricken because /me missed relevant input -- "no warnings 'utf8'." Since AM couldn't have updated, it's clearly my mistake. Apologies.