in reply to Re: What's the best way to detect character encodings, Windows-1252 v. UTF-8?

my $bytes = '...';

How do I ensure that $bytes contains bytes, not characters? I'm on Microsoft Windows and the text files are in the DOS format (i.e., CR-LF newlines). In other words, what I/O layer must I use? '<:raw'?

Jim

Re^3: What's the best way to detect character encodings, Windows-1252 v. UTF-8?
by ikegami (Patriarch) on Jun 17, 2011 at 16:14 UTC
    open(my $fh, '<:raw:perlio', $qfn)
    and
    open(my $fh, '<', $qfn); binmode($fh);
    would both do, but then you'd have to do the CRLF translation yourself.
    open(my $fh, '<', $qfn)
    will actually work and properly do the CRLF translation (unless you set some default layers somewhere) despite decoding and CRLF translation being done in the wrong order. Note that
    open(my $fh, '<:encoding(UTF-8)', $qfn)
    also decodes and does CRLF translation in the wrong order. That's why
    open(my $fh, '<:encoding(UTF-16le)', $qfn)
    doesn't work on Windows (of all places!).
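
A minimal sketch of the first option above: read the file as bytes with '<:raw:perlio', then do the CRLF translation by hand. The file name crlf-sample.txt and the sample contents are made up for this illustration; the script writes the file itself so it can run anywhere.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $qfn = 'crlf-sample.txt';

# Create a small DOS-format file. '>:raw' applies no translation,
# so the CR LF pairs reach the disk exactly as written.
open(my $out, '>:raw', $qfn) or die "Can't create $qfn: $!";
print {$out} "caf\xE9\r\nbar\r\n";
close($out) or die "Can't close $qfn: $!";

# Read the whole file back as bytes, with buffering but no :crlf layer.
open(my $fh, '<:raw:perlio', $qfn) or die "Can't open $qfn: $!";
my $bytes = do { local $/; <$fh> };
close($fh) or die "Can't close $qfn: $!";
unlink($qfn);

# Do the CRLF translation by hand, on a copy of the bytes.
(my $text = $bytes) =~ s/\r\n/\n/g;

printf "raw length: %d, translated length: %d, UTF8 flag: %s\n",
    length($bytes), length($text), utf8::is_utf8($bytes) ? 'on' : 'off';
```

Because no decoding layer is involved, $bytes keeps its carriage returns and stays a byte string; only the copy in $text has its line endings normalized.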

      So I think you're saying I should do the simplest thing and just open the files without specifying any I/O layer. In this case, Perl will do what I want. It will slurp the bytes of the file into a variable that it understands contains bytes, not characters, and it will also do what I want it to do with newlines, which is effectively to pass them through unmolested.

      What does '<:raw:perlio' do, exactly?

      Jim

        That would be the simplest.

        :raw removes all layers, not just :crlf. :perlio is a buffering layer.
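
One way to see what each open mode ends up with is to ask Perl for the layer stack using PerlIO::get_layers. This is only a sketch: the file name layers-sample.txt is hypothetical, and the layers reported for the plain '<' mode depend on the platform and on any default layers set elsewhere, so no particular output is shown.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $qfn = 'layers-sample.txt';

# Create a small DOS-format file to open in different modes.
open(my $out, '>:raw', $qfn) or die "Can't create $qfn: $!";
print {$out} "one\r\ntwo\r\n";
close($out) or die "Can't close $qfn: $!";

# Record the layer stack that each open mode produces.
my %layers_for;
for my $mode ('<', '<:raw', '<:raw:perlio') {
    open(my $fh, $mode, $qfn) or die "Can't open $qfn with $mode: $!";
    $layers_for{$mode} = [ PerlIO::get_layers($fh) ];
    close($fh) or die "Can't close $qfn: $!";
    print "$mode => @{ $layers_for{$mode} }\n";
}
unlink($qfn);
```

On Windows the plain '<' line would be expected to include a crlf layer, while the '<:raw:perlio' line should not.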