in reply to Is there a way to open a memory file with binmode :raw?

G'day stevieb,

I don't have MSWin available, but I can fake it sufficiently for this test with:

```perl
use open IO => ':crlf';
```
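Here's a minimal sketch of what that line does to an in-memory file. While the pragma is in lexical scope, an open that names no layers of its own gets a :crlf layer added, so "\n" is stored as CR LF. PerlIO::get_layers is a Perl builtin, used here only to display the layer stack; the exact layer names it reports may vary by platform and Perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($buf, @layers);
{
    # Lexically scoped: opens in this block that name no layers of
    # their own get :crlf added on top of the handle's base layer.
    use open IO => ':crlf';

    open my $fh, '>', \$buf or die "Can't open in-memory file: $!";
    @layers = PerlIO::get_layers($fh);
    print $fh "hello\n";    # the :crlf layer writes "\n" as "\r\n"
    close $fh;
}

print "layers: @layers\n";
print 'bytes:  ', unpack('H*', $buf), "\n";    # bytes:  68656c6c6f0d0a
```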

Here are four ways to do what you want (plus a fifth, just to show what happens when none of them is used). There may, of course, be other ways I didn't think of.

```perl
#!/usr/bin/env perl -l
# -l sets $\ to "\n", so every print below ends with a newline

use strict;
use warnings;
use autodie;

# Emulate MSWin's default :crlf layer for this test
use open IO => ':crlf';

my ($f, $fh);
my $mf = \$f;
my $test_file = 'pm_1144333_test_file.txt';

# Writes "hello" followed by $\ ("\n"), which :crlf turns into "\r\n"
open $fh, '>', $mf;
print $fh 'hello';
close $fh;

print_raw_1('mem: ', $mf);
print_raw_2('mem: ', $mf);
print_raw_3('mem: ', $mf);
print_raw_4('mem: ', $mf);
print_raw_5('mem: ', $mf);

open $fh, '>', $test_file;
print $fh 'hello';
close $fh;

print_raw_1('file: ', $test_file);
print_raw_2('file: ', $test_file);
print_raw_3('file: ', $test_file);
print_raw_4('file: ', $test_file);
print_raw_5('file: ', $test_file);

# 1. Specify the :raw layer in the open mode
sub print_raw_1 {
    my ($prompt, $file) = @_;
    open my $fh, '<:raw', $file;
    print '1. ', $prompt, unpack 'H*' while (<$fh>);
    close $fh;
}

# 2. Open normally, then switch to :raw with binmode
sub print_raw_2 {
    my ($prompt, $file) = @_;
    open my $fh, '<', $file;
    binmode $fh, ':raw';
    print '2. ', $prompt, unpack 'H*' while (<$fh>);
    close $fh;
}

# 3. Lexically set the default input layer with the open pragma
sub print_raw_3 {
    my ($prompt, $file) = @_;
    use open IN => ':raw';
    open my $fh, '<', $file;
    print '3. ', $prompt, unpack 'H*' while (<$fh>);
    close $fh;
}

# 4. Lexically set the default input and output layers
sub print_raw_4 {
    my ($prompt, $file) = @_;
    use open IO => ':raw';
    open my $fh, '<', $file;
    print '4. ', $prompt, unpack 'H*' while (<$fh>);
    close $fh;
}

# 5. No override: the file-level :crlf default applies,
#    so CRLF is read back as LF
sub print_raw_5 {
    my ($prompt, $file) = @_;
    open my $fh, '<', $file;
    print '5. ', $prompt, unpack 'H*' while (<$fh>);
    close $fh;
}
```

Here's the output:

```
1. mem: 68656c6c6f0d0a
2. mem: 68656c6c6f0d0a
3. mem: 68656c6c6f0d0a
4. mem: 68656c6c6f0d0a
5. mem: 68656c6c6f0a
1. file: 68656c6c6f0d0a
2. file: 68656c6c6f0d0a
3. file: 68656c6c6f0d0a
4. file: 68656c6c6f0d0a
5. file: 68656c6c6f0a
```

And if I don't "fake it", i.e. if I omit the use open IO => ':crlf'; line, I just get:

```
1. mem: 68656c6c6f0a
2. mem: 68656c6c6f0a
3. mem: 68656c6c6f0a
4. mem: 68656c6c6f0a
5. mem: 68656c6c6f0a
1. file: 68656c6c6f0a
2. file: 68656c6c6f0a
3. file: 68656c6c6f0a
4. file: 68656c6c6f0a
5. file: 68656c6c6f0a
```

See also:

and, if you're interested in internals:

— Ken

Re^2: Is there a way to open a memory file with binmode :raw?
by stevieb (Canon) on Oct 10, 2015 at 15:40 UTC

    This is very informative, kcott, thanks :)

    However, it doesn't help me understand why, on Windows, when printing to a file-based filehandle, an \n is written to the file as \r\n by default (without any binmode or IO trickery; it just does so naturally).

    However, the default record separator \r\n is not written to a memory-file-based handle; there it is written as just \n. I would expect the OS's default record separator to be used regardless of the type of handle. I can't find anything that documents this discrepancy between a real file and printing exactly the same thing to a scalar reference acting as a filehandle.

    That, or I'm missing something very basic.
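    A minimal sketch, runnable on any platform, that makes the difference visible by comparing an in-memory handle opened with no layers against one opened with an explicit :crlf layer. My reading of the PerlIO source (an assumption, not verified on MSWin here) is that an in-memory open starts from the :scalar layer alone instead of the platform's default layer stack (:unix:crlf on MSWin), so only layers you request, directly or through the open pragma, get added.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# In-memory handle with no layers requested: "\n" is stored as a single LF.
open my $plain, '>', \my $plain_buf or die "Can't open in-memory file: $!";
print $plain "hello\n";
my @plain_layers = PerlIO::get_layers($plain);
close $plain;

# Same thing, but asking for CRLF translation with an explicit :crlf layer.
open my $crlf, '>:crlf', \my $crlf_buf or die "Can't open in-memory file: $!";
print $crlf "hello\n";
my @crlf_layers = PerlIO::get_layers($crlf);
close $crlf;

printf "%-6s layers=[%s] bytes=%s\n", 'plain', "@plain_layers", unpack('H*', $plain_buf);
printf "%-6s layers=[%s] bytes=%s\n", 'crlf',  "@crlf_layers",  unpack('H*', $crlf_buf);
```

    The plain buffer ends up holding 68656c6c6f0a ("hello\n") and the :crlf buffer 68656c6c6f0d0a ("hello\r\n").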

      The problem is only with your expectations. Did you know that, in Unix, writing "\n" also becomes "\r\n" by default, just not in ordinary files? For example, it does that when writing to a TTY (with the default terminal configuration).

      Writing to a Perl scalar is not handled by the Windows clib, obviously. So there is no requirement that such writes emulate the default behavior of Windows' clib.

      "\r\n" is the default text record separator for Windows text files.

      - tye        

      CRLF translation is a feature of the Windows file systems, not of the Perl language. The PerlIO layers emulate it when writing to the Windows file system.

      One fairly typical use of memory files is to reduce I/O overhead by accumulating lines in a single scalar and then writing the entire file in one go.
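      A minimal sketch of that pattern, assuming File::Temp (a core module) for the output file. MSWin is emulated by pushing :crlf onto the output handle explicitly, an assumption standing in for the default text-mode layer there; the in-memory handle keeps its default layers, so each "\n" reaches the file as a single CR LF.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Accumulate lines in an in-memory file. No layers are requested,
# so each "\n" is stored as a single LF.
my $buffer = '';
open my $mem, '>', \$buffer or die "Can't open in-memory file: $!";
print $mem "line $_\n" for 1 .. 3;
close $mem;

# Write the whole buffer with one print. :crlf is pushed explicitly
# to stand in for the default text-mode layer on MSWin.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':crlf' or die "binmode: $!";
print {$out} $buffer;
close $out or die "Can't close $path: $!";

# Read the bytes back untranslated to see what reached the disk.
open my $in, '<:raw', $path or die "Can't open $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

print unpack('H*', $on_disk), "\n";
```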

      If Perl applied CRLF translation when writing to the memory file, then when the scalar was written to the file, the file system (or file-system emulation) would apply the translation a second time, and you would end up with \r\r\n.
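      The double translation can be reproduced on any platform with explicit :crlf layers. In this hypothetical setup (not Perl's default behaviour), a second in-memory handle carrying :crlf stands in for the real text-mode output file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical: the in-memory file applies CRLF translation ...
my $buffer = '';
open my $mem, '>:crlf', \$buffer or die "Can't open in-memory file: $!";
print $mem "hello\n";    # stored as "hello\r\n"
close $mem;

# ... and the "real" output file applies it again. A second in-memory
# handle with :crlf stands in for a Windows text-mode file here.
my $file = '';
open my $out, '>:crlf', \$file or die "Can't open output: $!";
print {$out} $buffer;    # the existing "\r\n" becomes "\r\r\n"
close $out;

print unpack('H*', $file), "\n";    # 68656c6c6f0d0d0a
```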

      Of course, that could be avoided by applying the non-default :raw layer to the actual output file, or by calling binmode on it, but that means extra steps are required.
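      A sketch of the binmode variant, assuming File::Temp for the output file. A text-mode file is emulated by first pushing :crlf (an assumption standing in for MSWin's default); on a real MSWin text-mode handle only the plain binmode call would be needed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A buffer that already holds CRLF line endings, built via an
# explicit :crlf layer on the in-memory handle.
my $buffer = '';
open my $mem, '>:crlf', \$buffer or die "Can't open in-memory file: $!";
print $mem "hello\n";    # stored as "hello\r\n"
close $mem;

# Emulate a CRLF-translating output file, then take the extra step:
# binmode with no layer argument (same as ':raw') turns translation off,
# so the buffer is written byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':crlf' or die "binmode :crlf: $!";
binmode $out          or die "binmode: $!";
print {$out} $buffer;
close $out or die "Can't close $path: $!";

# Read the file back untranslated.
open my $in, '<:raw', $path or die "Can't open $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

print unpack('H*', $on_disk), "\n";    # 68656c6c6f0d0a
```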

      It is better to apply CRLF translation only when writing to real file-system files; then the default behaviours work together to produce the desired result.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.

      Update: The accuracy of the information I linked to (perlport: Newlines) is in question. See tye's response to this node.

      In Perl, \n is a logical newline. It does not necessarily represent the single ASCII character whose decimal value is 10.

      Perhaps a read of "perlport: Newlines" will help clarify the situation for you.

      — Ken

        Yeah, perlport has caused more wrong conclusions than enlightenment on newlines in my experience. For example:

        In Perl, \n is a logical newline. It does not necessarily represent the single ASCII character

        In Perl, "\n" is actually always exactly one character. On an ASCII system, it is also always ASCII linefeed... except for the single case of old Macs, which took the unprecedented route of being "almost ASCII".
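        A quick check of the point above, assuming an ASCII-based platform (neither EBCDIC nor classic Mac OS): "\n" is one character with code 10, and "\r\n" is two characters, CR (13) then LF (10).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my (%length_of, %codes_of);

# Labels are single-quoted so they stay literal backslash sequences.
for my $pair (['\n', "\n"], ['\r\n', "\r\n"]) {
    my ($label, $str) = @$pair;
    $length_of{$label} = length $str;
    $codes_of{$label}  = join ',', unpack 'C*', $str;    # byte values
    printf "%-4s length=%d codes=%s\n", $label, $length_of{$label}, $codes_of{$label};
}
```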

        "\n" is not much more a "logical" newline than "a" is a logical letter A. "a" is also always exactly one character and is also not always the ASCII lower-case letter A.

        - tye