Ok.
On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A.
Verified using a binary editor.
When reading said text file under Windows, with standard text file I/O, Perl will remove the 0x0D (<CR>) character. That happens as part of the I/O layer and is invisible to the Perl user program.
I had thought that "\n" had the same meaning whether written or read. It turns out that is NOT true.
Some Windows test code:
use warnings;
use strict;
open (my $file, '>', "testendings")
or die "unable to open testendings for write! $!";
print $file "test\n";
close $file;
print "Binary file created: 74 65 73 74 0D 0A\n";
open (my $infile, '<', "testendings")
or die "unable to open testendings for read! $!";
my $in =<$infile>;
close $infile;
print "input as read by normal string IO: $in";
print "Length in bytes of input var as read is: ".length($in)," bytes\n";
$in =~ s/(.)/sprintf("%02X ",ord($1))/seg;
print "$in\n";
print "note: when using Perl text read, the 0x0D was deleted!\n";
__END__
Binary file created: 74 65 73 74 0D 0A # 0D=>CR 0A=>LF
input as read by normal string IO: test
Length in bytes of input var as read is: 5 bytes
74 65 73 74 0A
note: when using Perl text read, the 0x0D was deleted!
My explanation of why "\r\n" doesn't work was functionally correct, but wrong in the details of why. On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?
On Windows, when you print "\n", there will be 2 characters, 0x0D 0x0A. ... I had thought that "\n" had the same meaning whether written or read. It turns out that is NOT true.
As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A" (I think Perl could be compiled differently, but I'm not aware of any current builds that actually do this). What gets written or read depends on which PerlIO layers are in effect. This can even be changed dynamically while a handle is open using binmode, and it works the same on *NIX and Windows, except that :crlf is one of the defaults on Windows. To check which layers are in fact in effect, use: my @layers = PerlIO::get_layers($handle);
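A minimal sketch of that point, assuming a scratch file from File::Temp (the file and the hexdump helper are illustrative, not from the code above). It requests the :crlf layer explicitly, so it should behave the same on *NIX and Windows:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; removed automatically at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Format a byte string as space-separated hex, e.g. "74 65 73 74 0A".
sub hexdump { join ' ', map { sprintf('%02X', ord($_)) } split //, shift }

# In memory, "\n" is a single character with code 0x0A.
print 'In-memory "\n": ', hexdump("\n"), "\n";

# Write through an explicit :crlf layer (one of the defaults on Windows).
open my $out, '>:crlf', $path or die "cannot write $path: $!";
print 'Write layers: ', join(' ', PerlIO::get_layers($out)), "\n";
print $out "test\n";
close $out or die "close failed: $!";

# Read the bytes back with no translation at all.
open my $raw, '<:raw', $path or die "cannot read $path: $!";
my $on_disk = do { local $/; <$raw> };
close $raw;
print 'On disk:   ', hexdump($on_disk), "\n";   # 74 65 73 74 0D 0A

# Read again, switching the layer on after the handle is already open.
open my $in, '<:raw', $path or die "cannot read $path: $!";
binmode $in, ':crlf';
my $line = <$in>;
close $in;
print 'Via :crlf: ', hexdump($line), "\n";      # 74 65 73 74 0A
```

The "Write layers" line is platform dependent, which is why no expected output is shown for it.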
On Windows this will print <CR><CR><LF>. When read back via text mode, only one <CR> will be deleted. The regex fails because there is still another <CR> there and "$" is looking for a 0x0A. Correct?
Correct, yes - I would nitpick that in the examples I showed, there is no reading/writing going on, so there is no need to think about what translations might be happening. Once a string has been read into Perl (and its contents verified with a dumper module), regexes behave the same on both platforms, which is where this subthread started.
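The <CR><CR><LF> round trip can be sketched as follows, again with an explicit :crlf layer so it does not depend on platform defaults. The scratch file, the hexdump helper, and the /test$/ pattern are assumptions made for illustration, not the regex from earlier in the thread:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; removed automatically at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Format a byte string as space-separated hex.
sub hexdump { join ' ', map { sprintf('%02X', ord($_)) } split //, shift }

# Only the "\n" is expanded on write, so the explicit "\r" doubles the CR.
open my $out, '>:crlf', $path or die "cannot write $path: $!";
print $out "test\r\n";
close $out or die "close failed: $!";

# Untranslated view of the file.
open my $raw, '<:raw', $path or die "cannot read $path: $!";
my $on_disk = do { local $/; <$raw> };
close $raw;
print 'On disk: ', hexdump($on_disk), "\n";   # 74 65 73 74 0D 0D 0A

# Reading through :crlf collapses only the final CR LF pair.
open my $in, '<:crlf', $path or die "cannot read $path: $!";
my $line = <$in>;
close $in;
print 'As read: ', hexdump($line), "\n";      # 74 65 73 74 0D 0A

# "$" matches at the end or before a final "\n", not before "\r\n".
my $plain    = $line =~ /test$/    ? 'matches' : 'fails';
my $tolerant = $line =~ /test\r?$/ ? 'matches' : 'fails';
print "/test\$/ $plain, /test\\r?\$/ $tolerant\n";
```

Once the string is in memory, the two regex results are the same on any platform; only the reading and writing steps involve translation.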
Update: Also, note that "text mode" is somewhat misleading: technically, there is just the :crlf layer, which can either be active or not. Again, see PerlIO (and binmode).
As I've said several times now, to be completely accurate (which is important here), on Windows and *NIX, the Perl string "\n" means "\x0A"
That is not completely correct.
When using Perl's default I/O layer, print "\n" will emit <LF> on Unix and <CR><LF> on Windows.
When reading a text line (Unix or Windows), the <CR> will be deleted, if it exists.
Update: When using "standard default I/O methods"
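Whether the <CR> is removed on a default read can be checked directly on any given platform. The sketch below is one way to do that, assuming a File::Temp scratch file (hypothetical, not part of the test code above): it writes exact CRLF bytes, reopens the file with whatever layers the platform applies by default, and reports what it finds.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file holding exactly "test" CR LF.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                      # no translation while writing
print $tmp "test\x0D\x0A";
close $tmp or die "close failed: $!";

# Plain open: whatever layers this platform applies by default.
open my $in, '<', $path or die "cannot read $path: $!";
my @layers = PerlIO::get_layers($in);
my $line   = <$in>;
close $in;

my $has_crlf = grep { $_ eq 'crlf' } @layers;   # count of crlf layers
my $kept_cr  = $line =~ /\x0D/ ? 1 : 0;

print "Default layers: @layers\n";
print 'CR was ', ($kept_cr ? 'kept' : 'removed'), " on read\n";
```

The output depends on the platform and on settings such as the PERLIO environment variable, so none is shown here; the expectation is that the CR is removed only when crlf appears in the layer list.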