merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

I had some very helpful answers to my question about making a file UNIX compatible.
However, I wanted to see exactly what is in a line of text created in a text editor.
This is because I found in the readme.txt file for DOS2UNIX the following-
MS-DOS and UNIX systems use different methods to identify end-of-line information in text files.
MS-DOS, including Windows 9x/ME/NT/2000, use a carriage return/linefeed pair (CR/LF), whilst UNIX only uses the LF character.
Therefore I wrote the following loop which
1. Reads in each line of a text file
2. Uses split to get each character into an array
3. Prints out the character and ASCII value for each element in the array
For a Windows text file, I expected to see an ASCII 10 and an ASCII 11 at the end of each line.
All I could see was an ASCII 10.
Where am I going wrong?

$lcou = 0;
while (defined($linein = <TXTIN>)) {
    $lcou += 1;
    @chr_array = split(//, $linein);
    $linein_tot = scalar(@chr_array);
    print "\nline count <$lcou> total chars <$linein_tot> line <$linein>\n";
    for ($j = 0; $j < $linein_tot; $j++) {
        $chr = $chr_array[$j];
        $chr_num = ord($chr);
        print "pos <$j> ascii <$chr_num> character <$chr>\n";
    }
}

Replies are listed 'Best First'.
Re: Windows and UNIX end of line characters
by ikegami (Patriarch) on Aug 10, 2010 at 17:09 UTC
    Unless you tell it otherwise, Perl adds a :crlf layer to file handles on Windows, causing CRLF to be converted to LF automatically on read, and vice versa on write.

    binmode($fh) will disable the :crlf layer.
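
The effect of the layer can be reproduced on any platform by naming the layers explicitly in open. The following is a minimal sketch, not part of the original reply: the temporary file, the sample text, and the helper name first_line_codes are all made up for illustration. It writes two CRLF-terminated lines, then reads the first line back once through :crlf (what a default Windows handle does) and once through :raw (what binmode($fh) gives you).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with explicit CRLF line endings, with no translation.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "Line one\x0D\x0ALine two\x0D\x0A";
close $out or die "close: $!";

# Hypothetical helper: read the first line through the given layer
# and return the ordinal value of each character in it.
sub first_line_codes {
    my ($layer) = @_;
    open my $fh, "<$layer", $path or die "$path: $!";
    my $line = <$fh>;
    close $fh;
    return map { ord($_) } split //, $line;
}

my @translated = first_line_codes(':crlf');  # CRLF is converted to LF on read
my @raw        = first_line_codes(':raw');   # bytes exactly as stored on disk

print "last two codes with :crlf -> @translated[-2, -1]\n";
print "last two codes with :raw  -> @raw[-2, -1]\n";
```

Read through :crlf, the line ends in a lone 10; read through :raw, the 13 that was hidden in the original loop shows up just before the 10.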

Re: Windows and UNIX end of line characters
by Anonymous Monk on Aug 10, 2010 at 16:46 UTC

    Did you forget to set your filehandle to binmode?

      Yes - mainly because this is the first time I have heard of it! I will try.
      Using binmode did it!
Re: Windows and UNIX end of line characters
by oko1 (Deacon) on Aug 10, 2010 at 18:32 UTC

    To address the original problem - i.e., seeing all the characters in the file - you could always use "bvi" ("binary Vi"). If you don't have access to a system that can run "bvi", you could always just fake it with Perl:

    #!/usr/bin/perl -w
    use strict;

    die "Usage: ", $0 =~ /([^\/]+)$/, " <file>\n" unless @ARGV;
    open my $fh, $ARGV[0] or die "$ARGV[0]: $!\n";
    binmode $fh;

    my ($hex, $char, $count);
    {
        my $res = sysread $fh, my $s, 1;
        if ($res){
            $hex .= sprintf("%02X ", ord($s));
            $char .= $s =~ /[[:print:]]/ ? $s : '.';
        }
        if ((++$count % 20 == 0) || !$res){
            printf "%-60s%4s%-20s\n", $hex, ' ', $char;
            $hex = $char = undef;
        }
        redo if $res;
    }
    close $fh;

    Sample output for Unix text file (note '0A' EOLs):

    ben@Jotunheim:/tmp$ ./pbvi unix.txt
    4C 69 6E 65 20 6F 6E 65 0A 4C 69 6E 65 20 74 77 6F 0A 4C 69     Line one.Line two.Li
    6E 65 20 74 68 72 65 65 0A 4C 69 6E 65 20 66 6F 75 72 0A        ne three.Line four.

    Sample output for Windows text file (same file, converted):

    ben@Jotunheim:/tmp$ ./pbvi windows.txt
    4C 69 6E 65 20 6F 6E 65 0D 0A 4C 69 6E 65 20 74 77 6F 0D 0A     Line one..Line two..
    4C 69 6E 65 20 74 68 72 65 65 0D 0A 4C 69 6E 65 20 66 6F 75     Line three..Line fou
    72 0D 0A                                                        r..

    Update: Corrected by wrapping 'if' statement around the first two assignments in the loop; without it, the script produced a superfluous '00' at the end of each file.


    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
      That looks like something else that is definitely worth trying.
Re: Windows and UNIX end of line characters
by dasgar (Priest) on Aug 10, 2010 at 16:54 UTC

    I think you're looking for an incorrect ASCII code. According to the ASCII table, you'll want to be looking for ASCII 13 (carriage return) and ASCII 10 (line feed).

    Two quick questions:

    • Are you sure that the file is in DOS format and not *nix format?
    • Did you see any ASCII 13 characters in your output?
    If the answer is yes to both, then I think your code is working fine.
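
The first question above can also be answered in code. This is a rough sketch under stated assumptions, not something from the original reply: the helper name eol_kind and the sample strings are hypothetical, and the lines are assumed to have been read from a handle in binmode so that no CRLF translation has taken place. It classifies a line by its trailing bytes, using 13 (0x0D) for carriage return and 10 (0x0A) for line feed.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the ending of a line that was read in binary mode.
sub eol_kind {
    my ($line) = @_;
    return 'CRLF (DOS/Windows)' if $line =~ /\x0D\x0A\z/;  # ASCII 13 then ASCII 10
    return 'LF (Unix)'          if $line =~ /\x0A\z/;      # ASCII 10 only
    return 'none';
}

# Sample strings standing in for lines read from a file.
for my $line ("Line one\x0D\x0A", "Line one\x0A", "Line one") {
    my @tail = map { ord($_) } split //, substr($line, -2);
    printf "%-20s last two codes: %s\n", eol_kind($line), "@tail";
}
```

The CRLF test has to come first, because a DOS line also ends in 0x0A and would otherwise be reported as a Unix line.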

      I am not absolutely sure about the format. I simply used a text editor and wrote some lines.
      I definitely did not see any ASCII 13 being shown.
      Now that I have used binmode, I see both 10 and 13 for an MS-DOS file.
      Therefore you are quite correct about my searching for the wrong things!