
How about updating this solution by putting "<code>" at the beginning of the C++ code and "</code>" at the end of it, so the rest of us can read it more easily?

IIRC, the "^Z" character (0x1A in hex, \032 in octal, 26 in decimal) has always been used, and is still used, by MS systems as the byte value that marks the end of a text file. That is, if a file is opened and read in text mode on an MS system, an EOF condition occurs as soon as a ^Z byte is encountered.
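To make the convention concrete, here is a small Perl sketch (my own illustration, not from the thread; the helper name `dos_text_view` is made up). It shows that the usual spellings of ^Z are all the same byte, and it emulates what a ^Z-terminated text-mode reader would hand back: only the bytes before the first ^Z.

```perl
use strict;
use warnings;

# The same byte written four ways: hex escape, octal escape,
# control-character escape, and chr().
my @spellings = ("\x1A", "\032", "\cZ", chr(26));
print join(" ", map { ord($_) } @spellings), "\n";    # 26 26 26 26

# Hypothetical helper: emulate a DOS-style text-mode reader by returning
# everything before the first ^Z. The ^Z itself, and anything after it,
# is never seen by the caller.
sub dos_text_view {
    my ($bytes) = @_;
    my $pos = index($bytes, "\x1A");
    return $pos < 0 ? $bytes : substr($bytes, 0, $pos);
}

print dos_text_view("hello\r\nworld\r\n\x1A"), "\n";
```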

Of course, if the file is opened and read in binary mode, then ^Z has no special meaning and is treated the same as every other possible byte value. This matters because many non-text files (image, audio, compressed, compiled executable or similar kinds of data) are likely to contain bytes whose value happens to be 26 (i.e. 0x1A, \032, ^Z), and reading such files in text mode will cause a premature EOF condition, which is not good. (There are other evils that arise when treating non-text files with MS text-mode I/O, but I shouldn't digress...)
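A quick way to see the premature-EOF problem is to build a fake binary buffer and look for 0x1A in it. In this sketch the helper `count_ctrl_z` is a hypothetical name and the payload (every byte value 0..255 once) is made up; `tr///` with an empty replacement list counts matches without modifying the string.

```perl
use strict;
use warnings;

# Hypothetical helper: count the ^Z (0x1A) bytes in a buffer.
# tr/// with an empty replacement list just returns the match count.
sub count_ctrl_z {
    my ($bytes) = @_;
    return ($bytes =~ tr/\x1A//);
}

# Simulated binary payload: every byte value 0..255, once each.
my $payload = join '', map { chr($_) } 0 .. 255;

# Position of the first ^Z = number of bytes a ^Z-terminated reader would see.
my $first_z = index($payload, "\x1A");

printf "%d bytes, %d of them ^Z; a ^Z-terminated reader would stop after %d bytes\n",
    length($payload), count_ctrl_z($payload), $first_z;
```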

As for removing ^Z from a file, well... Obviously, if you do this globally on a non-text file, this is simply a form of data corruption -- whatever the original data may have been, it will be garbage after all the ^Z's are removed.
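The difference between stripping every ^Z and stripping only a final one is easy to show on in-memory strings. Both helper names below are hypothetical. The first deletes every 0x1A byte and so mangles binary data; the second only touches the very last byte, which is where a DOS-style text file keeps its ^Z (though it would still damage a binary file that happened to end in 0x1A).

```perl
use strict;
use warnings;

# Global removal: delete every 0x1A byte. On binary data this silently
# shifts all following bytes and changes the length.
sub strip_all_ctrl_z {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ tr/\x1A//d;
    return $copy;
}

# Targeted removal: drop one ^Z only if it is the very last byte.
sub strip_trailing_ctrl_z {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ s/\x1A\z//;
    return $copy;
}

my $binary = "\x00\x1A\x1A\xFF";              # pretend image/audio bytes
my $text   = "line 1\r\nline 2\r\n\x1A";      # DOS-style text file contents

printf "binary: %d -> %d bytes (global)\n",
    length($binary), length(strip_all_ctrl_z($binary));
printf "text:   %d -> %d bytes (trailing only)\n",
    length($text), length(strip_trailing_ctrl_z($text));
```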

If, on an MS system, you want to do this to a genuine DOS/Windows text file (where there is just one ^Z, at the very end), I believe you would have to open both the input and the output file in "binary mode". If you read such a file in text mode (like you're "supposed to"), the program never sees the ^Z: the system's text-mode I/O intercepts it on reading and appends it on writing, so a program handling files in text mode never gets to touch that character. You can only read and write ^Z explicitly when handling files in binary mode. (That's the main and traditional use of Perl's "binmode" function, though as of Perl 5.8 it extends to other things as well, such as character encoding.)
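A minimal sketch of the binary-mode approach described above, assuming the file fits comfortably in memory. `copy_without_trailing_ctrl_z` is a hypothetical name: it opens both handles with `binmode` so no CRLF or ^Z translation gets in the way, slurps the input, drops one trailing ^Z if present, and writes the rest back out unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping a single
# trailing ^Z if there is one. Returns 1 if a ^Z was removed, else 0.
# Reads the whole file into memory, so it assumes a reasonably small file.
sub copy_without_trailing_ctrl_z {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Can't read $in_path: $!";
    binmode $in;                        # raw bytes, no text-mode translation
    my $data = do { local $/; <$in> };  # slurp the whole file
    close $in;
    $data = '' unless defined $data;    # guard against an empty file

    my $removed = ($data =~ s/\x1A\z//) ? 1 : 0;

    open my $out, '>', $out_path or die "Can't write $out_path: $!";
    binmode $out;                       # write exactly the bytes we have
    print {$out} $data;
    close $out or die "Can't close $out_path: $!";

    return $removed;
}
```

For example, `copy_without_trailing_ctrl_z('old.txt', 'new.txt')` (file names made up) leaves `old.txt` untouched; writing to a separate file rather than in place means a mistake never costs you the original.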

Re^3: Deleting EOF from a file
by wfsp (Abbot) on Jul 16, 2004 at 06:32 UTC
    I was intrigued by the question and had a go at it. I often find myself using a hex dump utility to 'see what's actually there', so being able to look from within Perl would be useful. I've not worked with binary mode before, so it was about time I did.
    (dummy.txt has 99 'a's and a 'b')
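        use strict;
        use warnings;
        use Fcntl;    # exports O_RDONLY, O_BINARY, ...

        my $stream;
        # 'or die ...' removed for clarity
        sysopen(DUMMY, "dummy.txt", O_RDONLY | O_BINARY);   # read-only is enough here
        binmode DUMMY;    # also switch off any :crlf translation in Perl's own layers
        my $bytes_read = read DUMMY, $stream, 128;
        # '<', not '<=': offset $bytes_read is one past the last byte, so '<='
        # prints a spurious extra line (the "100: 0 => **" in the output)
        for ( my $i = 0; $i < $bytes_read; $i++ ) {
            my $char = substr( $stream, $i, 1 );
            print $i, ": ", ord( $char ), " => *", $char, "*\n";
        }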
    produces...
        0: 97 => *a*
        1: 97 => *a*
        2: 97 => *a*
        3: 97 => *a*
        ... etc
        98: 97 => *a*
        99: 98 => *b*
        100: 0 => **
    No sign of a ^Z (not even any newlines). I found the docs quite intimidating; clearly the cross-platform issues are tricky. (I was reading one article that mentioned CP/M!)
    Any pointers?
    ActiveState Perl 5.8 on Windows XP
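On the "Any pointers?" question: one option is a small hex dump written in Perl, reading the file with `binmode` so every byte (CR, LF, ^Z) shows up. This is only a sketch; `hex_dump` is a hypothetical helper, the 16-bytes-per-line layout is just the usual convention for such tools, and the script dumps whatever file names you pass on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: classic hex dump of a byte string, 16 bytes per line:
# offset, hex bytes, then printable ASCII (everything else shown as '.').
sub hex_dump {
    my ($bytes) = @_;
    my $out = '';
    for (my $off = 0; $off < length($bytes); $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $chunk;
        (my $ascii = $chunk) =~ tr/\x20-\x7E/./c;   # non-printables -> '.'
        $out .= sprintf "%08X  %-47s  %s\n", $off, $hex, $ascii;
    }
    return $out;
}

# Dump each file named on the command line, read raw.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;
    my $data = do { local $/; <$fh> };
    close $fh;
    print "$file:\n", hex_dump(defined($data) ? $data : '');
}
```

Run as `perl hexdump.pl dummy.txt` (script name made up); a trailing ^Z, if present, appears as `1A` at the end of the last line.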
      How did you create "dummy.txt"? I haven't been keeping up with all the hidden details of recent MS Windows versions -- it may well be that the practice of appending ^Z to mark EOF on text-mode files has been abandoned.

      I happened to try creating a small text file using Notepad on an XP system just now, and found that there was no ^Z at the end. But I do recall seeing text files created on MS systems in the mid-'90s that had been transferred (using binary-mode FTP) to a Unix system, and there was a ^Z at the end of each file.
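To check a particular file (say, one saved from Notepad) for a trailing ^Z without dumping the whole thing, a sketch like the following opens it in binary mode and seeks straight to the last byte. `ends_with_ctrl_z` is a hypothetical helper; pass file names on the command line.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper: returns 1 if the file's last byte is ^Z (0x1A), else 0.
# Seeks to the final byte instead of reading the whole file.
sub ends_with_ctrl_z {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;
    my $size = -s $fh;      # false for an empty file
    my $last = '';
    if ($size) {
        seek($fh, -1, SEEK_END) or die "Can't seek in $path: $!";
        read($fh, $last, 1);
    }
    close $fh;
    return $last eq "\x1A" ? 1 : 0;
}

for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        ends_with_ctrl_z($path) ? "ends with ^Z" : "no trailing ^Z";
}
```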

        Thanks for your reply.

        ...it may well be the practice of appending ^Z to mark EOF on text-mode files has been abandoned.

        I've come to that conclusion too. I'm still rooting around on Google to learn how Windows marks EOF in text files; now that I have a little hex dump script, it would be useful to know.
        Thanks again.
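One way to keep digging: compare the size the file system reports (`-s`) with the number of bytes a binary-mode read actually returns. If the two agree and the file has no trailing ^Z, that suggests the end of the file is determined by its recorded length rather than by a marker byte. A sketch, with `size_report` as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (size reported by -s, bytes returned by a
# binary-mode slurp) for one file.
sub size_report {
    my ($path) = @_;
    my $reported = (-s $path) || 0;     # -s is false for a zero-length file
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;
    my $data = do { local $/; <$fh> };
    close $fh;
    return ($reported, defined($data) ? length($data) : 0);
}

for my $path (@ARGV) {
    my ($reported, $read) = size_report($path);
    printf "%s: -s reports %d bytes, binary read returned %d bytes\n",
        $path, $reported, $read;
}
```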