Re: Deleting EOF from a file
by Joost (Canon) on Jul 15, 2004 at 16:27 UTC
|
AFAIK EOF is only a character on Win32/DOS systems (well, in files that is), and can be removed like this (I had to google for the EOF charnum, but it seems to be 032):
perl -pn -i.bak -e 's/\032//g' filename
This is completely untested
| [reply] [d/l] |
Re: Deleting EOF from a file
by gellyfish (Monsignor) on Jul 15, 2004 at 16:27 UTC
|
perl -pi -e's/[\cZ\cD]//g' file
/J\ | [reply] [d/l] |
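For readers following along in C++ (the language of the solution posted further down), here is a minimal sketch of what the substitution above does to a buffer: it drops every Ctrl-Z (0x1A) and Ctrl-D (0x04) byte and keeps everything else. The function name `strip_eof_bytes` is made up for illustration and is not part of any library.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `data` with every Ctrl-Z (0x1A)
// and Ctrl-D (0x04) byte removed, mirroring s/[\cZ\cD]//g.
std::string strip_eof_bytes(const std::string& data) {
    std::string out = data;
    // erase-remove idiom: shift the kept bytes forward, then trim the tail.
    out.erase(std::remove_if(out.begin(), out.end(),
                             [](char c) { return c == '\x1A' || c == '\x04'; }),
              out.end());
    return out;
}
```

As with the one-liner, this only makes sense on text; applied to binary data it would corrupt the contents.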
Re: Deleting EOF from a file
by matija (Priest) on Jul 15, 2004 at 17:23 UTC
|
The proper way to read a file in Perl is while (<FILE>) {
That will terminate the loop once end of the file is reached.
You can't create a file that has no EOF at all. Not even on Windows. You can create a file that appears very long (like /dev/zero on Unix), but you have to work at it. Don't worry: whatever file you try to read on your system, the system will let you know when you've reached the end of the file. | [reply] [d/l] |
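The point that end-of-file is a condition reported by the read, rather than a byte stored in the data, can be sketched in C++ as follows. This is an illustration on an in-memory stream under my own assumptions; the helper names `count_bytes` and `count_bytes_in` are hypothetical. For a real file on Windows you would presumably open it with std::ios::binary so that no text-mode handling applies.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the bytes a stream delivers. The loop ends
// when get() fails because no data is left, not because of a marker byte.
std::size_t count_bytes(std::istream& in) {
    std::size_t n = 0;
    char c;
    while (in.get(c)) {
        ++n;
    }
    return n;
}

// Convenience wrapper (also hypothetical) for an in-memory buffer.
std::size_t count_bytes_in(const std::string& data) {
    std::istringstream in(data);
    return count_bytes(in);
}
```

A five-byte buffer containing both a 0x1A and a 0x00 byte is counted as five bytes here: neither value ends the loop, and the loop still terminates once the data runs out.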
Re: Deleting EOF from a file
by zentara (Cardinal) on Jul 16, 2004 at 15:34 UTC
|
I haven't delved into this much, but I thought that the EOF was a file system thing....a null byte 0x00 which signified the end of the file data? When you open the file, you never see the EOF marker. When the filesystem goes to get the file from disk, it knows it reached the EOF when it hits a 0x00. 0x00 is never used in any program or file, and if it's encountered will trigger an error about "premature EOF" or something similar.
But that's just my understanding, I'm open to edification. :-)
If you cat 2 files together, only one EOF is placed at the end of the combined files, automatically.
I'm not really a human, but I play one on earth.
flash japh
| [reply] |
Re: Deleting EOF from a file
by nsyed (Novice) on Jul 15, 2004 at 17:06 UTC
|
Sorry for not explaining it. My OS is Win2k Professional. This is what I am trying to do: when I read a file, I use a loop "while != eof" and then do something. The trick is that the file I am going to read will not have an "eof", so I might get into an infinite loop, and then I will try to handle that problem. But first I want to create a file that has no eof in it. I hope that explains it. | [reply] |
|
|
You will never find a normal file for which the while ! eof construct is an infinite loop.
Maybe you should explain your entire situation. Are you actually going to be dealing with normal files, or are these sockets or some other kind of pseudofile?
| [reply] [d/l] |
Re: Deleting EOF from a file
by nsyed (Novice) on Jul 15, 2004 at 19:07 UTC
|
Thank you all. Below is the solution (in C++) to remove an EOF from a file. By the way, the idea is not to make other people's programs crash. It is to make them better at recovering from anything. Thanks
// DeEOF.cpp : Reads in a file and outputs the contents, omitting
//             all EOF characters.
//
// Parameter: <inputfile> - the file containing the undesired EOFs.
//
//
// Written by Brendan Cunnie
//
// Standard headers replace the pre-standard <fstream.h>, <iostream.h>
// and <ios.h>, and the MSVC-only "stdafx.h" is dropped.
#include <cstring>
#include <fstream>
#include <iostream>

using namespace std;

#define ASCII_EOF 26

int main(int argc, char* argv[])
{
    bool fDisplayHelp = false;
    if (argc == 1) fDisplayHelp = true;
    if ( (argc > 1) &&
         ( (strcmp(argv[1], "-?") == 0) || (strcmp(argv[1], "-h") == 0) ) ) {
        fDisplayHelp = true;
    }
    if (fDisplayHelp) {
        cout << "Takes a file and outputs it, stripping all EOF characters." << endl;
        cout << "Usage:" << endl;
        cout << "    DeEOF <inputfile>" << endl;
        return (0);
    }
    // Binary mode, so that a Ctrl-Z byte is read as data instead of ending the input.
    ifstream ifile(argv[1], ifstream::binary);
    char ch;
    // The eof() test becomes true only after a get() has failed, so the
    // value from that failed read is never written out.
    for (ch = ifile.get(); !ifile.eof(); ch = ifile.get()) {
        if (ch != ASCII_EOF) {
            // Note: cout is a text-mode stream on Windows, so line endings
            // may be altered on output.
            cout << ch << flush;
        }
    }
    return 0;
}
Edited by Chady -- code tags | [reply] [d/l] |
|
|
How about if you update this solution by putting "<code>" at the beginning of the cpp code, and putting "</code>" at the end of it, so the rest of us can read it more easily.
IIRC, the "^Z" character (0x1A in hex, \032 in octal, 26 in decimal) has always been used -- and is still used -- by MS systems as the byte value that marks the end of a text file. That is, if a file is being opened and read in text mode on an MS system, then there will be an EOF condition when a ^Z byte is encountered.
Of course, if the file is opened and read in binary mode, then ^Z has no special meaning, and will be treated the same as every other possible byte value. This is important, since many non-text files (containing image, audio, compressed, compiled executable or similar kinds of data) tend to contain bytes whose values happen to be 26 (i.e. 0x1a, 032, ^Z), and using text mode on such files will cause a premature EOF condition -- not good. (There are other evils that arise when treating non-text files with MS text-mode i/o, but I shouldn't digress...)
As for removing ^Z from a file, well... Obviously, if you do this globally on a non-text file, this is simply a form of data corruption -- whatever the original data may have been, it will be garbage after all the ^Z's are removed.
If, using an MS system, you want to do this on a real DOS/Windows text file (where there is just one ^Z, at the very end), I believe you would have to open both input and output files in "binary mode"; if you read such a file in text mode (like you're "supposed to"), the program would never see the ^Z -- the OS intercepts it on reading and appends it on writing, and the program handling files in text mode never sees this character. You can only read and write ^Z explicitly in your program when handling files in binary mode. (That's the main and traditional use of perl's "binmode" function, though now as of Perl 5.8, this function extends to cover other things as well, like character encoding.)
| [reply] |
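The case described above, a text file with a single ^Z at the very end, can be sketched in C++ like this. It is only an illustration: the function name `strip_trailing_ctrl_z` is hypothetical, and it assumes the file contents were already read into a string through a stream opened with std::ios::binary, so that the ^Z byte is actually visible to the program.

```cpp
#include <string>

// Hypothetical helper: drops one trailing Ctrl-Z (0x1A) if present.
// Any 0x1A bytes in the middle of the data are left alone.
std::string strip_trailing_ctrl_z(std::string data) {
    if (!data.empty() && data.back() == '\x1A') {
        data.pop_back();
    }
    return data;
}
```

Unlike a global removal, this leaves any interior 0x1A bytes untouched. The result would also need to be written back through a stream opened with std::ios::binary.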
|
|
I was intrigued by the question and had a go at trying to do it. I often find myself using a hex dump utility to 'see what's actually there'. Looking in Perl would be useful. I've not had experience with binary mode before so it was about time I did.
(dummy.txt has 99 'a's and a 'b')
use strict;
use warnings;
use Fcntl;
my $stream;
# 'or die ...' removed for clarity
sysopen(DUMMY, "dummy.txt", O_RDWR | O_BINARY);
my $bytes_read = read DUMMY, $stream, 128;
for ( my $i=0; $i<= $bytes_read; $i++ ){
    my $char = substr( $stream, $i, 1 );
    print $i, ": ", ord( $char ), " => *", $char, "*\n";
}
produces...
0: 97 => *a*
1: 97 => *a*
2: 97 => *a*
3: 97 => *a*
... etc
98: 97 => *a*
99: 98 => *b*
100: 0 => **
Nowhere near it (not even new lines). I found the docs quite intimidating, clearly the cross platform issues are tricky. (I was reading one article that mentioned CP/M!)
Any pointers?
activestate 5.8 on winXP | [reply] [d/l] [select] |
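A note on the last output line: "100: 0 => **" appears to come from the loop bound rather than from the file. With 100 bytes read, the valid indices are 0 to 99; the test `$i <= $bytes_read` runs one step further, where substr returns an empty string and ord of an empty string is 0. Here is a minimal sketch of the same dump in C++ (matching the C++ solution earlier in the thread), with the bound written as strictly less than the length. The function name `dump_bytes` is made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: formats each byte as "index: code => *char*",
// one line per byte, in the same shape as the Perl output above.
std::string dump_bytes(const std::string& data) {
    std::ostringstream out;
    for (std::size_t i = 0; i < data.size(); ++i) {  // i < size, not i <= size
        unsigned char byte = static_cast<unsigned char>(data[i]);
        out << i << ": " << static_cast<int>(byte)
            << " => *" << data[i] << "*\n";
    }
    return out.str();
}
```

With this bound, a 100-byte buffer produces lines 0 through 99 and no extra zero entry at the end.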
|
|
Well that's cleared that up then. I was just starting to get interested!
update: Sorry, looks better with code tags
| [reply] |