shemp has asked for the wisdom of the Perl Monks concerning the following question:

I have some ASCII data files generated by win-blows machines that I need to do some text processing on using Perl on a Linux box. I have taken care of the DOS \r\n problem, but I am having an issue with the DOS end-of-file marker. I am reading the text file line by line, and the final line appears to be the EOF code. If someone could tell me the ASCII code for EOF, or some other way to detect that the line is EOF, that would be great.

I've tried using the Perl built-in eof(), but it returns false, because under Linux the EOF marker is just more data, so eof() thinks that there is another line to read. So, my current test looks like this:

    if ( ($input_line =~ /^\W$/) && eof(INPUT_HANDLE) ) {
        # this is eof
    }

It seems to work, but I am not comfortable that it will always work.

thanks

Replies are listed 'Best First'.
Re: dos EOF in linux
by Mr. Muskrat (Canon) on Jan 21, 2003 at 18:51 UTC
    I think that a quick demonstration is in order. On the winblows system, do the following:

    Open an MS-DOS box (command prompt) and type the following:

        copy con new.txt
        this is line 1
        this is line 2
        this is line 3
        ^Z

    (where ^Z represents pressing control-z)

    Now in your favorite text editor, save the following as new.pl (or whatever you want) in the same location as new.txt:

        open(NEW, "<", "new.txt");
        while ( my $line = <NEW> ) {
            chomp $line;
            process_line($line);
        }
        close(NEW);

        sub process_line {
            my $line  = shift;
            my @chars = split //, $line;
            # print the character number of the first character of $line
            print ord($chars[0]), $/;
            print $line, $/;
        }

    When you run new.pl, you will see that it never reaches the line with the EOF character on it.

Re: dos EOF in linux
by hawtin (Prior) on Jan 22, 2003 at 09:24 UTC

    In text mode on DOS/Windows, the Perl functions will treat ^Z as the end of the file and refuse to read beyond it. The trick is to use binmode():

        open(INPUT, "infile.txt");
        binmode(INPUT);
        while (<INPUT>) {
            ...
        }
        close(INPUT);

    Then your eof() will work. On a system (like UNIX) where there are no silly modes binmode() has no effect.
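One way to check this on your own machine is a small experiment along these lines. It is only a sketch: the sample data and the helper name read_lines are made up for illustration, and File::Temp is used just to get a scratch file. On a Unix-like system the two counts should match, since the ^Z byte is read as ordinary data either way.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file that ends with a Ctrl-Z byte, as a DOS tool might.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "line 1\r\nline 2\r\n\x1A";
close($out) or die "close failed: $!";

# Hypothetical helper: read all lines, optionally calling binmode() first.
sub read_lines {
    my ($file, $use_binmode) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    binmode($in) if $use_binmode;
    my @lines = <$in>;
    close($in);
    return @lines;
}

my @text   = read_lines($path, 0);
my @binary = read_lines($path, 1);
printf "text mode: %d lines, binmode: %d lines\n",
    scalar(@text), scalar(@binary);
```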

Re: dos EOF in linux
by Mr. Muskrat (Canon) on Jan 21, 2003 at 17:49 UTC
    EOF is ASCII 26 (control-z). As long as you are reading the text file normally, the eof function should be sufficient. Note that it returns 1 if the next read will return an end of file or if the filehandle is not open.
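Since the marker is a single byte with value 26, a line can be compared against it directly instead of being matched with /^\W$/, which also accepts any other lone non-word character. A minimal sketch, assuming a hypothetical helper named is_dos_eof_line and lines that may still carry their \r\n:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line holds nothing but the DOS EOF byte
# (Ctrl-Z, ASCII 26), optionally followed by a CR and/or LF.
sub is_dos_eof_line {
    my ($line) = @_;
    return $line =~ /\A\x1A\r?\n?\z/ ? 1 : 0;
}

print is_dos_eof_line("\x1A")        ? "eof marker\n" : "data\n";
print is_dos_eof_line("some text\n") ? "eof marker\n" : "data\n";
```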

    perlfunc has the following Practical hint: you almost never need to use eof in Perl, because the input operators typically return undef when they run out of data, or if there was an error.

      Well, the problem is that the DOS EOF appears on its own line, so when I'm processing the lines, here's what happens:

          ...
          open INPUT_HANDLE .... or die ...
          while ( my $line = <INPUT_HANDLE> ) {
              chomp $line;
              last if my_eof_test($line, \*INPUT_HANDLE);
              process_line($line);
          }
          close(INPUT_HANDLE);
          ...

      I feel that I need this test because otherwise the line that contains just the EOF marker is read successfully: on Linux it is ordinary data, even though it is the DOS EOF. Because the read returns a true value, we enter the while loop, and the line containing just the EOF marker gets sent to process_line(). Am I missing something here?
        As far as I know this isn't an issue. But I'm a win32 user. :-)

        And may I point out that

        while ( my $line = <INPUT_HANDLE>) {
        is not the same as (excepting the variable used)
        while ( <INPUT_HANDLE> ) {
        I believe that should read
        while ( defined ( my $line= <INPUT_HANDLE> ) ) {
        But personally I don't see the point. You shouldn't be afraid of using $_.

        --- demerphq
        my friends call me, usually because I'm late....
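The explicit defined() test matters when a line can be a false value, such as a final "0" with no trailing newline. As far as I know, perlop documents that Perl applies an implicit defined() test to a bare while ( my $line = <FH> ), so on current versions the two forms should behave alike, and spelling it out does no harm. A small sketch, using an in-memory filehandle and a made-up helper name count_lines:

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in a string, reading it through an
# in-memory filehandle with an explicit defined() test.
sub count_lines {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        $count++;
    }
    close($fh);
    return $count;
}

# The last line is "0" with no newline, which is false but still defined.
print count_lines("first\n0"), "\n";
```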

        It shouldn't matter but if you are worried about processing the EOF marker, try this...

            ...
            open INPUT_HANDLE .... or die ...
            while ( my $line = <INPUT_HANDLE> ) {
                chomp $line;
                process_line($line);
                last if eof(INPUT_HANDLE);
            }
            close(INPUT_HANDLE);
            ...
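If the goal is to keep the marker away from process_line() altogether, another option is to strip it from the final line before processing. This is only a sketch of that idea, not something tested against real DOS files: read_dos_lines is a made-up name, the sample data is invented, and it assumes the marker can only appear at the very end of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the data lines of a DOS text file, with line
# endings and a trailing Ctrl-Z (ASCII 26) marker removed.
sub read_dos_lines {
    my ($fh) = @_;
    my @lines;
    while ( defined( my $line = <$fh> ) ) {
        $line =~ s/\r?\n\z//;    # drop LF or CRLF
        if ( eof($fh) ) {
            # On the last line, drop the DOS EOF marker if there is one,
            # and skip the line if it held nothing but the marker.
            my $had_marker = ( $line =~ s/\x1A+\z// );
            last if $had_marker && $line eq '';
        }
        push @lines, $line;
    }
    return @lines;
}

my $sample = "this is line 1\r\nthis is line 2\r\n\x1A";
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";
my @lines = read_dos_lines($fh);
close($fh);
print scalar(@lines), " data lines\n";
```

The eof() check limits the stripping to the last line, so a ^Z byte that shows up earlier in the data is left alone.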