digger has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

I have run into a perplexing problem that I hope someone can explain to me. On page 82 of the Camel, it states that the null filehandle (i.e. while (<>) { do something }) will treat the contents of @ARGV as a list of filenames, or default to STDIN if @ARGV is empty.

I am writing a script that takes a filename to process on the command line during testing, and takes input from STDIN in production. This seems like the perfect time to take advantage of the behavior of the null filehandle. However, upon making the changes in my code to take advantage of this, my output changed drastically. Running on my laptop, using AS Perl v5.8.0, I am short ~10 lines. Running in the production environment, Perl 5.6.0 on Solaris, I am missing ~75 lines. When I change the code back to use explicit filehandles, it works like a charm.

I also wrote a little sub, as you will see in my code, that prints each line processed to the console as well as to the output file. All lines are output correctly to the console window, but not to my output file.

I am posting my code below.
#!/usr/bin/perl
use strict;
use warnings;

#since first page gets treated like the end of a chapter
#we start with end_chapter being true.
my $end_chapter = 1;

#flags
#for title
my $title_fixed = 0;
#for media type removal
my $media_killed = 0;

#time is used as filename to reduce possibility of files being overwritten.
my $time = time();

#line required to redirect output to postprocessor (ie perfectbinder)
my $to_pp = "<</OutputType(postprocessor)>>setpagedevice \n";

#line to force chapterization
my $chapter = "true [] /110pProcs /ProcSet findresource /setchapters get exec \n";

#file declarations
#my $outfile = "/var/spool/drop_box/autoq/$time.print";
#my $outfile = "./outputfile";
my $outfile = "c:/Quint.out"; #only used for testing. uncomment previous line for production

#my $filearg = $ARGV[0]; #only used for testing. in production file comes on STDIN

#alias $infile to STDIN - this line is used for production
#my $infile = \*STDIN;

open (OUT, ">".$outfile) or die "Can't create temp file!!!!!!!!! $!";
#open (my $infile, "<".$filearg);

my $line;
while ($line = <>){
    #handle chapter endings - currently denoted by null OutputType
    $line = chapterize($line);
    #print "End Chapter Setting: $end_chapter\n";
    if ($end_chapter) {
        printline($line);
        handleSeps();
    }
    else {
        printline($line);
    }
}

sub del_KDKHost {
    #if this the KDKHost line, delete it
    my $line = shift;
    if ($line =~ m/^%KDKHost:/){
        $line = "";
        print "Killed KDKHost Line\n";
    }
    return $line;
}

sub make_title {
    #take XRXtitle line and create standard PS Title comment
    my $line = shift;
    if ($line =~ m!^%XRXtitle:!){
        my @parts = split (/: /, $line);
        my $title = pop (@parts);
        $title =~ s!(\r\n)!!;
        $line = "%%Title: ($title)\n";
    }
    $title_fixed = 1;
    return $line;
}

sub chapterize {
    #if OutputType is null, this is the end of a book.
    #replace the OutputType line with the perfectbinder command
    my $line = shift;
    if ($line =~ m!^<</OutputType \(\)>>setpagedevice!){
        $line = $chapter;
        $end_chapter = 1;
    }
    return $line;
}

sub handleSeps {
    #if we just made a chapter, or this is the beginning of the file
    #we have to make the next 2 pages come out of the top exit
    my $counter = 0;
    my $file = shift;
    my $line;
    #print "Entered HandleSeps routine \n"; #uncomment for debug
    while ($line = <>) {
        #if its the KDKHost line, delete it
        #we have to do it here because
        #it is always in the header
        #and the header is processed as part of the
        #chapterization process.
        $line = del_KDKHost($line);
        $line = make_title($line);
        #delete media calls. These will always be at top of file as well
        unless ($media_killed){
            $line = kill_media($line, "pinky", "UNIVERSALID");
        }
        #if we have started a new page
        #increment page counter and if we are on the 3rd page
        #since chapter break, insert line for output to perfect binder
        if ($counter <= 3) {
            if ($line =~ m!%%BeginPageSetup!){
                $counter++;
                if ($counter == 3){
                    #print "Begin Chapter - $linecount \n";
                    $line .= "$to_pp";
                    $counter = 0;
                    $end_chapter = 0;
                    printline($line);
                    return;
                }
            }
            #if it is the OutputType line for this page, change to top output
            elsif ($line =~ m!<</OutputType\(Stacker\)>>setpagedevice!){
                #print "Stacker Line \n";
                $line =~ s/Stacker/top/;
            }
        }
        printline($line);
    }
}

sub kill_media {
    #sub takes line and a list of paper types to remove
    #and kills unwanted media calls.
    #We do this because jobs come with DocumentMediaReuqired
    #set to all possible media types, even if they aren't
    #actually used in this document.
    my $line = shift;
    my $media;
    if ($line =~ m!^%%DocumentMedia:!){
        $media_killed = 1;
        printline($line);
        while ($line = <>){
            foreach $media (@_) {
                if ($line =~ m!$media!){
                    $line = "";
                    print "Killed media type: $media\n";
                    last;
                }
                elsif ($line !~ m!^%%\+!){
                    print "Returning Line: $line\n";
                    return $line;
                }
            }
            printline($line);
        }
    }
    else {
        return $line;
    }
}

sub printline {
    my $out_line = shift;
    print OUT $out_line;
    #print "$out_line\n"
}
Thanks very much for any insight,
digger

Replies are listed 'Best First'.
Re: Using null filehandle
by sgifford (Prior) on Jun 08, 2004 at 18:14 UTC

    What lines are missing? Lines from the beginning of the file, from the end, or just random lines throughout?

    Can you come up with a simpler testcase that demonstrates the problem? That will probably help you identify what's going on, and make it easier for people to help you.

    Since apparently your data is showing up on the console but not in your file, it may be a buffering problem. Output to files is buffered, and if your program crashes or is killed instead of exiting normally, the last bit of the buffer may not get flushed properly. Try putting an explicit close statement after your loop, check its error return, and print that you've closed the file successfully; that may help identify the problem.
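    For example, something like this (the filename here is just an invented placeholder, and OUT mirrors the handle name in the original script):

```perl
# Sketch: open, write, then close explicitly and check the result, so a
# flushing/closing failure becomes visible instead of silently losing the
# tail of the output buffer.
my $outfile = "/tmp/quint_close_demo.out";   # placeholder path
open(OUT, ">", $outfile) or die "Can't create temp file: $!";
print OUT "some processed line\n";
close(OUT) or die "close failed: $!";        # close can fail, e.g. on a full disk
print "closed output file successfully\n";
```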

      Thanks for your reply,

      Sorry I wasn't clear about where the lines were missing. They are missing at the end of the file, not randomly throughout.

      I have narrowed the problem down to the handleSeps sub, which is the lion's share of the program. I will have more time today to dig into it a little more and pinpoint the exact problem.

      I have turned buffering off with no change in behavior, and it looks like my program never exits the loop, because the file never closes, and my program hangs at the console. This is even more perplexing, because if I just use an explicit filehandle (ie $myfile = $ARGV[0]; open IN ">$myfile";) everything works as expected.

      Thanks again,
      digger
        Sometimes "hanging at the console" is really "trying to read from standard input". The primary difference between the code you have above (assuming you meant open IN "<$myfile") and <> is that <> will read from multiple files, so it's possible that @ARGV contains more than you expect. In particular, if it contains a single - character, <> will try to read from standard input.

        You can print out @ARGV to see what files are being opened, or you can test eof on ARGV to tell when one file has closed and the next has opened (see the documentation for eof for an example).
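        A self-contained illustration of both suggestions (the filenames and contents are invented for the demo; in your script the files would already be in @ARGV from the command line):

```perl
# Hypothetical demo: write two small files, put them in @ARGV, and watch
# <> walk through both, printing $ARGV (current filename) and $. (line
# number) for each line, and using eof(ARGV) to spot file boundaries.
my @tmp = ("/tmp/nf_demo_a.txt", "/tmp/nf_demo_b.txt");
open(my $fh, ">", $tmp[0]) or die $!; print $fh "alpha\n"; close $fh;
open($fh, ">", $tmp[1]) or die $!; print $fh "beta\n";  close $fh;

@ARGV = @tmp;                 # <> will open and read these in order
my @seen;
while (my $line = <>) {
    push @seen, "$ARGV:$.:$line";
    if (eof(ARGV)) {          # end of the *current* file in @ARGV
        close ARGV;           # resets $. for the next file
    }
}
print @seen;
```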

        And still, it seems that turning off buffering should have fixed the problem. What code did you use to do that? $|=1 wouldn't work, because that only affects buffering for the currently selected filehandle (generally STDOUT).
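        To unbuffer the output file specifically, it would need to be something like this (OUT as in the original script; the path is made up):

```perl
use IO::Handle;   # gives filehandles an autoflush method

open(OUT, ">", "/tmp/quint_flush_demo.out") or die "open: $!";
OUT->autoflush(1);   # equivalent to: select(OUT); $| = 1; select(STDOUT);
print OUT "flushed immediately\n";   # reaches the file without waiting for close
close(OUT) or die "close: $!";
```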

Re: Using null filehandle
by iburrell (Chaplain) on Jun 08, 2004 at 19:24 UTC
    The one thing I notice about your code is that you have three loops that read from <>. It is possible that some difference in the input results in the inner functions reading extra data. It is also possible that you missed a spot when changing the code, so it was reading from a different filehandle. You might want to add debugging code to print the line number and file for each line to see that the right file and lines are being read.
    print STDERR "$ARGV:$.:$line";

    I would suggest rewriting the code so that there is only one loop that reads lines. Use flags or a state variable to control the processing. This makes it easier to analyze where the file is being read and what the state the processing is in at each stage.
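    As a rough sketch of that structure (the input lines, states, and markers below are stand-ins for the real PostScript job, invented for illustration; the point is that every read and every emit happens in exactly one place):

```perl
# Single read loop with a state variable instead of nested <> loops.
my @input = (
    "%KDKHost: somehost\n",
    "%%BeginPageSetup\n",
    "page one content\n",
    "<</OutputType ()>>setpagedevice\n",
    "%%BeginPageSetup\n",
);

my $state = 'header';
my @output;
for my $line (@input) {                  # in the real script: while (my $line = <>)
    if ($state eq 'header') {
        next if $line =~ /^%KDKHost:/;   # header-only cleanup, one place
        $state = 'body' if $line =~ /^%%BeginPageSetup/;
    }
    elsif ($state eq 'body') {
        # null OutputType ends a chapter, so go back to header processing
        $state = 'header' if $line =~ m!^<</OutputType \(\)>>setpagedevice!;
    }
    push @output, $line;                 # single emit point
}
print @output;
```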

      I agree that having reads in multiple loops does make solving this problem a little more complex. Since it is such a short script, I may try rewriting it using a single loop, although that could get ugly as well.

      I am still stumped as to why the behavior changes when I use an explicit filehandle versus taking advantage of the implicit filehandle syntax.

      Thanks for your input,
      digger
Re: Using null filehandle
by Eimi Metamorphoumai (Deacon) on Jun 08, 2004 at 21:25 UTC
    One thing I notice is that you're using "while ($line = <>){" which will stop as soon as a line comes through that's "false" (ie, empty, or a literal "0"). You might try using "while (defined($line = <>)){" instead.

      Ever since perl 5.004 the code:

      while ($line = <FH>)

      has an implicit defined check, so that it's actually:

      while (defined($line = <FH>))

      This was because, prior to this, that code would emit a warning; it was a common enough desire that perl was simply modified to accept it as intended.

      Also, while it is possible for a line to have "0" on it, with no newline, it isn't possible for <FH> to return "" unless it's tied.
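      A quick demonstration of that point (this uses an in-memory filehandle, which needs 5.8+, so it won't run as-is on the 5.6.0 production box):

```perl
# Since 5.004, while ($line = <FH>) gets an implicit defined() wrapper,
# so a "0\n" line, or even a trailing bare "0" with no newline, does not
# end the loop early.
my $data = "first\n0\nlast\n0";          # ends with a bare "0", no newline
open(my $fh, "<", \$data) or die $!;     # in-memory filehandle (perl 5.8+)
my $count = 0;
while (my $line = <$fh>) {               # really: while (defined(my $line = <$fh>))
    $count++;
}
print "$count lines\n";                  # all four lines are seen
```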