timurFriedman has asked for the wisdom of the Perl Monks concerning the following question:

Am I allowed to seek on STDIN when its input is coming from a pipe?

My Perl script is called as follows from the UNIX shell:
   cat files.* | script.pl

What script.pl does is look at the first line of each file and call the appropriate subroutine to process that file. The problem is that the subroutine should get the whole file, not one with the first line missing (and there are reasons why the subroutines should not be rewritten to accommodate a file with its first line missing). So I would like script.pl to 'rewind' back to the beginning of the file before calling the subroutine.

Better yet would be some way for script.pl to 'peek' at the first line without removing it from the input stream.

Re: seeking piped STDIO
by Dominus (Parson) on Nov 16, 2000 at 22:55 UTC
    Following up on my earlier suggestion, if you were to invoke the program as script.pl files.* instead of as cat files.* | script.pl, then you would still be able to use the convenient <> operator for reading the input. You might try something like this:

    while (<>) {
        # examine the first line of the file
        # do the appropriate thing with it
        # (for example:)
        if    (/^Expense Report/) { $type = 'Expense' }
        elsif (/^Invoice/)        { $type = 'Invoice' }
        else                      { die ... }

        # Now seek the file
        seek(ARGV, 0, 0) or die "seek: $!";

        process_data($type);  # call the appropriate processor
        # the rest of the data in the file has been eaten
    }

    sub process_data {
        my ($type) = shift;
        # call process_invoice or process_expense_report
        # depending on $type
    }

    sub process_invoice {
        while (<>) {
            # do something
            last if eof;
        }
    }
    The key part is the last if eof line. That tells process_invoice to return to the main loop at the end of the current file. Without it, process_invoice would continue on to the next file automatically, because it is using <>.

    The seek back in the main routine will ensure that the processor functions such as process_invoice see the entire file, including the first line.

Re: seeking piped STDIO
by AgentM (Curate) on Nov 16, 2000 at 22:51 UTC
    To answer your question: no, you may not seek() on a pipe. Another solution would be to store the first line and pass it along with the file handle in the function call, as sketched below. You might also modify your script with "advanced features" such as taking file arguments, which you could then easily read with <> or @ARGV. It might be worthwhile, since it won't take long and will let you seek() wherever you wish in a file for additional functionality.
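    For example, a minimal sketch of that first idea; the process_report name and its two-argument form are placeholders for illustration, not anything from the question:

        my $first = <STDIN>;               # read and keep the identifying line
        process_report($first, \*STDIN);   # pass the saved line with the handle

        sub process_report {
            my ($first_line, $fh) = @_;
            # Treat the saved line as line one, then read the rest normally.
            print "header: $first_line";
            print "body:   $_" while <$fh>;
        }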
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
Re: seeking piped STDIO
by reyjrar (Hermit) on Nov 16, 2000 at 23:03 UTC
    I was just reading about something similar to this in the Perl Cookbook, where you can seek() back to the beginning of the Perl script itself and actually parse the script from within the script. Weird, yet nonetheless cool.
    However, I don't think that's what you want to do in this case. I think you want to do something similar to what adamsj suggested, by rewriting your script either to take command-line arguments as filenames, or to "pipe" the file names to the script, e.g.:
    ls files.* | script.pl
    # or:
    ./script.pl files.ext files.ext2 files.ext3
    or something else to that effect.
    Be creative, but account for as many cases as you can think of. If the files will be located in the same place and/or named the same way, maybe you only want to pass the directory where the files are and/or the file prefix/suffix to open. That way you can use opendir() and readdir() and match files with regular expressions, so you can have stuff like:
    ./script.pl -d /my/dir/for/files -f files
    and in the script:
    ....
    $defaultdir = "/usual/place/to/look";
    $dir = $opt_d;
    $dir = $defaultdir unless -e $dir;

    $defprefix = 'files\.';   # single-quoted so the regex below sees files\.
    $prefix = $opt_f;
    $prefix = $defprefix unless $prefix;

    opendir(DIR, $dir) || die "couldn't open dir: $!\n";
    while (my $file = readdir(DIR)) {
        if ($file =~ /^$prefix/) {
            push @FILES, $file;
        }
    }
    closedir(DIR);
    ....
    Then do what adamsj suggested: open each file in @FILES, check the first line, then send the file name or filehandle to the subroutine, and have the subroutine open the file or seek to the beginning of the file handle.

    just a few ideas..

    -brad..
      I used to live dangerously--writing scripts that used themselves as data, then destroyed themselves after doing their dirty work. Sounds like illicit activity? Not on your life--just a way to cleanly install something on a massive remote basis when silly rules constrained me to one file, which I could only execute--I couldn't, for instance, send a tarball, pop it open, and then do the install. (Those rules cared about number of files and not actual bandwidth.)
Re: seeking piped STDIO
by timurFriedman (Initiate) on Nov 16, 2000 at 22:54 UTC
    Unfortunately, there is not much flexibility in how the script will be receiving its input. It must accept a stream that is piped to its standard input, without knowing the identities of the files that make up that stream. So the only file handle it has is STDIN, which it would not want to close before passing it along.

    The only way that the script can tell one file from the next in the incoming stream is by matching the identifying information that comes in the first line of each file. It then needs to pass the STDIN filehandle to a subroutine to parse the file.
      In that case I can think of two things to do.

      1. Use a tied filehandle. The tied object will contain the line you already read, and the real filehandle. Pass the tied filehandle to the subroutine. When the subroutine tries to read the tied filehandle the first time, give it the line you already read. After that, give it data out of the real filehandle. (A sketch follows this list.)

      2. Use a temporary file. Read all the data from standard input and save it to a temporary file, or several temporary files. Then use seek on the temporary file. (Also sketched below.)

      Solutions 'outside the box' would include (3) redesigning the stupid subroutines that require a filehandle argument instead of a string argument, and (4) finding the person responsible for dumping all the file data into a single stream in the first place, and hitting him with an ax.
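      A minimal sketch of the tied-filehandle idea (1), assuming the subroutines read line-by-line in scalar context and check eof; the PeekHandle package name and the one-line pushback buffer are illustrative choices, not anything specified in this thread:

          package PeekHandle;

          # The tied object holds the real filehandle plus the line already read.
          sub TIEHANDLE {
              my ($class, $fh, $first_line) = @_;
              return bless { fh => $fh, buf => $first_line }, $class;
          }

          # Hand back the saved line once, then defer to the real handle.
          sub READLINE {
              my $self = shift;
              return delete $self->{buf} if defined $self->{buf};
              my $fh = $self->{fh};
              return <$fh>;
          }

          # Report eof only once the buffer is empty and the real handle is done.
          sub EOF {
              my $self = shift;
              return 0 if defined $self->{buf};
              return eof($self->{fh});
          }

          package main;

          my $first = <STDIN>;               # peek at the identifying line
          tie *DATA_IN, 'PeekHandle', \*STDIN, $first;
          process_invoice(\*DATA_IN);        # the subroutine sees the whole "file"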
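      And a sketch of the temporary-file idea (2); using the File::Temp module for the scratch file is my choice, not something specified above:

          use File::Temp qw(tempfile);

          # Spool the entire piped stream into a seekable temporary file.
          my ($tmp, $tmpname) = tempfile(UNLINK => 1);
          print {$tmp} $_ while <STDIN>;
          seek($tmp, 0, 0) or die "seek: $!";

          # $tmp can now be read, rewound with seek(), and re-read
          # like an ordinary file, so the existing subroutines work unchanged.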

        ... and (4) finding the person responsible for dumping all the file data into a single stream in the first place, and hitting him with an ax...
        I am in awe.. that was beautiful!
        instant ++ here! :)

        -brad..
        This solves my problem! Tying, though a bit laborious, looks to be a very clean solution. I believe it will solve another problem as well, which is that the subroutines expect an eof. The tied filehandle could supply that when it detects the start of the next file.

        Suggestion (4) is also intriguing. :)