MadPogo has asked for the wisdom of the Perl Monks concerning the following question:

I wish I were savvy enough to pose this question more clearly, but let me try to explain. I have a data file that consists of binary data with an ASCII header prepended to it. When the data is ultimately used, the (routing) header is removed and the binary data is written out to a file. I want to refine our data management process and improve a file-processing daemon that is currently painfully slow.

I'm trying to find the best way to read the data file WITHOUT reading in the whole file and without depending on system commands like "head". Some of the files are over 30 MB, and all I care about are, at most, the first 13-18 lines of the file. The end of the header is marked by an END text string (go figure). Ideally, I would like to read the file from the beginning up to this "END" string and then close the filehandle, so that I can process the data file based on the contents of this ASCII header.

Any advice on how to tackle this, or even which modules might help, would be appreciated. My searches so far have been fruitless.

Thanks in advance,
MadPogo
~~ What does not kill me makes me stronger, right?

Re: Reading a Filehandle by line or by stream
by Zaxo (Archbishop) on Sep 25, 2002 at 02:39 UTC

    Make perl think your header is one line:

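        my $header;
        {
            local $/ = 'END';    # a "line" now ends at the first END
            open my $fh, '<', $filename or die "Cannot open $filename: $!";
            binmode $fh;         # the data after the header is binary
            $header = <$fh>;
            close $fh or die "Cannot close $filename: $!";
        }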

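    Localizing $/ inside a bare block keeps the change from leaking into the rest of your program, and reading only up to the marker means $header never holds any of the binary payload. (PerlIO still reads from disk in buffer-sized chunks, but that is a few kilobytes, not the whole 30 MB file.)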
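    A minimal sketch wrapping the same $/ trick in a reusable sub. The name read_header and its return values are hypothetical. It also uses tell() to report where the binary payload starts, and it checks that the marker was actually found: if END is missing, the readline quietly returns the entire file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the header (including the terminator) and the
# byte offset at which the binary payload starts.
sub read_header {
    my ($path, $terminator) = @_;
    $terminator = 'END' unless defined $terminator;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # raw bytes, so tell() offsets match the file

    my $header = do {
        local $/ = $terminator;    # readline stops right after the terminator
        <$fh>;
    };
    my $offset = tell $fh;         # first byte after the terminator
    close $fh or die "Cannot close $path: $!";

    # Without the marker, readline returns the whole file (undef if empty).
    die "No '$terminator' marker found in $path\n"
        unless defined($header)
            && substr($header, -length($terminator)) eq $terminator;

    return ($header, $offset);
}
```

    Usage: my ($header, $offset) = read_header($file); Note that 'END' matches anywhere, including inside a word such as SENDER. If END always sits on a line of its own, passing "\nEND\n" as the terminator avoids those false matches.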

    After Compline,
    Zaxo

Re: Reading a Filehandle by line or by stream
by BrowserUk (Patriarch) on Sep 25, 2002 at 02:43 UTC

    If you only want the lines up to a unique token, you can set $/ to that token and read them all in one go. (See perlman:perlvar and search for $/.)

    Use split to break these into the individual lines as below.

    Note that the change to $/ is localized to the enclosing block.

        #! perl -sw
        use strict;

        my $header;
        {
            local $/ = 'END';
            open FILE, "<$ARGV[0]" or die "$!\n";
            $header = <FILE>;
            #print tell FILE;
            close FILE;
        }
        my @lines = split /\n/, $header;
        print $_.$/ for @lines;

    That works fine if you only want to read the prepended lines. If you want to actually remove them from the file, so that you end up with just the binary data again, you could do that in Perl, but it's doubtful whether it would be as fast as using tail with a byte offset pointing just past the end token.
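    For comparison, here is a minimal pure-Perl sketch of stripping the header, assuming the same 'END' terminator. The sub name strip_header and the 64 KB chunk size are arbitrary choices, not anything from the thread. It copies the payload in fixed-size chunks, so a 30 MB file is never held in memory at once. Benchmark it against tail on your own data before choosing.

```perl
use strict;
use warnings;

# Hypothetical helper: copies everything after the first $terminator in
# $in_path to $out_path, 64 KB at a time.
sub strip_header {
    my ($in_path, $out_path, $terminator) = @_;
    $terminator = 'END' unless defined $terminator;

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    binmode $in;
    {
        local $/ = $terminator;    # skip past the header in one readline
        my $header = <$in>;
        die "No '$terminator' marker found in $in_path\n"
            unless defined($header)
                && substr($header, -length($terminator)) eq $terminator;
    }

    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    binmode $out;

    # The read position is now just past the terminator; copy the rest.
    my $buf;
    while (1) {
        my $n = read $in, $buf, 64 * 1024;
        die "Read error on $in_path: $!" unless defined $n;
        last if $n == 0;           # EOF
        print {$out} $buf or die "Write error on $out_path: $!";
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
    return;
}
```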

    The commented-out print tell FILE; gives the byte offset just past the token, which you can pass to the tail command (e.g. tail -c +N file >binonly, where N is that offset plus one, because tail -c +N counts bytes starting from 1; plain tail +N counts lines, not bytes). This would probably be far more efficient than reading the data in and writing it out with Perl, as the system utilities have code built in for determining optimum buffer sizes etc.

    There is of course no reason why you shouldn't issue the tail command from within your Perl script:)
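    A minimal sketch of running tail from Perl. The sub name tail_from_offset is hypothetical, and the code assumes a tail that supports the POSIX -c +N form. It uses the list form of system, so no shell is involved and odd file names need no quoting. It also points STDOUT at the output file just for the duration of the call, because the child process inherits STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: writes the bytes of $in_path from zero-based byte
# $offset onward (e.g. the value tell() gave right after the header) to
# $out_path using the external tail utility.
sub tail_from_offset {
    my ($in_path, $out_path, $offset) = @_;

    # tail -c +N starts output at byte N, counting from 1.
    my $start = $offset + 1;

    # Save STDOUT, redirect it to the output file for the child, then restore.
    open my $saved_stdout, '>&', \*STDOUT or die "Cannot dup STDOUT: $!";
    open STDOUT, '>', $out_path or die "Cannot open $out_path: $!";
    my $status = system 'tail', '-c', "+$start", $in_path;    # no shell
    open STDOUT, '>&', $saved_stdout or die "Cannot restore STDOUT: $!";

    die "tail failed with status $status\n" if $status != 0;
    return;
}
```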


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: Reading a Filehandle by line or by stream
by Enlil (Parson) on Sep 25, 2002 at 01:35 UTC
    Would this work?
        open my $fh, '<', $filename or die "Cannot open $filename: $!";
        while ( my $line = <$fh> ) {
            # ... do stuff with $line here ...
            last if $line =~ /^END$/;
        }
        close $fh;
    Of course, I am assuming that the END text string is the first and only thing on its line.
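    A minimal sketch that extends this loop to collect the header lines, tolerate CRLF line endings, and report where the binary data begins. The sub name read_header_lines and the $max_lines cap (default 50, chosen because the question mentions at most 13-18 header lines) are hypothetical. Without a cap, a file missing its END line would be read "line by line" all the way through the binary payload.

```perl
use strict;
use warnings;

# Hypothetical helper: reads header lines until a line that is exactly "END".
# Returns an array ref of header lines (line endings stripped) and the byte
# offset of the first byte after the END line.
sub read_header_lines {
    my ($path, $max_lines) = @_;
    $max_lines = 50 unless defined $max_lines;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # raw bytes, so tell() offsets match the file

    my @lines;
    my $found = 0;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;           # strip LF or CRLF
        if ($line eq 'END') { $found = 1; last }
        push @lines, $line;
        last if @lines >= $max_lines;   # give up on runaway headers
    }
    my $offset = tell $fh;              # just past the END line if found
    close $fh;

    die "No END line within $max_lines lines of $path\n" unless $found;
    return (\@lines, $offset);
}
```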
Re: Reading a Filehandle by line or by stream
by diotalevi (Canon) on Sep 25, 2002 at 01:30 UTC

    It's entirely reasonable to just read the first few kilobytes of your input file and process those. If the token isn't in that first block, you can keep reading more data until you find your 'END' token. There's no magic here: just check out read() and sysread().
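    A minimal sketch of the read() approach, assuming the header is terminated by the literal string 'END'. The sub name, the 4 KB chunk size, and the 64 KB scanning limit are all hypothetical choices. It uses the four-argument form of read() to append each chunk to a growing buffer and stops as soon as the token appears. sysread() would work the same way but bypasses PerlIO buffering, so don't mix it with read() or readline on the same handle.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $path in chunks until $terminator appears,
# never scanning more than $max_bytes. Returns the header (up to and
# including the terminator) and the byte offset where the payload starts.
sub read_header_chunked {
    my ($path, $terminator, $chunk_size, $max_bytes) = @_;
    $terminator = 'END'     unless defined $terminator;
    $chunk_size = 4096      unless defined $chunk_size;
    $max_bytes  = 64 * 1024 unless defined $max_bytes;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my $buf = '';
    while (length($buf) < $max_bytes) {
        # Append the next chunk at the end of $buf.
        my $n = read $fh, $buf, $chunk_size, length($buf);
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;    # EOF

        # Rescan the whole buffer so a token split across chunks is found.
        my $pos = index $buf, $terminator;
        if ($pos >= 0) {
            close $fh;
            my $end = $pos + length($terminator);
            return (substr($buf, 0, $end), $end);
        }
    }
    close $fh;
    die "No '$terminator' marker in the first $max_bytes bytes of $path\n";
}
```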