just dave has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!
I have a program that, among other things, reads from a file (a large file) in this manner:
# opening the file for reading
open IN, "$fileName";
# reading the whole file into an array
my @F = <IN>;
close IN;
# reading the last line of the file
my $lastLine = $F[$#F];
# reading the first line of the file
my $firstLine = $F[0];
Now, I have two problems:
1. I have a memory leak, and I think this might be the source, since the file is large and is read a few times every minute.
2. I now need to find a specific line in the file, not only the first and last as the example shows (I know its suffix). Is there a better way than to loop over the array and find it? Thanks a lot! You guys always save me!

Re: Reading from a file
by Joost (Canon) on May 23, 2006 at 08:42 UTC
    1. Is it really necessary to re-read the whole file each time? Does the file actually change that much, or is it just appended to? If the latter, see also File::Tail.

    2. I'm guessing here, but you probably want more than one line out of the file, right? Try something like:

    open IN,"<",$fileName or die "Can't open $filename: $!"; while (<IN>) { # read line from IN handle to $_ if ($. == $some_number || $. == $some_other_number) { # $. is the c +urrent line number # do stuff with the line in $_ } last if $. > $last_interesting_line_number; } close IN;
Re: Reading from a file
by davorg (Chancellor) on May 23, 2006 at 08:35 UTC

    You might want to take a look at Tie::File.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Reading from a file
by bart (Canon) on May 23, 2006 at 09:04 UTC
    I now need to find a specific line in the file, not only the first and last as the example shows (I know its suffix). Is there a better way than to loop over the array and find it?
    Last I heard (only a few days ago), davido was working on a module for CPAN that uses an index on a text file. You might want to use it when it becomes available.

    In the meantime, here's one way you can do it by hand. The assumption is that the original data text file is not changed often.

    First, here's how to create the index file:

    open IN, "<", $textfile or die "Can't open textfile: $!"; binmode IN; open IDX, ">", "$textfile.idx" or die "Can't create index file: $!"; binmode IDX; print IDX pack "N", 0; while(<IN>) { print IDX, pack "N", tell IN; }

    Next, here's how to look up a line by number (in $lineno):

    open TEXT, "<", $textfile or die "Can't open textfile: $!"; open IDX, "<", "$textfile.idx" or die "Can't open index file: $!"; binmode IDX; seek IDX, 4*$lineno, 0; read IDX, my($buf), 4; seek TEXT, unpack("N", $buf), 0; $line = <TEXT>;
    (update: fixed parameter order for seek)

    As you may have guessed, the pack/unpack serves to make the integer index a fixed length (4 bytes).
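    For illustration, a quick standalone check that a packed index entry really is a fixed 4 bytes (a minimal sketch; the value 1234 is arbitrary):

    use strict;
    use warnings;

    my $packed = pack "N", 1234;    # unsigned 32-bit big-endian: always 4 bytes
    printf "%d bytes: %s\n", length $packed, unpack "H*", $packed;
    # prints: 4 bytes: 000004d2
    my $n = unpack "N", $packed;    # round-trips back to 1234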

Re: Reading from a file
by TedPride (Priest) on May 23, 2006 at 08:57 UTC
    Does the file change between loads? If not, you can just run through the file once and generate a list of line positions (use pack to compress each position into 4 bytes, rather than storing the positions as longer, non-fixed-length strings). From then on, jump to the index position corresponding to the line you want, grab 4 bytes, unpack to find the line position, then go to that position in the main file and read the record.

    Or you can use Tie::File, but where's the fun in that? :)
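    For the record, here's a minimal Tie::File sketch (the file name is a placeholder). The array is backed by the file, so only the lines you actually touch get read:

    use strict;
    use warnings;
    use Tie::File;

    my $filename = 'report.txt';    # hypothetical reporter file

    tie my @lines, 'Tie::File', $filename
        or die "Can't tie $filename: $!";

    my $first = $lines[0];     # header row, fetched on demand
    my $last  = $lines[-1];    # most recent row, without slurping the file

    untie @lines;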

Re: Reading from a file
by LanceDeeply (Chaplain) on May 23, 2006 at 11:10 UTC
    Are you running on Linux?
    my $first_line = qx/head -1 $filename/;
    print $first_line;

    my $last_line = qx/tail -1 $filename/;
    print $last_line;

    my $n = 100;
    my $nth_line = qx/tail +$n $filename | head -1/;
    print $nth_line;

    -hth
      Hello and thank you all!
      A few answers:
      1. Yes, I'm running on Linux!!
      2. This is a reporter file; it gets appended to all the time, every minute or so.
      3. The last line is the most up to date (since a report row/line is added every minute), that's why I'm interested in it, and I used to read the first line too, since it holds the column names (I get the data by column name). BUT
      now I found a bug: if a change is made and a column is added, then a new row is added (similar to the first row holding the names), and this row also holds the name of the new column. My program doesn't know about it, though, since it reads the column names from the first row in the file.
      So, I need to change this and search the file for this row; if it's not found, I'll fall back to the first one.

      It looks something like this:
      #name: time input output users connected ...
      12:23 23 34 780 560
      12:24 21 40 780 570
      ....
        ok-
        In keeping with the *nix commands theme, how about:
        #
        # find last row that defines the columns (i.e. starts with #name:)
        #
        my $column_names = qx/grep ^\#name $filename | tail -1/;
        print $column_names;
        -hth
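        If you'd rather not shell out, the same idea in pure Perl (a minimal sketch; the file name is a placeholder):

        use strict;
        use warnings;

        my $filename = 'report.txt';    # hypothetical reporter file

        my $column_names;
        open my $fh, '<', $filename or die "Can't open $filename: $!";
        while (<$fh>) {
            $column_names = $_ if /^#name:/;    # keep only the most recent header row
        }
        close $fh;

        print $column_names if defined $column_names;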
Re: Reading from a file
by roboticus (Chancellor) on May 23, 2006 at 12:10 UTC
    just dave:

    If you're going to look for multiple lines in the file, perhaps you should read the file as one huge string, then use regular expressions to find the lines of interest. For example:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    # Slurp entire file into $file
    my $file;
    open FH, $0 or die $!;
    {
        # localize scope as described in perlvar
        local $/;
        $file = <FH>;
    }

    my $line = '---not found---';
    $line = $1 if $file =~ /\n(ope[^\n]*)/;
    print "Line='", $line, "'";
    should print:

    Line='open FH, $0 or die $!;'
    --roboticus
      roboticus,
      Thanks, but as I've shown in my first posting, what I'm doing now is reading the file into an array so I can then read from the array line by line; I don't see how your suggestion is different.
      The main problem is that I read this file a lot (I actually have various reporter files), and I need to find a specific line (the "header") and the last line (the updated "data").
      My question is: is there a simple way to do so, and is there a way to do so without "saving" the whole file into an array or string or such?
      Thanks
        Sounds to me like Joost had the right idea: Try File::Tail. That would allow you to read the whole file in at startup, scan it for the last header line, and then just pick up the new lines (data or header) as they're added instead of repeatedly reading and processing the entire file.
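        For example, a minimal File::Tail sketch along those lines (the file name and the column handling are placeholders):

        use strict;
        use warnings;
        use File::Tail;

        my $filename = 'report.txt';    # hypothetical reporter file

        my $tail = File::Tail->new(
            name        => $filename,
            tail        => -1,    # return the file's existing contents first
            maxinterval => 60,    # check for new lines at least once a minute
        );

        my @columns;
        while (defined(my $line = $tail->read)) {    # blocks until a line is available
            chomp $line;
            if ($line =~ /^#name:/) {
                @columns = split ' ', $line;    # remember the latest header row
            }
            else {
                # process the data row against the current @columns
            }
        }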
        just dave:

        I was thinking that by using a single string rather than an array, you can avoid the explicit looping, making your code clearer. Since you also mentioned having a memory leak, I thought that a single string would fragment memory much less than a bazillion smaller strings. That might make it easier for perl to reuse the memory and/or recognize what can be freed.

        Other than that, it's not much different at all.

        Of course, since you're rereading the same file repeatedly, I think that Joost's suggestion is my favorite so far. Then, while looking up File::Tail, I saw File::Tail::App, which is pretty cool because it does much of the grunt work for you.

        --roboticus

Re: Reading from a file
by spiritway (Vicar) on May 23, 2006 at 16:19 UTC

    Hi, [id://just dave]. This isn't in reply to your question, but I would suggest that you get into the habit of use'ing strict and warnings. Also, it's a very good idea to check every file operation after you perform it, to ensure that you've actually accomplished what you thought you did. Doing so can save you endless hours of painful debugging. Ask me how I know ;-)

Re: Reading from a file
by just dave (Acolyte) on May 23, 2006 at 13:16 UTC
    I have another related question:
    The reading from this reporter file is done in a method, i.e. I read the file into a local array, use the array (read the last line etc.), and exit.
    This method is called about 20 times a minute, and I think this is one cause of my memory leak.
    The question is: if I make the array global (instead of local), will this "fix" the memory problem?
    Thanks
Re: Reading from a file
by TedPride (Priest) on May 23, 2006 at 21:39 UTC
    You can read the first line using the regular method - open, read to first record delimiter. Then just split to get your field names, and convert them to a hash so you can find the proper field in later records. That's no problem, even if fields are added or removed.
    chomp($record = <DATA>); $keys{$_} = $c++ for split / /, $record; ## Get last record and put it in $record @record = split / /, $record; print $record[$keys{'timestamp'}];
    The harder part is getting at the last record. For this, jump to the end of the file minus whatever the maximum possible record size could be (+ record delimiter), then read everything from there to the end and retrieve the last record. Again, not that difficult.
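    A minimal sketch of that end-of-file trick (the file name and the 4096-byte maximum record size are assumptions for illustration):

    use strict;
    use warnings;

    my $filename = 'report.txt';    # hypothetical reporter file
    my $max_len  = 4096;            # assumed maximum record size + delimiter

    open my $fh, '<', $filename or die "Can't open $filename: $!";
    my $size = -s $fh;
    # jump to at most $max_len bytes before the end of the file
    seek $fh, ($size > $max_len ? $size - $max_len : 0), 0;
    my @tail = <$fh>;    # the first element may be a partial line
    close $fh;

    chomp(my $last = $tail[-1]);    # the last complete record
    print "$last\n";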