Thanks for the quick reply. I had tried different combinations of memory and cache options. It does eventually return after about 5 minutes, so it's not hung, just slow. I thought I would avoid reading the entire file into memory by using this module(wrong). I am trying to display 1000 lines or so at a time in a Tk ROText widget, and I thought this tie could work. I'll go back to while ( <FH> ) and tells and seeks and see if that gives me better performance.

Re^3: file size limit with Tie::File
by davido (Cardinal) on Apr 08, 2007 at 22:16 UTC

    Unfortunately, getting the number of lines in a text file without reading the file is impossible, no matter what module you use. Lines in a plain text file are of variable length, so there is no simple calculation that says a file of xxx kilobytes must be yyy lines long. That means the only way for any program or module to determine how many lines you have is to count how many "newline" characters are found in the file. And that's the same as counting any other character; you've got to read through the file to find out.

    Quick, how many lines are there in the camel book? Until you've counted them, you'll never know. There's no magic here. If you need a quick solution, do a line count once and save it, and modify it as the file gets modified.
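
    A minimal sketch of that one-time count, reading the file in fixed-size chunks so memory use stays bounded. The sub name count_lines and the 1 MiB chunk size are illustrative choices, not anything prescribed in this thread:

```perl
use strict;
use warnings;

# Count newline characters by streaming the file in fixed-size chunks.
# The whole file is still read once, but never held in memory at once.
sub count_lines {
    my ($filename, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;    # 1 MiB per read; an arbitrary default

    open my $fh, '<:raw', $filename
        or die "Cannot open $filename: $!";

    my ($count, $buf) = (0, '');
    while (1) {
        my $got = read($fh, $buf, $chunk_size);
        die "Read error on $filename: $!" unless defined $got;
        last if $got == 0;                 # end of file
        $count += ($buf =~ tr/\n//);       # tr/// counts without changing $buf
    }
    close $fh;
    return $count;
}
```

    The result can then be saved and adjusted as lines are appended, rather than recounted on every request.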

    Tie::File is a convenience module, and it usually provides this convenience at what seems to be a very minor performance penalty. You have stumbled into a situation where the module doesn't appear to excel, but regardless of the solution you come up with, you're going to have to read the file at least once.


    Dave

Re^3: file size limit with Tie::File
by graff (Chancellor) on Apr 09, 2007 at 00:13 UTC
    I am trying to display 1000 lines or so at a time in a Tk ROText widget, and I thought this tie could work. I'll go back to while ( <FH> ) and tells and seeks and see if that gives me better performance.

    You might find some relevant ideas in this older thread: Displaying/buffering huge text files.

    If you decide to take the time to index the byte offsets to all the line-endings in your log file, that will surely end up providing much better performance, but if the log file changes over time, you'll be updating the index constantly. Of course, that'll be a simple process of appending more byte offsets as more lines are added, but it's likely that the index will become unwieldy (maybe the line count is such that indexing all the lines is already unwieldy).
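
    A rough sketch of such an index, assuming "\n" line endings and a filehandle opened with the :raw layer so that string lengths equal byte counts. The names extend_index and fetch_lines are made up for illustration. The index is a plain Perl array of line-start offsets; for a very large log it could get unwieldy, as noted above:

```perl
use strict;
use warnings;

# Scan from byte offset $from to EOF, appending the start offset of each
# complete line to @$index. Returns the offset where scanning stopped, so
# a later call can resume from there after the log has grown.
sub extend_index {
    my ($fh, $index, $from) = @_;
    seek($fh, $from, 0) or die "seek failed: $!";
    my $pos = $from;
    while (defined(my $line = <$fh>)) {
        last unless $line =~ /\n\z/;   # skip a partial last line still being written
        push @$index, $pos;
        $pos += length $line;          # bytes, because the handle is :raw
    }
    return $pos;
}

# Return up to $count lines starting at 0-based line number $first.
sub fetch_lines {
    my ($fh, $index, $first, $count) = @_;
    return () if $first >= @$index;
    seek($fh, $index->[$first], 0) or die "seek failed: $!";
    my @lines;
    while (@lines < $count and $first + @lines < @$index) {
        my $line = <$fh>;
        last unless defined $line;
        push @lines, $line;
    }
    return @lines;
}
```

    Calling extend_index again with the previously returned offset only reads the newly appended part of the file, which is the "appending more byte offsets" step.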

    If the goal is simply to be able to show a good-sized chunk of lines in a Tk ROText window, maybe you don't really need accurate info about where the line endings are. Just use reasonable estimates where necessary, along the following lines:

    $requested_start = ...;   # a value between 0 and 1
    $avg_line_len = ...;      # make a guess or read a small sample to estimate this
    $file_size = -s $filename;
    $read_length = $avg_line_len * 1000;
    seek( FH, $file_size * $requested_start, 0 );
    read( FH, $text, $read_length );
    $text =~ s/^.*\n//;       # trim initial and final
    $text =~ s/.*$//;         # line fragments from $text

Re^3: file size limit with Tie::File
by Limbic~Region (Chancellor) on Apr 09, 2007 at 19:14 UTC
    alw,
    "I thought I would avoid reading the entire file into memory by using this module(wrong)."

    You have misquoted BrowserUk, who said Tie::File must read the whole file. Tie::File works by indexing the locations of $/ (default = "\n"). To do this, it must read the whole file, but, as BrowserUk points out, it only reads so much into memory at a time. Other advice in this thread applies, but I wanted to point out that you were inaccurate.
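
    For illustration, a small sketch of a read-only tie that fetches a window of records. The sub name tied_slice and the 2 MB figure are arbitrary; the mode and memory options are Tie::File's own. As far as I know, the memory limit applies to the record cache and not to the offsets table, so the index itself still grows with the number of lines visited:

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return up to $count records starting at 0-based record $first.
# Records come back without their trailing "\n" (autochomp is on by default).
sub tied_slice {
    my ($filename, $first, $count) = @_;

    tie my @lines, 'Tie::File', $filename,
        mode   => O_RDONLY,       # never create or modify the file
        memory => 2_000_000       # approximate cap on the record cache, in bytes
        or die "Cannot tie $filename: $!";

    my @slice;
    for my $n ($first .. $first + $count - 1) {
        my $rec = $lines[$n];     # the file is scanned only as far as record $n
        last unless defined $rec; # undef means we ran past the end of the file
        push @slice, $rec;
    }
    untie @lines;
    return @slice;
}
```

    Note that asking for scalar(@lines) or $#lines makes Tie::File scan the entire file to find every record boundary, which is the slow step on a very large file.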

    Update: Typo corrected thanks to BrowserUk++

    Cheers - L~R