in reply to How do I backtrack while reading a file line-by-line?

You can also tie the file to an array with Tie::File. That gives you a read cache, and you can control how much memory it uses.

Then all you need is a C-style for loop, a $save scalar, and a %done hash.

Similar to:

    my @array = qw/ foo bar baz blah bar blah baz /;
    my $save  = 0;
    my %done;

    for (my $x = 0; $x <= $#array; $x++) {
        $save = $x if ($array[$x] eq 'bar');
        print "X:$x SAVE:$save $array[$x]\n";
        if ($array[$x] eq 'blah' and !defined($done{$x})) {
            $done{$x}++;
            $x = $save;
        }
    }
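
For an actual file, the same loop might look roughly like the sketch below. It assumes Tie::File with a read-only tie; the helper name visit_order and the temporary demo file are hypothetical, and the 'bar'/'blah' markers are carried over from the array example.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use File::Temp 'tempfile';
use Tie::File;

# Hypothetical helper: run the backtracking loop over a tied file and
# return the line indexes in the order they were visited.
sub visit_order {
    my ($path) = @_;

    # Read-only tie; Tie::File strips the line endings by default.
    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";

    my $save = 0;
    my (%done, @visited);
    for (my $x = 0; $x <= $#lines; $x++) {
        $save = $x if $lines[$x] eq 'bar';      # remember the backtrack point
        push @visited, $x;
        if ($lines[$x] eq 'blah' and !$done{$x}) {
            $done{$x}++;                        # backtrack only once per line
            $x = $save;                         # the loop's $x++ resumes at $save + 1
        }
    }

    untie @lines;
    return @visited;
}

# Demo data: the same words as the array example, one per line.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "$_\n" for qw/foo bar baz blah bar blah baz/;
close $fh;

print join(' ', visit_order($path)), "\n";   # prints "0 1 2 3 2 3 4 5 5 6"
```

Because the loop increments $x after the reset, reading resumes on the line after the saved 'bar' line, not on the 'bar' line itself.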


grep
One dead unjugged rabbit fish later

Re^2: How do I backtrack while reading a file line-by-line?
by ikegami (Patriarch) on Oct 13, 2006 at 20:42 UTC

    That section on memory usage is very misleading. Tie::File keeps the index of every line it has encountered (i.e. every line up to the highest one read or written) in memory. In other words, if you do $tied[-1] or push @tied, ..., the index of every line in the file is loaded into memory (if it hasn't been loaded already).
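
    A rough way to watch that index grow is sketched below. Note the assumption: it peeks at the object's {offsets} field, which is an internal detail of the Tie::File implementation and not public API, so it may differ between versions. The file contents and the tiny memory value are made up for the demo.

    ```perl
    use strict;
    use warnings;
    use File::Temp 'tempfile';
    use Tie::File;

    # Demo file with 1000 short lines.
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "line $_\n" for 1 .. 1000;
    close $fh;

    # Deliberately tiny cache limit (hypothetical value).
    my $obj = tie my @lines, 'Tie::File', $path, memory => 1024
        or die "Cannot tie $path: $!";

    # ASSUMPTION: {offsets} is Tie::File's internal table of line offsets.
    my $before = @{ $obj->{offsets} };
    my $last   = $lines[-1];          # forces a scan to the end of the file
    my $after  = @{ $obj->{offsets} };

    print "offsets tracked: $before before, $after after reading '$last'\n";

    undef $obj;                       # drop the extra reference before untie
    untie @lines;
    ```

    The number of tracked offsets follows the number of lines seen, whatever value was passed as memory.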

    Tie::File is still a very useful module.

      from the POD:
      memory - This is an upper limit on the amount of memory that Tie::File will consume at any time while managing the file. This is used for two things: managing the read cache and managing the deferred write buffer.

      I didn't find that misleading. It says to me that only chunks of the file data are loaded into memory. In fact, I assumed that it loaded a full index of the lines at instantiation.

      If the OP knows roughly how much data an average (or the largest) backtrack covers, the read cache could be tuned for memory usage versus speed. Plus you get a layer of abstraction to hide any nastiness.
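
      As a minimal sketch of that sizing, assuming a backtrack of about 200 lines of roughly 80 bytes each (both numbers are invented for illustration), the memory option can be passed to tie like this:

      ```perl
      use strict;
      use warnings;
      use File::Temp 'tempfile';
      use Tie::File;

      # Hypothetical sizing: longest backtrack is about 200 lines of
      # about 80 bytes each, doubled for headroom.
      my $cache_bytes = 200 * 80 * 2;

      # Demo file standing in for the OP's data.
      my ($fh, $path) = tempfile(UNLINK => 1);
      print {$fh} "line $_\n" for 1 .. 500;
      close $fh;

      # 'memory' limits the read cache and the deferred-write buffer.
      tie my @lines, 'Tie::File', $path, memory => $cache_bytes
          or die "Cannot tie $path: $!";

      my $first = $lines[0];
      my $last  = $lines[-1];
      print "$first ... $last\n";

      untie @lines;
      ```

      Whether a given value actually helps depends on the access pattern, so it is worth measuring with real data.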



      grep
      One dead unjugged rabbit fish later

        I didn't find that misleading.

        "[The memory parameter] is an upper limit on the amount of memory that Tie::File will consume at any time while managing the file" is a false statement. Tie::File's memory usage is unbounded. The docs do state an exception, but its wording is very misleading:

        The memory value is not an absolute or exact limit on the memory used. Tie::File objects contains some structures besides the read cache and the deferred write buffer, whose sizes are not charged against memory.

        Does that give the impression that Tie::File's memory usage is unbounded? If not, then the docs are misleading.