cseligman has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I want to find something, that is present multiple times in a large (several hundred megabyte) file. However, I only want it if something else exists after it but before the next occurrence of the thing I am looking for. For instance the thing I am looking for might be a set of code numbers and the condition that I will use to decide whether I want the code may be 1 or more lines further into the file but before the next occurrence of a code number. What is the best (most elegant) way to approach this problem? Chet

Re: parsing large files for something
by wind (Priest) on Mar 31, 2011 at 22:35 UTC
    As AnonMonk so delicately put it, Just do it. The size of the file will be inconsequential if you're just doing the processing line by line.
    my $buffer;
    while (<$fh>) {
        $buffer = $_ if /$regex1/;
        print $buffer if /$regex2/;
    }
Re: parsing large files for something
by Anonymous Monk on Mar 31, 2011 at 22:31 UTC
    What is the best (most elegant) way to approach this problem?

    Just do it

Re: parsing large files for something
by GrandFather (Saint) on Apr 01, 2011 at 03:53 UTC

    The following may be what you are looking for, although it also allows the match and condition keywords to be on the same line. If that isn't required, the code could be simplified somewhat.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %matches = (I => {cond => 'megabyte'}, code => {cond => 'I'},);
    my $match = "\\b" . join("\\b|\\b", keys %matches) . "\\b";
    my %conds;
    my $condMatch;

    while (defined(my $line = <DATA>)) {
        my @segments = split /(?=$match)/, $line;

        for my $segment (@segments) {
            while ($segment =~ /($match)/g) {
                my $cond = $matches{$1}{cond};

                if (exists $conds{$cond}) {
                    delete $conds{$cond};
                } else {
                    $conds{$cond} = $line;
                }

                $condMatch = join "\\b|\\b", keys %conds;
                $condMatch = "\\b$condMatch\\b" if $condMatch;
            }

            if ($condMatch && $segment =~ /($condMatch)/) {
                print $conds{$1};
                delete $conds{$1};
            }
        }
    }

    __DATA__
    Suppose I want to find something, that is present multiple times in a large
    (several hundred megabyte) file. However, I only want it if something else
    exists after it but before the next occurrence of the thing I am looking for.
    For instance the thing I am looking for might be a set of code numbers and the
    condition that I will use to decide whether I want the code may be 1 or more
    lines further into the file but before the next occurrence of a code number.
    What is the best (most elegant) way to approach this problem? Chet

    Prints:

    Suppose I want to find something, that is present multiple times in a large
    For instance the thing I am looking for might be a set of code numbers and the
    True laziness is hard work
Re: parsing large files for something
by locked_user sundialsvc4 (Abbot) on Apr 01, 2011 at 03:02 UTC

    Don’t be flummoxed by the “large” size of the file.   (To Perl, it isn’t large at all.)

    You have two kinds of lines you are looking for: “code” lines and “decision” lines. For the first kind, all you need to do is capture the line in $last_code_seen. For the second kind, if (defined($last_code_seen)) { ... }, you make your decision.
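
    As a rough illustration of the $last_code_seen idea, here is a minimal sketch. The helper name select_codes, the two patterns, and the sample data are all made up for this example and do not come from the thread. It assumes a code line and its decision line are never the same line, and it reports each code at most once.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every "code" line that
# is followed by a "decision" line before the next "code" line appears.
sub select_codes {
    my ($fh, $code_re, $cond_re) = @_;
    my @wanted;
    my $last_code_seen;

    # Reading line by line keeps memory use flat regardless of file size.
    while (my $line = <$fh>) {
        if ($line =~ $code_re) {
            # A new code line replaces any earlier, unconfirmed one.
            $last_code_seen = $line;
        }
        elsif (defined $last_code_seen && $line =~ $cond_re) {
            push @wanted, $last_code_seen;
            undef $last_code_seen;    # report each code at most once
        }
    }
    return @wanted;
}

# Usage with an in-memory filehandle; patterns and data are invented.
my $sample = "CODE 101\nnoise\nKEEP\nCODE 102\nnoise\nCODE 103\nKEEP\nKEEP\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print select_codes($fh, qr/^CODE \d+/, qr/^KEEP/);
close $fh;
```

    With this sample the sketch prints the CODE 101 and CODE 103 lines. CODE 102 is dropped because the next code line arrives before any decision line. For a real file, open it by name instead of the in-memory string and adjust the two patterns.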

    Perl munches on “several hundred megabyte” files as a midnight snack.