parsing large files for something

cseligman has asked for the wisdom of the Perl Monks concerning the following question:

Re: parsing large files for something by wind (Priest) on Mar 31, 2011 at 22:35 UTC
As AnonMonk so delicately put it: just do it. The size of the file is inconsequential if you process it line by line.

    my $buffer;
    while (<$fh>) {
        $buffer = $_ if /$regex1/;
        print $buffer if /$regex2/;
    }
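For anyone who wants to try the loop above without a real data file, here is a minimal runnable sketch of the same idea. The two patterns and the sample text are made up for illustration (the original question does not say what a code line or a condition line looks like), and an in-memory filehandle stands in for the large file. It also adds a `defined` guard so that a condition line seen before any code line is ignored rather than triggering a warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns, for illustration only.
my $regex1 = qr/^CODE\s+\d+/;    # the thing being looked for
my $regex2 = qr/\bKEEP\b/;       # the condition that must follow it

# Return every buffered line that is later confirmed by a condition line.
sub confirmed_lines {
    my ($fh) = @_;
    my ($buffer, @found);
    while (my $line = <$fh>) {
        # Remember the most recent line that matched the first pattern.
        $buffer = $line if $line =~ $regex1;
        # Keep it once a line matching the second pattern shows up.
        push @found, $buffer if defined $buffer && $line =~ $regex2;
    }
    return @found;
}

# An in-memory filehandle stands in for the real several-hundred-megabyte file.
my $sample = "CODE 100\nnoise\nKEEP\nCODE 200\nnoise\nCODE 300\nKEEP\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @found = confirmed_lines($fh);
close $fh;

print @found;    # prints the "CODE 100" and "CODE 300" lines
```

Only one line is held in memory at a time, so the same sub should behave the same way on a handle opened on a file of any size.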
Re: parsing large files for something by Anonymous Monk on Mar 31, 2011 at 22:31 UTC
"What is the best (most elegant) way to approach this problem?" Just do it.
Re: parsing large files for something by GrandFather (Saint) on Apr 01, 2011 at 03:53 UTC
The following may be what you are looking for, although this allows the match and condition keywords to be on the same line. If that's not a requirement the code could be simplified somewhat.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %matches = (I => {cond => 'megabyte'}, code => {cond => 'I'},);
    my $match = "\\b" . join("\\b\|\\b", keys %matches) . "\\b";
    my %conds;
    my $condMatch;

    while (defined(my $line = <DATA>)) {
        my @segments = split /(?=$match)/, $line;

        for my $segment (@segments) {
            while ($segment =~ /($match)/g) {
                my $cond = $matches{$1}{cond};

                if (exists $conds{$cond}) {
                    delete $conds{$cond};
                } else {
                    $conds{$cond} = $line;
                }

                $condMatch = join "\\b\|\\b", keys %conds;
                $condMatch = "\\b$condMatch\\b" if $condMatch;
            }

            if ($condMatch && $segment =~ /($condMatch)/) {
                print $conds{$1};
                delete $conds{$1};
            }
        }
    }

    __DATA__
    Suppose I want to find something, that is present multiple times in a large
    (several hundred megabyte) file. However, I only want it if something else
    exists after it but before the next occurrence of the thing I am looking for.
    For instance the thing I am looking for might be a set of code numbers and the
    condition that I will use to decide whether I want the code may be 1 or more
    lines further into the file but before the next occurrence of a code number.
    What is the best (most elegant) way to approach this problem?
    Chet

Prints:

    Suppose I want to find something, that is present multiple times in a large
    For instance the thing I am looking for might be a set of code numbers and the

True laziness is hard work
Re: parsing large files for something by locked_user sundialsvc4 (Abbot) on Apr 01, 2011 at 03:02 UTC
Don’t be flummoxed by the “large” size of the file. (To Perl, it isn’t large at all.) You have two kinds of lines you are looking for: “code” lines, and “decision” lines. What you care about in the first case is to capture `$last_code_seen`. In the second case, `if (defined($last_code_seen)) { ...`, you want to make your decision. Perl munches on “several hundred megabyte” files as a midnight snack.
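A minimal sketch of that two-kinds-of-lines idea, in case it helps. The `CODE` and `STATUS OK` line formats, the sub name, and the sample data are all assumptions made up for illustration; only `$last_code_seen` and the `defined` test come from the description above. The sketch also makes one design choice the description does not spell out: it clears `$last_code_seen` after a decision line, so each code is reported at most once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line formats, for illustration only.
my $code_re     = qr/^CODE\s+(\d+)/;     # "code" line; captures the code number
my $decision_re = qr/^STATUS\s+OK\b/;    # "decision" line

# Return the code numbers that are followed by a decision line
# before the next code line appears.
sub wanted_codes {
    my ($fh) = @_;
    my ($last_code_seen, @wanted);
    while (my $line = <$fh>) {
        if ($line =~ $code_re) {
            $last_code_seen = $1;     # a new code replaces any unconfirmed one
        }
        elsif (defined($last_code_seen) && $line =~ $decision_re) {
            push @wanted, $last_code_seen;
            undef $last_code_seen;    # report each code at most once
        }
    }
    return @wanted;
}

# An in-memory filehandle stands in for the real file.
my $sample = <<'END';
CODE 17
some detail
STATUS OK
STATUS OK
CODE 23
STATUS FAILED
CODE 42
other detail
STATUS OK
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @wanted = wanted_codes($fh);
close $fh;

print "$_\n" for @wanted;    # prints 17 and 42, one per line
```

Code 23 is dropped because the next code line arrives before any decision line does, and the second `STATUS OK` after code 17 is ignored because nothing is pending at that point.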