in reply to searching across a file then parsing

The three lines you included didn't clarify much for me. If you put <code> ... your text here ... </code> tags around file or program text, it will be displayed verbatim, which makes it much easier for us to see what you have.

You say "The file is quite large". If that means it's so big that it won't fit into memory, then I would consider using command-line grep with the -A and -B options to extract hunks that will fit into memory. That should make the "I need stuff before the match" part easier.

From the grep man page:

    Context control:
      -B, --before-context=NUM  print NUM lines of leading context
      -A, --after-context=NUM   print NUM lines of trailing context

Can you show some code of what you have so far? It's really unclear what, for example, "I need to back up to a known point" means.
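
If shelling out to grep isn't convenient, the same leading/trailing context idea can be approximated in Perl while still reading one line at a time. This is only a rough sketch: the sub name grep_context, the sample data, and the context sizes are made up for illustration, and unlike real grep it prints no "--" separators between groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return each matching line with up to $before lines of leading context
# and up to $after lines of trailing context (hypothetical helper).
sub grep_context {
    my ($fh, $pattern, $before, $after) = @_;
    my (@window, @out);
    my $remaining = 0;    # trailing-context lines still owed

    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            push @out, @window, $line;    # leading context, then the match
            @window    = ();
            $remaining = $after;
        }
        elsif ($remaining > 0) {
            push @out, $line;             # trailing context
            $remaining--;
        }
        else {
            push @window, $line;          # remember only the last $before lines
            shift @window if @window > $before;
        }
    }
    return @out;
}

# Made-up sample data read through an in-memory filehandle;
# a real script would open the actual file instead.
my $data = join "\n", qw(a b c SEARCH_PATTERN d e f), "";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print grep_context($fh, qr/SEARCH_PATTERN/, 2, 1);
close $fh;
```

Only the rolling window is held in memory, so the size of the file doesn't matter.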

Re^2: searching across a file then parsing
by Anonymous Monk on Feb 17, 2012 at 04:41 UTC

    It's really unclear what, for example, "I need to back up to a known point" means.

    probably seek

      This is purely a guess, but I suspect that rather than a character-by-character numerical position change (like that provided by 'seek'), the OP is looking for a "line-by-line cache" - similar to something I did recently.

      I'm going to assume that the file is too big to stuff into a scalar and process with something like /($start.*?$match.*?$end)/ - i.e., that it does indeed need to be read one line at a time.
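
For a file that does fit in memory, the slurp-and-match alternative could look roughly like the sketch below. The sub name extract_block and the sample text are assumptions for illustration. Note that a plain /($start.*?$match.*?$end)/s would begin at the first start marker in the file rather than the nearest one before the match, so this sketch uses a negative lookahead to keep another start marker from appearing inside the captured block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the block that begins at the nearest $start before $match and
# runs up to (not including) the next $start, or undef if there is none.
sub extract_block {
    my ($text, $start, $match) = @_;
    return $text =~ /
        ( \Q$start\E
          (?: (?!\Q$start\E) . )*?    # lazily skip ahead, never crossing a start marker
          \Q$match\E
          (?: (?!\Q$start\E) . )*     # rest of the block, stopping before the next start marker
        )
    /xs ? $1 : undef;
}

# Made-up sample text; a real script would slurp the file into $text.
my $text = join "\n",
    qw(TOP_PATTERN 1 2 3 TOP_PATTERN 4 5 6 SEARCH_PATTERN 7 TOP_PATTERN 8 9), "";
my $block = extract_block($text, "TOP_PATTERN", "SEARCH_PATTERN");
print defined $block ? $block : "no match\n";
```

With this sample the captured block starts at the second TOP_PATTERN and ends just before the third.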

      #!/usr/bin/perl
      use common::sense;

      my $pat1 = "TOP_PATTERN";
      my $pat2 = "SEARCH_PATTERN";
      my ($flag, $cache);

      while (<DATA>){
          if ($flag){
              last if /$pat1/;
              print and next;
          }
          if (/$pat1/){
              $cache = "" and next if $cache;
              $cache .= $_ and next;
          }
          if ($cache){
              $cache .= $_;
              $flag = 1 and print $cache if /$pat2/;
          }
      }

      __END__
      TOP_PATTERN
      1
      2
      3
      TOP_PATTERN
      4
      5
      6
      SEARCH_PATTERN
      7
      TOP_PATTERN
      8
      9

      Output:

      TOP_PATTERN
      4
      5
      6
      SEARCH_PATTERN
      7
      -- 
      I hate storms, but calms undermine my spirits.
       -- Bernard Moitessier, "The Long Way"
        oko1, thanks for taking the time to comment. This is exactly what I'm looking for. I apologize for possibly misleading readers with my "large file" remark; that was relative. The file isn't really large at all, only a few KB, though the size does vary. I've updated the post to give a better idea of what I'm attempting, and I think that really helps. Based on operator input (something like "OPS" and "REV"), I parse the file until I've found the specific information I'm looking for, and then I want to grab everything relevant. The issue is that the line I've been matching on, "1. OPS 7554 REV 225888", is below the actual line that I need to pull in. But I think this will get me much farther than I am now. Much appreciated.
      Seek is a low-level critter that only deals with bytes, not lines. And if we are talking about Unicode, we aren't even talking about characters, since one character can occupy multiple bytes. I think it is highly unlikely that seek() will play any role at all in the final solution to this problem. It is very rare to use seek on a text file.
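
To illustrate the bytes-versus-characters point, here is a small sketch using the core Encode module. The sub name char_and_byte_counts and the sample string are made up for the example; it only shows that a character count and a UTF-8 byte count can differ, which is why a byte offset does not map directly onto a character or line position.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Return the number of characters in $text and the number of bytes
# it occupies once encoded as UTF-8 (hypothetical helper).
sub char_and_byte_counts {
    my ($text) = @_;
    return (length($text), length(encode('UTF-8', $text)));
}

# "caf" + e-acute + newline: five characters, but the accented letter
# takes two bytes in UTF-8.
my ($chars, $bytes) = char_and_byte_counts("caf\x{e9}\n");
print "$chars characters, $bytes bytes\n";    # prints "5 characters, 6 bytes"
```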

      The problem requirements just aren't specified clearly enough here to proceed further without hearing more from the OP.

      I like the idea presented by oko1. But until the OP gives a rule for what to include before the "match", we are just all guessing.

      Anonymous, I agree...sorry about that. I've updated it to better explain what I'm attempting to do. Thanks for the help.