Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've seen a few similar problems to this posted here recently, but none of them cover what I need to do, and my Perl isn't strong enough yet for me to figure out how to do this.
I'm parsing a number of text files (ranging in size from 1K up to 100M) where the record size of each line is variable. I need to find (or seek to?) the last occurrence of a pattern, then apply some further parsing logic from that point on until the end of file.
What I do now (which doesn't cope with the problem) is:
    $foundit = 0;
    while (<$FH>) {
        $foundit = 1 if /^pattern/;
    }
    if ($foundit) {
        # do some processing
    }
This was working OK until I started to get multiple occurrences of the pattern: the files I'm parsing are now appended to rather than overwritten :(
I had an idea of making two passes over each file to be parsed. The first pass would count how many occurrences of the pattern are in the file. The second would start parsing once that count is hit, e.g.
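    # First pass: count the lines that match the pattern.
    open my $fh, '<', 'file' or die "Can't open file : $!\n";
    my $count1 = 0;
    while (<$fh>) {
        $count1++ if /^pattern/;
    }
    close $fh;

    # Second pass: skip everything before the last matching line.
    open $fh, '<', 'file' or die "Can't open file : $!\n";
    my $count2 = 0;
    while (<$fh>) {
        $count2++ if /^pattern/;
        next if $count2 < $count1;
        # do some processing
    }
    close $fh;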
This is a possibility, but it's very slow, which is why I wonder whether anybody has other ideas about how to do this more efficiently?

Replies are listed 'Best First'.
Re: parse from last occurence of pattern match
by Cody Pendant (Prior) on Jun 07, 2006 at 10:41 UTC
    Logically the right thing to do is to read the file backward and when you find the "first" occurrence of the pattern, that's the last one.

    How to actually do it? Er, Tie::File would be a good place to start I suppose?
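A minimal sketch of the Tie::File idea, in case it helps. The helper name find_last_match_index is made up for illustration and is not part of Tie::File. It walks the tied array from the last line toward the first and stops at the first hit, which is the last occurrence in the file. Note that Tie::File still has to index the line starts to know where the last line is, so this mainly saves memory rather than a full read, and the lines come back with their line endings stripped.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return the 0-based index of the last line matching $re, or -1 if none.
# find_last_match_index() is a hypothetical helper, not a Tie::File function.
sub find_last_match_index {
    my ($path, $re) = @_;

    # Open read-only so the data file can never be modified by accident.
    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Can't tie $path: $!\n";

    my $found = -1;
    # Scan from the end: the first match seen is the last one in the file.
    for (my $i = $#lines; $i >= 0; $i--) {
        if ($lines[$i] =~ $re) {
            $found = $i;
            last;
        }
    }

    untie @lines;
    return $found;
}
```

With the index in hand, the lines from that index up to the last one can be processed in the usual forward order, either through the tied array or by rereading the file.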



    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: parse from last occurence of pattern match
by GrandFather (Saint) on Jun 07, 2006 at 11:58 UTC

    I'd be inclined to store the file offset of either the matching line or the line following the last match (using tell). Having reached the end of the file, you can then seek back to that position and continue from there.

    The following code demonstrates the technique:

    use strict;
    use warnings;

    my $lastFound = -1; # No match value

    $| = 1;
    /\ba\b/ and ($lastFound = tell DATA) while <DATA>;

    print "No match \n" and exit if $lastFound == -1;

    seek DATA, $lastFound, 0;
    print <DATA>;

    __DATA__
    Hi, I've seen a few similar problems to this posted here recently but none
    of them cover what I need to do and my Perl isn't strong enough yet for me
    to figure out how to do this. I'm parsing a number of text files (can range
    in size from 1K upto 100M) where the record size of each line is variable. I need
    to find (or seek to ?) the last occurence of a pattern then apply some further
    parsing logic from that point on until the end of file. What I do now (which
    doesn't cope with the problem) is:

    Prints:

    parsing logic from that point on until the end of file. What I do now (which
    doesn't cope with the problem) is:
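For a real file rather than DATA, the same tell/seek idea might look like the sketch below. It remembers the offset at which the matching line starts, so processing resumes on the matching line itself. The names last_match_offset and process_from_last_match and the callback argument are assumptions made up for this example.

```perl
use strict;
use warnings;

# Return the byte offset at which the last line matching $re starts,
# or -1 if no line matches. (Hypothetical helper name.)
sub last_match_offset {
    my ($fh, $re) = @_;
    my $last = -1;
    my $pos  = tell $fh;              # offset of the line about to be read
    while (my $line = <$fh>) {
        $last = $pos if $line =~ $re;
        $pos  = tell $fh;             # offset of the next line
    }
    return $last;
}

# Scan once to find the last match, seek back to it, then hand every line
# from there to the end of file to $callback. Returns the offset used.
sub process_from_last_match {
    my ($path, $re, $callback) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    my $offset = last_match_offset($fh, $re);
    if ($offset >= 0) {
        seek $fh, $offset, 0 or die "Can't seek in $path: $!\n";
        while (my $line = <$fh>) {
            $callback->($line);
        }
    }
    close $fh;
    return $offset;
}
```

A call such as process_from_last_match('file', qr/^pattern/, sub { print $_[0] }) would print everything from the last matching line onward. The file is read once in full and the tail is read a second time, which should be cheaper than two complete passes.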

    DWIM is Perl's answer to Gödel
Re: parse from last occurence of pattern match
by jhourcle (Prior) on Jun 07, 2006 at 11:59 UTC
Re: parse from last occurence of pattern match
by johngg (Canon) on Jun 07, 2006 at 13:27 UTC
    You mention that the files you are parsing are now appended to rather than being overwritten. It should be possible for you to save the position of your last match for each file every time you do a run. That way you could immediately skip past what had already been parsed in previous runs when starting a new run.

    You would probably have to do a check on each file to make sure it hadn't been overwritten. Perhaps save the size of each file as well, and check that the file is not smaller now than it was during the last run.
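A rough sketch of that bookkeeping, under some assumptions: the state lives in a small text file holding "offset size" on one line, and the helper names load_state, save_state and resume_offset are invented for this example. If the data file is now smaller than it was at the last run, it is treated as overwritten and parsing restarts from offset 0.

```perl
use strict;
use warnings;

# Read the saved "offset size" pair; (0, 0) if there is no usable state.
# The one-line state-file format is an assumption of this sketch.
sub load_state {
    my ($state_path) = @_;
    open my $in, '<', $state_path or return (0, 0);
    my $line = <$in>;
    close $in;
    return (0, 0) unless defined $line && $line =~ /^(\d+) (\d+)$/;
    return ($1, $2);
}

# Record where the last match was found and how big the file was then.
sub save_state {
    my ($state_path, $offset, $size) = @_;
    open my $out, '>', $state_path or die "Can't write $state_path: $!\n";
    print {$out} "$offset $size\n";
    close $out or die "Can't close $state_path: $!\n";
}

# Offset to start scanning from on this run: the saved offset, unless the
# data file has shrunk, which suggests it was overwritten.
sub resume_offset {
    my ($data_path, $state_path) = @_;
    my ($offset, $old_size) = load_state($state_path);
    my $size = -s $data_path;
    $size = 0 unless defined $size;
    return $size < $old_size ? 0 : $offset;
}
```

At the start of a run you would seek to resume_offset(...) before scanning, and at the end call save_state with the new last-match offset and the current file size. A size check cannot detect a file that was overwritten and has since grown past its old size, so this is only a first line of defence.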

    Cheers,

    JohnGG

Re: parse from last occurence of pattern match
by girarde (Hermit) on Jun 07, 2006 at 14:18 UTC
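    Assuming your resources permit you to slurp the file, you could change the pattern to /.{1,}pattern/s. The {1,} quantifier is greedy, so '.' consumes as much as possible (the /s modifier lets it match newlines too), ensuring that the pattern is matched at its last occurrence. Because {1,} needs at least one character before the pattern, an occurrence at the very start of the file is only found if you use .* instead. Not knowing the type of processing you do afterwards, you may need to reopen the file on a separate handle with the record separator left alone.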
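Here is one way the slurp-and-greedy-match idea could be sketched. It uses .* rather than .{1,} so that a match at the very start of the text is also found, /s so that '.' crosses newlines, and /m so that ^ anchors at the start of any line. The names slurp_file and tail_from_last_match are assumptions for illustration, and the pattern is passed without its leading ^ because the anchor is supplied by the outer regex.

```perl
use strict;
use warnings;

# Read a whole file into one string by undefining the record separator
# locally, so $/ keeps its normal value everywhere else.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    return defined $text ? $text : '';
}

# Return the text from the last line that starts with $re through to the
# end of $text, or undef if no line starts with $re.
sub tail_from_last_match {
    my ($text, $re) = @_;
    # The greedy .* grabs everything and backtracks from the end, so the
    # first position where ^$re succeeds is the last occurrence.
    return $text =~ /\A.*^($re.*)\z/ms ? $1 : undef;
}
```

For example, tail_from_last_match(slurp_file('file'), qr/pattern/) would give one string holding the tail of the file, which can be split on newlines for line-by-line work. On a 100M file this holds the whole file in memory, so it only suits machines where that is acceptable.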