Jaspersan has asked for the wisdom of the Perl Monks concerning the following question:

I need to open a file and extract everything up to the third occurrence of a certain string (such as the HTML </p> tag).

I've tried a bunch of different ways, but they have all failed. I know this is simple to do, but at the moment I'm just not in the frame of mind to come up with it :P
Any help would be appreciated.

^jasper <jasper@wintermarket.org>

Re: Extracting up to a certain string
by Zaxo (Archbishop) on Aug 04, 2002 at 07:16 UTC
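    my @foo;
    {
        # Each readline now returns one chunk ending in 'certain string'.
        local $/ = 'certain string';
        open my $file, '<', '/path/to/afile' or die $!;
        # The list slice keeps the first three chunks.
        @foo = (<$file>)[0..2];
    }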

    Now @foo contains everything up to and including the third occurrence.

    After Compline,
    Zaxo
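    A variant of Zaxo's $/ approach, sketched under a couple of assumptions: the separator is the literal string </p>, and an in-memory filehandle over a made-up string stands in for the real file. It reads records one at a time and stops after the third, instead of reading the whole file in list context for the (<$file>)[0..2] slice, and it never yields undef entries when there are fewer than three occurrences.

```perl
use strict;
use warnings;

# Hypothetical sample input: an in-memory filehandle stands in for
# '/path/to/afile' so the sketch runs on its own.
my $text = "one</p>two</p>three</p>four</p>";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    # Each readline now returns everything up to and including '</p>'.
    local $/ = '</p>';
    while (my $rec = <$fh>) {
        push @records, $rec;
        last if @records == 3;    # stop once we have three chunks
    }
}
close $fh;

# With fewer than three occurrences, @records simply holds the whole
# input in fewer chunks rather than having undef slots.
my $first_three = join '', @records;
print "$first_three\n";
```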

      Setting $/ to the string you want won't work if the "P" in </p> might be either uppercase or lowercase in your input. One way to handle that case is:
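      open(my $fh, '<', 'yourfile') or die "Can't open yourfile: $!";
      # Slurp the whole file into one string.
      my $file = do { local $/; <$fh> };
      # /s lets . match newlines; /i matches both </p> and </P>.
      # Capturing in list context leaves $match undef if nothing matches.
      my ($match) = $file =~ m!(.*?</p>.*?</p>.*?</p>)!is;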
      -sauoq
      "My two cents aren't worth a dime.";
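      The three repeated .*?</p> groups in sauoq's regex can also be written with a counted quantifier, and capturing in list context means a failed match leaves $match undef instead of a stale $1 left over from an earlier match. A sketch, assuming the file has already been slurped into $file; the sample string is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical slurped input with mixed-case closing tags.
my $file = "<p>a</P>\n<p>b</p>\n<p>c</P>\n<p>d</p>\n";

# (?:.*?</p>){3} repeats "lazily skip up to the next </p>" three times.
# /i also matches </P>; /s lets . cross newlines.
my ($match) = $file =~ m!\A((?:.*?</p>){3})!is;

if (defined $match) {
    print "$match\n";
}
else {
    print "Fewer than three </p> tags found\n";
}
```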
      
Re: Extracting up to a certain string
by insensate (Hermit) on Aug 04, 2002 at 05:45 UTC
    How about something like this?
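    use strict;
    use warnings;

    my $count = 0;    # </p> tags seen so far
    my @keep;
    while (my $line = <DATA>) {
        my $before = $count;
        $count++ while $line =~ m{</p>}g;    # count every </p> on the line
        if ($count >= 3) {
            # Keep only the pieces needed to reach the third </p>.
            my @segments = split m{</p>}, $line, -1;
            my $toAdd = '';
            $toAdd .= "$segments[$_]</p>" for 0 .. 2 - $before;
            push @keep, $toAdd;
            last;
        }
        push @keep, $line;
    }
    print @keep;
    __DATA__
    some</p>stuff</p>
    </p>other</p>things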
    This handles the awkward case of several </p> tags on a single line.
    Jason
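    Another way around multiple matches per line, if the whole file fits in memory, is to slurp it and step through it with index(), avoiding regexes entirely. A minimal sketch; upto_nth is a hypothetical helper name, the search is case-sensitive, and the sample text mirrors the __DATA__ in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text up to and including the $n-th
# (n >= 1) occurrence of $needle, or nothing if there are fewer than $n.
# Matching is case-sensitive, as with setting $/.
sub upto_nth {
    my ($haystack, $needle, $n) = @_;
    my $start = 0;
    my $end;
    for (1 .. $n) {
        my $pos = index($haystack, $needle, $start);
        return if $pos < 0;    # ran out of occurrences
        $end = $start = $pos + length($needle);
    }
    return substr($haystack, 0, $end);
}

my $text = "some</p>stuff</p>\n</p>other</p>things\n";
my $head = upto_nth($text, '</p>', 3);
print "$head\n" if defined $head;
```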
Re: Extracting up to a certain string
by fuzzyping (Chaplain) on Aug 04, 2002 at 04:10 UTC
    Would you mind posting some sample code and/or data? It would be a lot easier to help if we had something to work with. In particular, this sounds like a problem waiting for a regex solution, but I can't design a regex without seeing the data. :)

    -fp
Re: Extracting up to a certain string
by Jaspersan (Beadle) on Aug 04, 2002 at 17:37 UTC
    Here is the data. It's for the news portion of a site: I need to display the top three dates (each between <p> and </p>). I tried the code from above and it works great! Thanks :)
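    With data shaped like this, the first three items can be pulled out with one global match in list context. This is a sketch that assumes each news item is wrapped in <p>...</p> as described (the tags don't survive in the data shown below) and uses a heredoc in place of the real news file.

```perl
use strict;
use warnings;

# Hypothetical markup: assumes each news item is wrapped in <p>...</p>
# as the post describes; the heredoc stands in for the real news file.
my $html = <<'END';
<p>-7.31.02 2:17AM
Some News</p>
<p>-7.24.02 12:51PM
Some News</p>
<p>-7.02.02 10:07AM
Some News</p>
<p>-6.12.02 10:26PM
Some News</p>
END

# In list context, m//g returns every captured paragraph body in order.
# /i tolerates <P>...</P>; /s lets . span the line break inside an item.
my @items = $html =~ m!<p>(.*?)</p>!gis;
splice @items, 3 if @items > 3;    # keep only the first three items

print "$_\n---\n" for @items;
```

    If you only need the date, a further match such as /^-(\S+ \S+)/ on each item would capture the leading "7.31.02 2:17AM" part.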

    <code>

    -7.31.02 2:17AM
    Some News

    -7.24.02 12:51PM
    Some News

    -7.02.02 10:07AM
    Some News

    -6.12.02 10:26PM
    Some News