You might try something like this:
foreach (@lines) {
    next unless /start-phrase-here/ .. /stop-phrase-here/;
    # process
}
If you have no 'stop' phrase, you can put undef on the right-hand side of the .. operator so the range never closes, or go with a more readable flag approach:
my $flag;
foreach (@lines) {
    $flag++ if /start-phrase/;
    next unless $flag;
    # process
}
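To see how the two loops above differ, here is a minimal sketch run against made-up data. The contents of @lines and the names @range and @tail are assumptions for illustration only. The flip-flop version keeps the start line through the stop line (both inclusive), while the flag version keeps everything from the start line to the end of the list.

```perl
use strict;
use warnings;

# Hypothetical sample data; the marker phrases are stand-ins.
my @lines = (
    'header junk',
    'start-phrase-here',
    'wanted 1',
    'wanted 2',
    'stop-phrase-here',
    'trailer junk',
);

# Flip-flop: true from the line matching the start pattern
# through the line matching the stop pattern, inclusive.
my @range;
foreach (@lines) {
    next unless /start-phrase-here/ .. /stop-phrase-here/;
    push @range, $_;
}

# Flag: true from the start line to the end of the list.
my @tail;
my $flag;
foreach (@lines) {
    $flag++ if /start-phrase-here/;
    next unless $flag;
    push @tail, $_;
}

print scalar(@range), " lines in range, ", scalar(@tail), " lines in tail\n";
```

If the marker lines themselves should be skipped, an extra test inside the loop would be needed; that is left out here to keep the sketch short.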
Update: I can't believe I didn't mention this originally, but you should be aware that parsing HTML on your own is going to be extremely hard and unreliable unless you have precise control over the formatting of the page. Better to use an HTML::Parser-derived module to pull the HTML data into Perl, and then work with the Perl data structure to get what you need.
Unreliably {grin}. I realize you are probably extracting from a consistent source, but webpages don't necessarily have \n delimiters in logical places, and some sites may have \r in them as well. I'd suggest using something like HTML::TokeParser. It breaks the page up into tokens: start tags, end tags, and text (plus a few other things you probably don't care about, such as comments and declarations). You can easily grab and discard tokens until you get to the one that matches your criterion, then start processing from there.
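A rough sketch of that discard-then-process idea, assuming HTML::TokeParser is installed (it ships with the HTML-Parser distribution on CPAN). The page content, the "Results" criterion, and the @text array are made up for illustration. Note that the sample has \r\n line endings and a tag split across two lines, which is exactly what trips up line-based matching.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical page: \r\n line endings and a start tag broken across lines.
my $html = "<html><body><p>junk</p>\r\n<h2\r\nid='x'>Results</h2>"
         . "<p>first</p>\r\n<p>second</p></body></html>";

# new() accepts a reference to a scalar holding the document.
my $p = HTML::TokeParser->new(\$html) or die "cannot set up parser";

# Discard tokens until the text token that matches the criterion.
# Each token is an array ref; element 0 is the type ("S", "E", "T", ...),
# and for "T" tokens element 1 is the text itself.
while (my $token = $p->get_token) {
    next unless $token->[0] eq 'T';
    last if $token->[1] =~ /Results/;
}

# From here on, collect the non-blank text tokens.
my @text;
while (my $token = $p->get_token) {
    next unless $token->[0] eq 'T';
    my $t = $token->[1];
    $t =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    push @text, $t if length $t;
}

print join(', ', @text), "\n";
```

On a real page you would probably match on a start tag and its attributes rather than on text, but the loop shape stays the same.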
Yet another approach:
Don't break the content into lines.
while ($html =~ m/.../gis) {
    push @pos, pos $html;
}
# See the pos doc (perldoc -f pos).
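For what it's worth, here is a small sketch of how the recorded offsets might be used afterwards. The sample string, the <p> pattern, and the @text array are assumptions, not anything from the original question. pos returns the offset just past the end of the last successful /g match, and assigning to pos lets a later match anchored with \G resume from that spot.

```perl
use strict;
use warnings;

# Hypothetical content; line endings fall in awkward places on purpose.
my $html = "<p>one</p>\r\n<P>two</p><p>\nthree</p>";

# Record the offset just past every match of the (assumed) pattern.
my @pos;
while ($html =~ m/<p>/gis) {
    push @pos, pos $html;
}

# Revisit each offset and grab the text that follows it, up to the next tag.
my @text;
for my $offset (@pos) {
    pos($html) = $offset;                  # resume matching here
    if ($html =~ m/\G\s*([^<]+)/gc) {      # \G anchors at pos($html)
        push @text, $1;
    }
}

print "offsets: @pos\n";
print "text:    @text\n";
```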
i dunno if this quite applies, but i work with logs that i usually pump into arrays. these logs have a lot of garbage, so i wrote a quick subroutine to basically shift off lines i don't care about until i find what i'm looking for.
so assuming you have no problem extracting the text you wanna look at, you can use something like this:
sub shift_until($\@)
{
    #usage: shift_until($somepattern, @list);
    #this takes a @list and throws away lines until it hits $pattern
    #$pattern can be a regexp.
    #if $pattern isn't found, the list is emptied completely.
    #it's good for parsing through garbage
    #(hey, i never said it would be useful to everybody)
    my $pattern   = shift;
    my $array_ref = $_[0];
    while (@$array_ref and $array_ref->[0] !~ /$pattern/)
    {
        shift @$array_ref;
    }
    #returns 1 if a matching line is now at the front, 0 if the list ran out
    return @$array_ref ? 1 : 0;
}
this could easily be modified to non-destructively seek to a position in the array, or to use a file instead of an array, ad nauseam.
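one way the non-destructive variant might look, as a sketch only: index_of_first is a hypothetical name, and the sample log lines are made up. instead of shifting elements away, it returns the index of the first matching line (or -1 if nothing matches) and leaves the array alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the first element matching
# $pattern, or -1 if none does. The array itself is not modified.
sub index_of_first {
    my ($pattern, $array_ref) = @_;
    for my $i (0 .. $#$array_ref) {
        return $i if $array_ref->[$i] =~ /$pattern/;
    }
    return -1;
}

# Made-up log data for illustration.
my @log = ('garbage', 'more garbage', 'BEGIN report', 'data 1', 'data 2');

my $start  = index_of_first(qr/^BEGIN/, \@log);
my @wanted = $start >= 0 ? @log[$start .. $#log] : ();

print "match at index $start, ", scalar(@wanted), " lines kept, ",
      scalar(@log), " lines still in the log\n";
```

this version takes an explicit array reference instead of using a prototype, which keeps the call site obvious at the cost of typing the backslash.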