donfreenut has asked for the wisdom of the Perl Monks concerning the following question:


I'm getting a webpage with LWP::UserAgent and using split(/\n/, $response->content()) to get a list of the lines in the webpage.

I want to go through the data, ignoring it until I hit a line with a specific phrase. After seeing that phrase, I want to begin processing the data.

How can I do this?
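Here's roughly what I have so far (the URL is just a placeholder):

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    my $response = $ua->get('http://www.example.com/somepage.html');
    die "Couldn't fetch page: ", $response->status_line
        unless $response->is_success;

    my @lines = split /\n/, $response->content();

    foreach my $line (@lines) {
        # ??? skip everything until a line containing the phrase,
        # then start processing from there
    }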

Re: Sucking down a file
by Fastolfe (Vicar) on Jan 31, 2001 at 03:04 UTC
    You might try something like this:
        foreach (@lines) {
            next unless /start-phrase-here/ .. /stop-phrase-here/;
            # process
        }
    If you have no 'stop' phrase, you might use 'undef' on that side of the .. operator, or go with a more logical-looking flag approach:
        my $flag;
        foreach (@lines) {
            $flag++ if /start-phrase/;
            next unless $flag;
            # process
        }
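    To see how the .. (flip-flop) operator behaves line by line, here is a tiny self-contained example; the marker phrases and data are made up:

        use strict;
        use warnings;

        my @lines = (
            'header junk',
            'BEGIN DATA',
            'interesting line 1',
            'interesting line 2',
            'END DATA',
            'trailer junk',
        );

        foreach (@lines) {
            next unless /BEGIN DATA/ .. /END DATA/;
            print "$_\n";    # prints the lines from BEGIN DATA through END DATA, inclusive
        }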
    Update: I can't believe I didn't mention this originally, but you should be aware that parsing HTML on your own is going to be extremely hard and unreliable unless you have precise control over the formatting of the page. It's better to use an HTML::Parser-derived module to pull the HTML data into Perl, and then work with the resulting Perl data structure to get what you need.
Re: Sucking down a file
by ichimunki (Priest) on Jan 31, 2001 at 03:14 UTC
    Unreliably {grin}. I realize you are probably extracting from a consistent source, but webpages don't necessarily have \n delimiters in logical places, and some sites may have \r characters in them as well. I'd suggest using something like HTML::TokeParser. It will break the page up into tokens consisting of start tags, end tags, and text (and a few other things you probably don't care about). You can easily grab and discard tokens until you get to the one that matches your criterion, then start processing from there.
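    A rough sketch of that approach (the URL and phrase are placeholders, and the processing is left as a stub):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::TokeParser;

        my $response = LWP::UserAgent->new->get('http://www.example.com/somepage.html');
        die "Fetch failed: ", $response->status_line unless $response->is_success;

        my $html = $response->content;
        my $p    = HTML::TokeParser->new(\$html) or die "Couldn't create parser";

        # discard tokens until we hit a text token containing the phrase
        while (my $token = $p->get_token) {
            last if $token->[0] eq 'T' and $token->[1] =~ /specific phrase/;
        }

        # everything from here on comes after the phrase
        while (my $token = $p->get_token) {
            # process each remaining token ($token->[0] is 'S', 'E', 'T', ...)
        }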
Re: Sucking down a file
by Vane (Novice) on Jan 31, 2001 at 03:57 UTC
    Yet another approach: don't break the content into lines at all.

        while ($html =~ m/.../gis) {
            push @pos, pos $html;    # see the pos doc
        }
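    For instance, assuming $html holds $response->content() and the phrase is a placeholder, you could then process everything from the first match onward:

        my @pos;
        while ($html =~ m/specific phrase/gis) {
            push @pos, pos $html;    # offset just past each match; see perldoc -f pos
        }

        if (@pos) {
            my $rest = substr($html, $pos[0]);
            # process $rest, which starts right after the first occurrence of the phrase
        }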
Re: Sucking down a file
by YaRness (Initiate) on Jan 31, 2001 at 22:07 UTC
    I dunno if this quite applies, but I work with logs that I usually pump into arrays. These logs have a lot of garbage, so I wrote a quick subroutine to basically shift off lines I don't care about until I find what I'm looking for.

    So, assuming you have no problem extracting the text you wanna look at, you can use something like this:
        sub shift_until($\@) {
            # usage: shift_until($somepattern, @list);
            # this takes a @list and throws away lines until it hits $pattern
            # $pattern can be a regexp.
            # if $pattern isn't found, the list is emptied completely.
            # it's good for parsing through garbage
            # (hey, i never said it would be useful to everybody)
            my $pattern   = shift;
            my $array_ref = $_[0];
            while (@$array_ref and not $$array_ref[0] =~ /$pattern/) {
                shift @$array_ref;
            }
            (@$array_ref and return 1) or return 0;
        }
    This could easily be modified to non-destructively seek to a position in the array, or to use a file instead of an array, ad nauseam.
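    For example, with a made-up pattern and list (just to show the calling convention):

        my @lines = ('junk', 'more junk', 'START HERE', 'good stuff', 'more good stuff');

        if (shift_until('START HERE', @lines)) {
            # @lines now begins at the 'START HERE' line
            print "$_\n" for @lines;
        }
        else {
            print "pattern not found, the list is now empty\n";
        }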