zogness has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks. I am looking for a way to send raw HTML through a regex filter.
All I need is to extract the content between the first and last div tags.
use LWP::Simple;
use HTTP::Request;
use HTML::TreeBuilder 3.0;
use Template::Extract;
use Data::Dumper;

foreach my $urls (
    'http://uptimemgr/saved/1358381265931.htm',
    'http://uptimemgr/saved/1375561135115.htm',
    'http://uptimemgr/saved/1388446037003.htm'
) {
    getprint $urls =~ m{<div class=ipm8rpt>([^<]*)</div>}i || die;
}

Replies are listed 'Best First'.
Re: HTTP::Request pipe through regex
by Joost (Canon) on Mar 09, 2007 at 17:00 UTC
Re: HTTP::Request pipe through regex
by SheridanCat (Pilgrim) on Mar 09, 2007 at 17:33 UTC
    What you're saying here is:

    1) Apply this regex to the current value of $urls - or die. Since this is not going to be a match, the die happens. A regex match in scalar context returns true or false. Here it should return false every time, so the script dies.

    2) If the die() wasn't there, you'd be returning nothing to the getprint() call, in which case it would try to hit that URL, which would return nothing also.
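
The point about context can be checked without touching the network. The following is only a small sketch: it applies the pattern from the question to one of the URL strings, once in scalar context and once in list context. The variable names `$matched` and `@captures` are made up for illustration.

```perl
use strict;
use warnings;

# One of the URL strings from the question; it is matched as plain text
# and never fetched.
my $url = 'http://uptimemgr/saved/1358381265931.htm';
my $re  = qr{<div class=ipm8rpt>([^<]*)</div>}i;

my $matched  = $url =~ $re;    # scalar context: true or false
my @captures = $url =~ $re;    # list context: the captures, or an empty list

print "matched: ", ($matched ? "yes" : "no"), "\n";
print "number of captures: ", scalar(@captures), "\n";
```

Either way, the URL string contains no div tag, so there is nothing useful to hand on to getprint().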

    What you most likely want to do is get the HTML from the URL and then apply the regex, right? In that case, use get() to retrieve the HTML into a variable and then apply your regex to that. Don't forget to use the /s modifier to cross newlines. I suppose you could do it in one line without the temp variable, but it's clearer to use the variable.
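
A minimal sketch of that fetch-then-match approach might look like the following. Several things here are assumptions and not part of the thread: the helper name `extract_div` is hypothetical, the URLs are taken from the command line, and the greedy `.*` is a guess at what "between the first and last div tags" means (it runs from the first matching opening tag to the last `</div>` on the page). LWP::Simple is loaded on demand only so that the helper can be tried without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text between the
# first <div class=ipm8rpt> and the last </div>, or undef if nothing matches.
sub extract_div {
    my ($html) = @_;
    return unless defined $html;
    # /s lets . match newlines, /i ignores case in the tag names,
    # and the greedy .* extends the capture to the last </div>.
    my ($content) = $html =~ m{<div class=ipm8rpt>(.*)</div>}is;
    return $content;
}

# URLs are read from the command line, so nothing is fetched when none are given.
for my $url (@ARGV) {
    require LWP::Simple;                  # loaded only when there is something to fetch
    my $html = LWP::Simple::get($url);    # get() returns undef on failure
    unless (defined $html) {
        warn "Could not fetch $url\n";
        next;
    }
    my $content = extract_div($html);
    print defined $content ? "$content\n" : "No matching div in $url\n";
}
```

Saved under a name of your choosing, it would be run with the page addresses as arguments, for example `perl extract.pl http://uptimemgr/saved/1358381265931.htm`.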

Re: HTTP::Request pipe through regex
by andye (Curate) on Mar 09, 2007 at 16:59 UTC
    Hi zogness,

    So - what's your question? Does the code you have not work how you want it to - if so, what's going wrong?

    I haven't used those modules recently myself, but I'm guessing that you need to fetch the page, *then* do the regexp matching and print the result. But maybe not. It's been a while.

    Just from looking at it (haven't tried running it), it looks as though the code you have will try to fetch the result of running the regexp on each of your URL strings - which won't match, so you're asking it to fetch an empty list. Kinda guessing here though, really.

    Best wishes,
    andye

      It does not work. I can get all the html to print to stdout but can't funnel it into the regex.
Re: HTTP::Request pipe through regex
by bart (Canon) on Mar 10, 2007 at 16:49 UTC
    Reordering your code I get something that I assume does something like what you want:
    use LWP::Simple;
    @divs = map m{<div class=ipm8rpt>([^<]*)</div>}i,
            map { get $_ }
                'http://uptimemgr/saved/1358381265931.htm',
                'http://uptimemgr/saved/1375561135115.htm',
                'http://uptimemgr/saved/1388446037003.htm';
    Of course, the regexp is a pretty weak attempt at capturing whatever is in the div, but if this is HTML your people have produced themselves, it might work well enough.
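
To illustrate where the pattern is weak, here is a small sketch comparing it with a looser non-greedy variant. The sample markup and the helper name `capture` are made up for illustration and are not taken from the pages in the thread. Because `[^<]*` cannot cross a `<`, the original pattern stops matching as soon as the div contains any nested tag.

```perl
use strict;
use warnings;

# Made-up sample markup, not taken from the pages in the thread.
my $plain  = '<div class=ipm8rpt>uptime 99.9%</div>';
my $nested = '<div class=ipm8rpt>uptime <b>99.9%</b></div>';

# The pattern from the thread: the captured text may not contain any '<'.
my $strict = qr{<div class=ipm8rpt>([^<]*)</div>}i;
# A looser variant: the shortest run of any characters, newlines included.
my $loose  = qr{<div class=ipm8rpt>(.*?)</div>}is;

# Hypothetical helper: return the first capture, or undef if there is no match.
sub capture {
    my ($re, $html) = @_;
    my ($content) = $html =~ $re;
    return $content;
}

for my $html ($plain, $nested) {
    my $s = capture($strict, $html);
    my $l = capture($loose,  $html);
    print "strict: ", (defined $s ? $s : '(no match)'),
          " | loose: ", (defined $l ? $l : '(no match)'), "\n";
}
```

The looser variant is still only a regex. It ends at the first `</div>`, so a div nested inside the target div would cut the capture short, which is one reason an HTML parser is the more robust choice for markup you do not control.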