in reply to Multiple Multiline Regexps?

This might get you started:

#! /usr/bin/perl
use strict ;
use warnings ;
$|++ ;

my $data = qq{
    [...]
    <body>
    [...random stuff...]
    <li>headline one</li>
    <br>
    <p>the story</p>
    [...random stuff...]
    <li>headline two</li>
    <br>
    <p>the next story</p>
    [...random stuff...]
    <body>
} ;

# Each pass deletes the first remaining headline/story pair from $data,
# so the loop ends once no pair is left.
while ( $data =~ s{<li>(.*?)</li>.*?<p>(.*?)</p>}{}s ) {
    print "Headline: $1\nStory: $2\n\n" ;
}

__END__

That is, of course, assuming that <li> and <p> are used only for headlines and stories. IMO, the more restrictive you can make this regexp, the better.

Update: This is probably better done with a proper parser. I've never used it, but HTML::Parser might be a good option.
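
For anyone who wants to try the parser route, here is a rough, untuned sketch using the event-driven (version 3) interface of HTML::Parser. The sample data is a trimmed-down stand-in for the real page, and the names @items and $current are made up for illustration. It assumes every <li> is a headline and every <p> is the story belonging to the most recent headline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, not part of core Perl

# Assumed sample input; the real page will have more markup around it.
my $data = <<'HTML';
<body>
  <li>headline one</li>
  <br>
  <p>the story</p>
  <li>headline two</li>
  <br>
  <p>the next story</p>
</body>
HTML

my @items;      # one hash per headline: { headline => ..., story => ... }
my $current;    # tag whose text is being collected ('li' or 'p'), else undef

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub {
        my ($tag) = @_;
        return unless $tag eq 'li' || $tag eq 'p';
        $current = $tag;
        # A new <li> starts a new headline/story record.
        push @items, { headline => '', story => '' } if $tag eq 'li';
    }, 'tagname' ],
    text_h      => [ sub {
        my ($text) = @_;
        # Ignore text outside <li>/<p>, or a <p> seen before any <li>.
        return unless defined $current && @items;
        my $field = $current eq 'li' ? 'headline' : 'story';
        $items[-1]{$field} .= $text;    # text may arrive in several chunks
    }, 'dtext' ],
    end_h       => [ sub {
        my ($tag) = @_;
        undef $current if defined $current && $tag eq $current;
    }, 'tagname' ],
);

$parser->parse($data);
$parser->eof;

print "Headline: $_->{headline}\nStory: $_->{story}\n\n" for @items;
```

Unlike the regex, this does not care how the tags are laid out across lines, but it is just as dependent on <li> and <p> meaning "headline" and "story" and nothing else.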


_______________
D a m n D i r t y A p e

Re: Re: Multiple Multiline Regexps?
by Bird (Pilgrim) on Jul 25, 2002 at 18:57 UTC
    Agreed, I'd be as restrictive as possible. I'd even add the \n<br>\n portion to the regex. Something like this (which also just matches, instead of substituting)...

    while ( $data =~ m{<li>(.*?)</li>\n\s*<br>\n\s*<p>(.*?)</p>}g )

    -Bird

    p.s. The \s* bits are in there to deal with leading spaces. I don't know if there are any in your data, but DamnDirtyApe had some in his code.
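
For completeness, the match-only loop can be dropped into a full script along these lines. This is only a sketch: the sample data is a cut-down guess at the real input, and @found is a made-up name for collecting the pairs. Since there is no /s modifier, "." will not cross a newline, so each headline and each story has to sit on a single line for this to match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample input, indented the way the earlier example appears to be.
my $data = <<'HTML';
<body>
  <li>headline one</li>
  <br>
  <p>the story</p>
  <li>headline two</li>
  <br>
  <p>the next story</p>
</body>
HTML

my @found;    # [ headline, story ] pairs, in document order

# With /g in scalar context, each pass resumes where the previous match
# ended, so $data itself is left unchanged.
while ( $data =~ m{<li>(.*?)</li>\n\s*<br>\n\s*<p>(.*?)</p>}g ) {
    push @found, [ $1, $2 ];
    print "Headline: $1\nStory: $2\n\n";
}
```

Because nothing is substituted away, $data is still intact afterwards if it is needed for anything else.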