Re: Advanced regular expression help

The old wisdom applies: parsing HTML with regexes is a bad idea. If your data is line-based, parse it line by line.

However, if you insist on using regexes...

I don't quite get it - do you want the <ul id="ccc">(.*)</ul> part to be optional? If yes, make it optional: (?:<ul id="ccc">(.*)</ul>)?.
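To illustrate the optional group, here is a minimal sketch. The sample strings and the `parse_div` helper are made up for this example, and I'm assuming the surrounding `<div id="aaaa">` from the reply further down the thread; your real markup may differ.

```perl
use strict;
use warnings;

# (?: ... )? makes the whole <ul> block optional without creating a capture
# group for the wrapper itself; only the two (.*?) groups capture.
my $re = qr{<div id="aaaa">(.*?)(?:<ul id="ccc">(.*?)</ul>)?</div>}s;

# Hypothetical helper: returns (text, list) on a match, or an empty list.
# The second element is undef when the <ul> block is absent.
sub parse_div {
    my ($html) = @_;
    return $html =~ $re;
}

# Made-up sample inputs, one with the optional block and one without.
for my $html (
    '<div id="aaaa">intro<ul id="ccc">items</ul></div>',
    '<div id="aaaa">intro</div>',
) {
    my ( $text, $list ) = parse_div($html);
    printf "text=%s list=%s\n", $text, defined $list ? $list : '(none)';
}
```

Note that the capture for the missing block comes back as undef, so check it with `defined` before using it.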

You have to take care that the .* doesn't consume too much text: it is greedy and matches as much as it can. What do you want the delimiter to be? Newlines? Then use \n or $ or ^ and use the /m modifier.
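A small sketch of both points, using made-up input strings (only the `<ul id="ccc">` tag comes from your question): first greedy versus non-greedy matching on a single line, then ^ and $ with /m so that every line is matched on its own.

```perl
use strict;
use warnings;

# Hypothetical input: two blocks on one line.
my $one_line = '<ul id="ccc">one</ul><ul id="ccc">two</ul>';

# Greedy .* runs on to the LAST </ul> on the line.
my ($greedy) = $one_line =~ m{<ul id="ccc">(.*)</ul>};

# Non-greedy .*? stops at the FIRST </ul>.
my ($lazy) = $one_line =~ m{<ul id="ccc">(.*?)</ul>};

# Hypothetical input: one block per line.
my $multi_line = qq{<ul id="ccc">one</ul>\n<ul id="ccc">two</ul>\n};

# With /m, ^ and $ match at every line boundary, so the newline acts as
# the delimiter; /g in list context collects one capture per line.
my @per_line = $multi_line =~ m{^<ul id="ccc">(.*)</ul>$}mg;

print "greedy:   $greedy\n";
print "lazy:     $lazy\n";
print "per line: @per_line\n";
```

Which of the two you want depends on your data: the non-greedy form if several blocks can share a line, the anchored form if each block sits on its own line.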

Also note that . won't match a newline unless the /s modifier is present (more on that in perlre).
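A short sketch of the difference, with a made-up string that has a line break inside the element:

```perl
use strict;
use warnings;

# Hypothetical input: the content of the <ul> spans two lines.
my $html = qq{<ul id="ccc">first\nsecond</ul>};

# Without /s the dot stops at the newline, so the pattern cannot reach </ul>.
my $matched_without_s = $html =~ m{<ul id="ccc">(.*)</ul>} ? 1 : 0;

# With /s the dot also matches "\n", so the capture spans both lines.
my ($with_s) = $html =~ m{<ul id="ccc">(.*)</ul>}s;

print "without /s: ", ( $matched_without_s ? "match" : "no match" ), "\n";
print "with /s:    ", ( defined $with_s ? "match" : "no match" ), "\n";
```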


Replies are listed 'Best First'.
Re^2: Advanced regular expression help
by Andrew Coolman (Hermit) on Sep 12, 2008 at 18:16 UTC

I agree with what moritz suggested. That said, if you want to use a regex and only care about $1 and $3, try making the optional part a capturing group of its own, since you can simply ignore $2. Something like this:

    my $regex = '<div id="aaaa">([.\w\s]*?)(<ul id="ccc">[.\s\w]*?</ul>)?([.\s\w]*?)</div>';

This is the output, if that's what you are after:

    Text 1 found
        text tex text

        more text
    Text 2 found
        text text text

        more text

Regards
