Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I need a little help with a regex. Say I have this:
<ul>
  <li> List 1 Item 1
  <li> List 1 Item 2
    <ul>
      <li> Sub-Item 1
      <li> Sub-Item 2
    </ul>
  <li> List 1 Item 3
</ul>
<ul>
  <li> List 2 Item 1
  <li> List 2 Item 2
    <ul>
      <li> Sub-Item 1
      <li> Sub-Item 2
    </ul>
  <li> List 2 Item 3
</ul>
What I want is to end up with an @array with List 1 in $array[0] and List 2 in $array[1]. The sub-items are what is tripping me up.

Right now I am splitting the string on line breaks and using a foreach loop to find and skip nested items and push them into respective array elements.

My question is: Can I accomplish this same thing with a single regex?

Thanks for your help!

--Rod

Replies are listed 'Best First'.
Re: Regex Nested Matching
by ssandv (Hermit) on Feb 24, 2010 at 20:12 UTC

    First things first: What do you hope to achieve by fitting it all into a single regular expression? The usual results are that you puzzle the humans (including yourself months from now) who ultimately have to decipher your code, and that you make the regex engine wander back and forth through your string a lot (which it's happy to do, but isn't necessarily very efficient at). If it's logical from the standpoint of reading and maintaining the code to use two regexes, use two regexes.
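
    For what it's worth, a single regex can handle the nesting on Perl 5.10 or later, using a recursive subpattern. This is only a sketch: it assumes the tags appear exactly as <ul> and </ul> (lowercase, no attributes), and the names $html and $list_re are made up for illustration.

```perl
use strict;
use warnings;

# The sample markup from the question, flattened onto one line.
my $html = '<ul> <li> List 1 Item 1 <li> List 1 Item 2 <ul> <li> Sub-Item 1 <li> Sub-Item 2 </ul> <li> List 1 Item 3 </ul> '
         . '<ul> <li> List 2 Item 1 <li> List 2 Item 2 <ul> <li> Sub-Item 1 <li> Sub-Item 2 </ul> <li> List 2 Item 3 </ul>';

# One balanced list, nested lists included (recursion needs Perl 5.10+).
my $list_re = qr{
    (                        # group 1: a whole top-level list
        <ul>
        (?: (?> [^<]+ )      #   a run of plain text
          | <(?!/?ul>)       #   a tag opener that is not a list open or close
          | (?1)             #   a nested list: recurse into group 1
        )*
        </ul>
    )
}x;

# In list context with /g, the match returns group 1 of every match.
my @array = $html =~ /$list_re/g;

print scalar(@array), " top-level lists found\n";
```

    Each element of @array holds one complete top-level list, sub-list and all. It works, but it is also a fair illustration of the readability concern above.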

    Also, if your data is in a standard markup language, you may well be happier using a library to parse it, rather than hand-rolling a regex to do it.

      Well, this code won't be around for long. I have three complex static html and javascript files I need to parse, stick in the DB and then be done with them (forever). But, I am looking into HTML parsers on CPAN to make this task a little easier. Thanks!

        If it is a single-use script, then that is all the more reason to keep it simple.

        Why try to get creative and write an obfuscated regex when you just need to get the job done and then throw it away? Lay it out, don't worry about compactness, and just make sure that the code will obviously do what you want it to. (Throwing pre-written HTML parsing modules at it is even better.)

        After all, you're going to spend far more time thinking about edge cases and correctness than you will spend typing the code!
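
        A "laid out" version might look like the sketch below: split the string on the list tags and keep a depth counter, so a list is finished only when the counter drops back to zero. It assumes the tags are written exactly as <ul> and </ul>; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The sample markup from the question, flattened onto one line.
my $html = '<ul> <li> List 1 Item 1 <li> List 1 Item 2 <ul> <li> Sub-Item 1 <li> Sub-Item 2 </ul> <li> List 1 Item 3 </ul> '
         . '<ul> <li> List 2 Item 1 <li> List 2 Item 2 <ul> <li> Sub-Item 1 <li> Sub-Item 2 </ul> <li> List 2 Item 3 </ul>';

my @array;
my $depth   = 0;     # how many lists are currently open
my $current = '';    # text of the top-level list being collected

# The capturing group makes split return the tags as well as the text between them.
for my $token (split m{(</?ul>)}, $html) {
    if ($token eq '<ul>') {
        $depth++;
        $current .= $token;
    }
    elsif ($token eq '</ul>') {
        $current .= $token;
        $depth--;
        if ($depth == 0) {           # the outermost list just closed
            push @array, $current;
            $current = '';
        }
    }
    elsif ($depth > 0) {             # items and text inside an open list
        $current .= $token;
    }
    # anything at depth 0 lies between lists and is dropped
}

print "$_\n" for @array;
```

        It is longer than a one-liner, but every step can be checked by eye, which is the point for a throwaway script.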

Re: Regex Nested Matching
by xyzzy (Pilgrim) on Feb 24, 2010 at 19:09 UTC
    in b4 HTML::Parser

Re: Regex Nested Matching
by drblove27 (Sexton) on Feb 24, 2010 at 19:43 UTC
    I am not a Perl Master, but if your data is really structured the way you show it, you can key off the spacing/tabbing of the list items and ignore the sub-items by matching only lines like /^\t<li>/ (or whatever the indentation is). Clearly this is kludgey, but it can work in a pinch...
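
    A rough sketch of that idea follows. The layout is an assumption, not something the original post confirms: top-level <ul> tags at column 0, their items behind exactly one tab, and anything in a sub-list behind two or more tabs. Here each top-level list becomes an array reference of its own items.

```perl
use strict;
use warnings;

# Hypothetical tab-indented version of the sample data.
my $html = join "\n",
    "<ul>",
    "\t<li> List 1 Item 1",
    "\t<li> List 1 Item 2",
    "\t\t<ul>",
    "\t\t\t<li> Sub-Item 1",
    "\t\t\t<li> Sub-Item 2",
    "\t\t</ul>",
    "\t<li> List 1 Item 3",
    "</ul>",
    "<ul>",
    "\t<li> List 2 Item 1",
    "\t<li> List 2 Item 2",
    "\t\t<ul>",
    "\t\t\t<li> Sub-Item 1",
    "\t\t\t<li> Sub-Item 2",
    "\t\t</ul>",
    "\t<li> List 2 Item 3",
    "</ul>";

my @array;
for my $line (split /\n/, $html) {
    if ($line =~ /^<ul>/) {
        push @array, [];                  # an unindented <ul> starts a new top-level list
    }
    elsif (@array && $line =~ /^\t<li>\s*(.*)/) {
        push @{ $array[-1] }, $1;         # exactly one tab: a top-level item
    }
    # lines indented deeper belong to sub-lists and are skipped
}

print join(', ', @$_), "\n" for @array;
```

    This breaks as soon as the indentation is inconsistent, which is why it is a kludge rather than a parser.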