in reply to Re: simple regex help
in thread simple regex help

<quote>You didn't say what exactly you're looking for. Perhaps some examples, including those nested <li>s?</quote>

    Well, anything in the 'MATCH HERE' part; the content changes. The only given I know is that the outermost match is this:

    <li><span class="title">Title</span> MATCH HERE </li>

    MATCH HERE could be a single letter, or it could be an HTML structure that itself potentially matches the regex.

    I really need to keep this in regex if possible; using the tree objects is a last resort.
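For what it's worth, a regex-only attempt could look something like the sketch below. It assumes Perl 5.10 or later (for named captures and `(?&name)` recursion), assumes every `<li>` in the input is explicitly closed, and assumes the title span contains no markup. The helper name `extract_match` and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use 5.010;    # named captures and (?&name) recursion need Perl 5.10+
use strict;
use warnings;

# Recursive pattern: (?<li>...) matches one balanced <li>...</li> pair
# and calls itself via (?&li) for any <li> nested inside it.
my $re = qr{
    <li> \s* <span\s+class="title"> [^<]* </span>
    (?<match>
        (?:
            (?<li> <li\b[^>]*> (?: (?&li) | (?! </?li\b ) . )* </li> )
          |
            (?! </?li\b ) .    # any character that does not start an li tag
        )*
    )
    </li>
}xs;

# Hypothetical helper: returns the text after the title span, or undef.
sub extract_match {
    my ($html) = @_;
    return $html =~ $re ? $+{match} : undef;
}

print extract_match(
    '<li><span class="title">Title</span><ul><li>one</li><li>two</li></ul> MATCH HERE </li>'
), "\n";
```

The pattern only balances `<li>` against `</li>`, so input that leaves an `<li>` unclosed will not match the way a browser would read it.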

    Re^3: simple regex help
    by wfsp (Abbot) on Apr 18, 2007 at 17:19 UTC
      I've used a stack to keep track of opening/closing li tags.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use HTML::TokeParser::Simple;

      my $html = do { local $/; <DATA> };
      my $p = HTML::TokeParser::Simple->new(\$html)
        or die "can't parse string: $!\n";

      # Skip ahead to just past the title span.
      while (my $t = $p->get_token) {
        last if $t->is_end_tag('span');
      }

      my ($match, @li_stack);
      while (my $t = $p->get_token) {
        if ($t->is_start_tag('li')) {
          push @li_stack, 'li';
        }
        if ($t->is_end_tag('li')) {
          if (@li_stack) {
            pop @li_stack;
          }
          else {
            last;    # this </li> closes the outermost <li>
          }
        }
        $match .= $t->as_is;
      }
      print "$match\n";

      __DATA__
      <li><span class="title">Title</span><ul><li>one</li><li>two</li></ul> MATCH HERE </li>
      output:
      <ul><li>one</li><li>two</li></ul> MATCH HERE
      update:
      Added output.

      update 2
      see ikegami's reply below.

        __DATA__
        <li><span class="title">Title</span><ul><li>one</ul> MATCH HERE </li> this shouldn't match

        outputs

        <ul><li>one</ul> MATCH HERE </li> this shouldn't match

        instead of the expected

        <ul><li>one</ul> MATCH HERE
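One possible way to cope with the unclosed `<li>` in this sample is to track the enclosing `<ul>`/`<ol>` as well, so that closing the list also closes any `<li>` still open inside it. The following is only a sketch of that idea, not a fix anyone in the thread posted: it assumes `<li>` elements only ever sit inside `<ul>` or `<ol>`, and the helper name `extract_match` is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns everything between the title span and the
# </li> that closes the outermost <li>.
sub extract_match {
    my ($html) = @_;
    my $p = HTML::TokeParser::Simple->new(\$html)
        or die "can't parse string: $!\n";

    # Skip ahead to just past the title span.
    while (my $t = $p->get_token) {
        last if $t->is_end_tag('span');
    }

    my $match = '';
    my @stack;    # holds 'list' for <ul>/<ol> and 'li' for <li>
    while (my $t = $p->get_token) {
        if ($t->is_start_tag('ul') || $t->is_start_tag('ol')) {
            push @stack, 'list';
        }
        elsif ($t->is_start_tag('li')) {
            # A new <li> implicitly closes an open sibling <li>.
            pop @stack if @stack && $stack[-1] eq 'li';
            push @stack, 'li';
        }
        elsif ($t->is_end_tag('ul') || $t->is_end_tag('ol')) {
            # Closing the list closes any <li> left open inside it.
            while (@stack) {
                last if pop(@stack) eq 'list';
            }
        }
        elsif ($t->is_end_tag('li')) {
            last unless @stack;    # closes the outermost <li>
            pop @stack if $stack[-1] eq 'li';
        }
        $match .= $t->as_is;
    }
    return $match;
}

print extract_match(
    q{<li><span class="title">Title</span><ul><li>one</ul> MATCH HERE </li> this shouldn't match}
), "\n";
```

Other elements with optional end tags would need similar handling, which is a good argument for leaving this to a parser that already knows the rules.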

          And that, class, is why all sane people use a properly tested HTML parser and don't try to roll their own with regexen . . .

          Update: Oh he is. Never mind me . . . %) Perhaps this is why sane people avoid having to parse HTML if they can avoid it. :)