Re: simple regex help

i'd like to not use an html tree module to handle this -- and keep it all in regex. is this possible?

In a word, yes. But, imo, very tricky.

You didn't say what exactly you're looking for. Perhaps some examples, including those nested <li>s?

It's very easy to "get at" all the html elements. I'd wager a solution could be found using something like the following.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $html = do{local $/; <DATA>};

my $p = HTML::TokeParser::Simple->new(\$html)
 or die "can't parse string: $!\n";

while (my $t = $p->get_token){
  printf "*%s*\n", $t->as_is;
}

__DATA__
<li><span class="title">Title</span> MATCH HERE </li>

output:

*<li>*
*<span class="title">*
*Title*
*</span>*
* MATCH HERE *
*</li>*
*
*

Re^2: simple regex help by nmerriweather (Friar) on Apr 18, 2007 at 17:08 UTC
"You didn't say what exactly you're looking for. Perhaps some examples, including those nested <li>s?"

Well, anything in the 'match here' -- the content changes. The only given I know is that the outermost match is this:

<li><span class="title">Title</span> MATCH HERE </li>

MATCH HERE could be a single letter, or it could be an html structure that potentially matches the regex.

I really need to keep this in regex if possible -- using the tree objects is a last resort.
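For what it's worth, a regex-only version does look possible as long as every nested <li> has its own </li>. Below is a minimal sketch, not a tested solution for real-world HTML. It assumes Perl 5.10 or later (for recursive patterns and possessive quantifiers), assumes the title span looks exactly like the one in the example, and uses a made-up helper name, `extract_inner`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # recursive patterns such as (?2) need Perl 5.10 or later

# extract_inner() is a hypothetical helper, not something from the thread.
# It returns the text between the title span and the </li> that closes the
# outermost item, or undef when the nested <li> tags are not balanced.
sub extract_inner {
    my ($html) = @_;
    return $html =~ m{
        <li> \s* <span \s+ class="title"> [^<]* </span>
        (                                 # group 1: the part we want
            (?:
                [^<]++                    # plain text
              | < (?! /?li\b ) [^>]* >    # any tag except li open or close
              | (                         # group 2: one balanced li element
                    <li\b [^>]* >
                    (?: [^<]++ | < (?! /?li\b ) [^>]* > | (?2) )*
                    </li>
                )
            )*
        )
        </li>
    }xi ? $1 : undef;
}

my $sample = '<li><span class="title">Title</span>'
           . '<ul><li>one</li><li>two</li></ul> MATCH HERE </li>';
print extract_inner($sample), "\n";
```

The possessive `[^<]++` is there to stop the engine from retrying every way of splitting a run of text when the overall match fails. Note that input with an unclosed nested <li> makes the pattern fail outright rather than return a partial match, which may or may not be what you want.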
Re^3: simple regex help by wfsp (Abbot) on Apr 18, 2007 at 17:19 UTC
I've used a stack to keep track of opening/closing li tags.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $html = do{local $/; <DATA>};

my $p = HTML::TokeParser::Simple->new(\$html)
 or die "can't parse string: $!\n";

# skip everything up to and including the title span
while (my $t = $p->get_token){
  last if $t->is_end_tag('span');
}

# collect tokens until the </li> that closes the outermost item
my ($match, @li_stack);
while (my $t = $p->get_token){
  if ($t->is_start_tag('li')){
    push @li_stack, 'li';
  }
  if ($t->is_end_tag('li')){
    if (@li_stack){
      pop @li_stack;
    }
    else{
      last;
    }
  }
  $match .= $t->as_is;
}
print "$match\n";

__DATA__
<li><span class="title">Title</span><ul><li>one</li><li>two</li></ul> MATCH HERE </li>

output:

<ul><li>one</li><li>two</li></ul> MATCH HERE

Update: Added output.
Update 2: see ikegami's reply below.
Re^4: simple regex help by ikegami (Patriarch) on Apr 18, 2007 at 17:52 UTC
__DATA__
<li><span class="title">Title</span><ul><li>one</ul> MATCH HERE </li> this shouldn't match

outputs

<ul><li>one</ul> MATCH HERE </li> this shouldn't match

instead of the expected

<ul><li>one</ul> MATCH HERE
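One way to cope with an unclosed nested <li> might be to count the list containers (<ul>/<ol>) instead of the <li> tags themselves, so that a </li> seen at list depth 0 is taken to close the outermost item. The sketch below is only an illustration of that idea. It swaps the HTML::TokeParser::Simple token loop for a crude regex tokenizer so that it runs with core Perl only, the helper name `inner_of_titled_li` is made up, and it assumes nested items always sit inside a <ul> or <ol>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns everything between the title span and the
# </li> seen at list depth 0. If no such </li> exists, it returns whatever
# was collected up to the end of the input.
sub inner_of_titled_li {
    my ($html) = @_;

    # Move pos() just past the title span; give up if it is not there.
    $html =~ m{<span\s+class="title">[^<]*</span>}gi or return;

    my ($match, $depth) = ('', 0);

    # Crude tokenizer (an assumption): a token is one tag or one run of
    # text. It does not cope with comments or '>' inside attribute values.
    while ($html =~ m{\G(<[^>]*>|[^<]+)}gc) {
        my $tok = $1;
        if    ($tok =~ m{^<(?:ul|ol)\b}i)            { $depth++ }
        elsif ($tok =~ m{^</(?:ul|ol)\s*>}i)         { $depth-- if $depth }
        elsif ($tok =~ m{^</li\s*>}i && $depth == 0) { last }
        $match .= $tok;
    }
    return $match;
}

my $data = q{<li><span class="title">Title</span><ul><li>one</ul> MATCH HERE </li> this shouldn't match};
print inner_of_titled_li($data), "\n";
```

The same depth counter could presumably be dropped into the `get_token` loop by testing `is_start_tag`/`is_end_tag` for 'ul' and 'ol'. It still breaks if an <li> is nested directly inside another <li> with no list container around it.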
Re^5: simple regex help by Fletch (Bishop) on Apr 18, 2007 at 17:59 UTC
Re^6: simple regex help by ikegami (Patriarch) on Apr 18, 2007 at 19:48 UTC