in reply to regex: deleting empty (x)html tags
One problem you haven't covered, and one that comes up with any context-sensitive language, is mis-nested tags: if your code treats (b)(i)(/b)(/i) as valid, it fails. I know, I know, it's not what you asked. But what matters is the context in which each thing is used in relation to everything else. You MAY be better off doing something like...
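    my @tags;    # stack of currently open tag names

    sub figureOut {
        my ($text) = @_;
        # Consume the tags left to right, one per iteration.
        while ( $text =~ s/<(.*?)>// ) {
            my $tag = $1;
            if ( $tag =~ s/^\/// ) {
                # Closing tag: must match the most recently opened one.
                my $matchTag = pop(@tags);
                die('Bad HTML') if !defined($matchTag) || $tag ne $matchTag;
            }
            else {
                push( @tags, $tag );
            }
        }
        die('Bad HTML') if @tags;    # something was never closed
    }

I haven't run the code, but you get the idea. In theory this figures out the balancing of tags, which is probably the most fragile part of your program. Somewhere in here you should also be able to handle empty content.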
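For the empty-content part, here is a minimal sketch of how the same stack idea could be stretched to drop empty elements. The name strip_empty and everything inside it are my own hypothetical choices, not something from the original post. It assumes XHTML-style input where every element is explicitly closed, with no comments, no CDATA sections, and no '>' inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post. Assumes XHTML-style
# input: every element explicitly closed, no comments or CDATA sections,
# and no '>' inside attribute values.
sub strip_empty {
    my ($html) = @_;
    my @open;    # stack of open elements: { name, start, content }
    my @out;     # tokens kept so far

    # The capturing group makes split return the tags as well as the text.
    for my $tok ( grep { length } split /(<[^>]+>)/, $html ) {
        if ( $tok =~ m{^</\s*([\w:-]+)\s*>$} ) {
            # Closing tag: must match the most recently opened element.
            my $name = $1;
            my $top  = pop @open
                or die "Bad HTML: unexpected </$name>\n";
            die "Bad HTML: <$top->{name}> closed by </$name>\n"
                if $top->{name} ne $name;
            if ( $top->{content} ) {
                push @out, $tok;
                $open[-1]{content} = 1 if @open;    # parent is non-empty too
            }
            else {
                # Nothing but whitespace inside: drop the open tag and
                # everything emitted after it.
                splice @out, $top->{start};
            }
        }
        elsif ( $tok =~ m{^<[^>]*/>$} ) {
            # Self-closing tag such as <br/>: counts as content.
            $open[-1]{content} = 1 if @open;
            push @out, $tok;
        }
        elsif ( $tok =~ m{^<([A-Za-z][\w:-]*)[^>]*>$} ) {
            # Opening tag: remember where it sits in the output.
            push @open, { name => $1, start => scalar @out, content => 0 };
            push @out, $tok;
        }
        else {
            # Text (or anything unrecognised): non-whitespace is content.
            $open[-1]{content} = 1 if @open && $tok =~ /\S/;
            push @out, $tok;
        }
    }
    die "Bad HTML: <$open[-1]{name}> never closed\n" if @open;
    return join '', @out;
}
```

With this sketch, strip_empty('<p>Hi <b></b>there</p>') gives back '<p>Hi there</p>', while mis-nested input such as '<b><i></b></i>' dies with a 'Bad HTML' message. Because a dropped element never marks its parent as non-empty, nested empties like '<div> <i></i> </div>' disappear in one pass.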
Anyway, regular expressions have limited scope when it comes to context. They can tell whether text has things in a certain order, but not whether those things are in the right order given their context. Perl's regexes can do it to some degree, but they are nowhere near complete for jobs like tag balancing.
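To make the limitation concrete, here is a small sketch. The pattern and the looks_ordered helper are hypothetical illustrations of mine, not code from the thread. The backreference checks that some tag is opened and later closed under the same name, which is an ordering check, but it knows nothing about what was opened in between.

```perl
use strict;
use warnings;

# Hypothetical pattern: an opening tag followed, somewhere later, by a
# closing tag with the same name. \1 ties the two names together.
my $open_then_close = qr{<(\w+)>.*?</\1>}s;

# Hypothetical helper: returns 1 if the pattern matches, 0 otherwise.
sub looks_ordered {
    my ($html) = @_;
    return $html =~ $open_then_close ? 1 : 0;
}

for my $html ( '<b><i>x</i></b>', '<b><i>x</b></i>' ) {
    my $verdict = looks_ordered($html) ? 'accepted' : 'rejected';
    print "$html: $verdict\n";
}
```

Both inputs come out as accepted, even though the second one closes (b) while (i) is still open. Catching that needs some memory of which tags are currently open, which is what a stack provides.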
Update: Used the power of English in the first paragraph.
Re: Re: regex: deleting empty (x)html tags
by CrysC (Novice) on Feb 14, 2004 at 23:04 UTC