Re: regex: deleting empty (x)html tags

Don't try to solve problems like this with regexes. While I won't claim it's impossible, it's certainly not easy, efficient or maintainable. Your regex is going to be long, and full of special constructs.

Solve this using an HTML parser. Once you have a parse tree, removing empty elements is trivial.

Abigail

Replies are listed 'Best First'.

Re: Re: regex: deleting empty (x)html tags
by CrysC (Novice) on Feb 15, 2004 at 00:23 UTC

I've considered sending it back through HTML::PullParser a second time to remove the empty tags, but that seems like quite a bit of excess processing simply to remove empty tags.

I could be wrong though -- just because it will take on the order of 20 times as much code as the regex doesn't mean it will actually be slower.

Edit: This is a document fragment with no containing tag, so any of the tree-based parsers will barf, afaik.

Edit2: I'm not refusing to consider non-regex solutions, it's just that loading yet another parser or going through HTML::PullParser again doesn't seem to be a very efficient way of doing it...
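
For illustration only, here is a rough sketch of what a single extra pass over the token stream might look like. It is written in Python with the standard library's html.parser rather than HTML::PullParser, purely so that it is self-contained; the names EmptyTagStripper and strip_empty are made up for this example, and "empty" is assumed to mean "contains nothing but whitespace or other empty elements". Because it works on events rather than a tree, it does not need a containing tag around the fragment.

```python
from html.parser import HTMLParser

# Void elements never have content, so they must not be treated as "empty".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class EmptyTagStripper(HTMLParser):
    """Hypothetical helper: re-emits a fragment, dropping empty elements."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.out = []    # finished output pieces
        self.stack = []  # open elements: [tag, start_tag_text, parts, has_content]

    def _emit(self, text, is_content):
        # Send text to the innermost open element, or to the output if none.
        if self.stack:
            frame = self.stack[-1]
            frame[2].append(text)
            frame[3] = frame[3] or is_content
        else:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag in VOID:
            self._emit(raw, True)
        else:
            self.stack.append([tag, raw, [], False])

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <tag/>: keep void elements, drop anything else as empty.
        if tag in VOID:
            self._emit(self.get_starttag_text(), True)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            _, start, parts, has_content = self.stack.pop()
            if has_content:
                # Note: the end tag is re-emitted in lower case.
                self._emit(start + "".join(parts) + "</%s>" % tag, True)
            # otherwise the element was empty and is silently dropped
        else:
            self._emit("</%s>" % tag, True)  # stray end tag: pass it through

    def handle_data(self, data):
        self._emit(data, bool(data.strip()))  # whitespace is not content

    def handle_entityref(self, name):
        self._emit("&%s;" % name, True)

    def handle_charref(self, name):
        self._emit("&#%s;" % name, True)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data, True)


def strip_empty(fragment):
    parser = EmptyTagStripper()
    parser.feed(fragment)
    parser.close()
    # Elements that were never closed are passed through unchanged.
    for _, start, parts, _ in parser.stack:
        parser.out.append(start)
        parser.out.extend(parts)
    return "".join(parser.out)


print(strip_empty("<p>Hello <b></b>world</p><div> <span></span>\n</div>"))
# prints: <p>Hello world</p>
```

Whether this is cheaper than the regex is something only a benchmark on the real data can answer. Note also that the sketch drops every element without content, including ones where emptiness may be intentional (an empty table cell, textarea, or named anchor), so a real version would probably want a list of tags to leave alone.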
