CrysC has asked for the wisdom of the Perl Monks concerning the following question:
I'm working on an HTML parser for a CMS project and need to delete empty HTML tags recursively (so that in <p><i></i></p> both tags would disappear), except when the tag is <a name="..."> or <a id="...">.
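A minimal sketch of why the deletion has to be repeated, assuming a simple tag pattern: a single `s///g` pass can only remove elements that are empty at that moment, so `<p><i></i></p>` needs one pass for the inner `<i></i>` and a second one for the now-empty `<p></p>`. The helper `strip_empty` and its pass counter are hypothetical, added only to make the passes visible.

```perl
use strict;
use warnings;

# Hypothetical helper: removes empty elements and also reports how many
# s///g passes actually changed the text.
sub strip_empty {
    my ($html) = @_;
    my $passes = 0;
    # A pass can only remove elements that are empty at that moment, so
    # keep going until a pass makes no substitution at all.
    while ($html =~ s{<(\w+)(?:\s[^<>]*)?>\s*</\1>}{}g) {
        $passes++;
    }
    return ($html, $passes);
}

my ($out, $passes) = strip_empty('<div><p><i></i></p>text</div>');
print "$out after $passes passes\n";    # <div>text</div> after 2 passes
```

The loop stops because `s///g` returns a false value once it makes no substitution.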
I've got a version that works, but my code seems far too fragile for my taste.
while ($self->{'prcssed_txt'} =~ s/<([^<>]+)([\s]+[^<>]+)*>\s*<\/\1>\n?//) { }
This first attempt strips all empty tags.
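In the first attempt, `[^<>]+` initially grabs the whole inside of the tag, attributes included, and the match only succeeds after the engine backtracks and hands the attributes to `([\s]+[^<>]+)*`; a nested quantifier like that is a classic source of excessive backtracking. A minimal sketch of a tighter pattern, assuming tag names are plain word characters (`\w+`): capture only the name, treat the attributes as one optional chunk, and add `/i` so that mixed case such as `<P></p>` is caught too, since a backreference under `/i` compares case-insensitively. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $text = qq{<P class="intro"> <I></i> </p>\n<p>kept</p>\n};

# Tag name is a run of word characters; attributes, if any, are one
# optional chunk starting with whitespace. /i lets \1 match "P" against "p".
1 while $text =~ s{<(\w+)(?:\s[^<>]*)?>\s*</\1>\n?}{}gi;

print $text;    # prints <p>kept</p> followed by a newline
```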
while ($self->{'prcssed_txt'} =~ s/<([^<>a][^<>]*)([\s]+[^<>]+)*>\s*<\/\1>\n?//) { }
This one strips all empty tags whose names don't start with "a".
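The character class `[^<>a]` in the second attempt rejects every tag name that begins with "a", so empty `<abbr>`, `<acronym>`, `<address>` and `<area>` elements survive along with the anchors. A small sketch comparing it with a negative lookahead, `(?!a\b)`, which rejects only the tag named exactly `a`; the sample markup is made up for illustration.

```perl
use strict;
use warnings;

my $html = '<abbr></abbr><address></address><a href="#x"></a><b></b>';

# Original idea: the class [^<>a] rejects any tag name beginning with "a".
(my $by_class = $html) =~ s{<([^<>a][^<>]*)([\s]+[^<>]+)*>\s*</\1>}{}g;

# Alternative: a negative lookahead that rejects only the tag named "a".
(my $by_lookahead = $html) =~ s{<(?!a\b)(\w+)(?:\s[^<>]*)?>\s*</\1>}{}g;

print "$by_class\n";      # <abbr></abbr><address></address><a href="#x"></a>
print "$by_lookahead\n";  # <a href="#x"></a>
```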
The third version additionally strips empty <a href...>, <abbr> and <acronym> elements:
while (   ($self->{'prcssed_txt'} =~ s/<([^<>a][^<>]*)([\s]+[^<>]+)*>\s*<\/\1>\n?//)
       || ($self->{'prcssed_txt'} =~ s/<a href[^<>]*>\s*<\/a>//)
       || ($self->{'prcssed_txt'} =~ s/<abbr[^<>]*>\s*<\/abbr>//)
       || ($self->{'prcssed_txt'} =~ s/<acronym[^<>]*>\s*<\/acronym>//)) { }
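Rather than listing `a href`, `abbr` and `acronym` one by one, the exception can be stated once: skip an `<a ...>` start tag whose attributes include `name=` or `id=`, and remove every other empty element. This is a sketch only, assuming attribute values never contain a whitespace-preceded ` name=` or ` id=`; the attribute test is purely textual, not a real HTML parse, and the helper name `strip_empty_keep_anchors` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete empty elements repeatedly, but keep any empty
# <a ...> whose attributes include name= or id= (a rough textual test).
sub strip_empty_keep_anchors {
    my ($html) = @_;
    1 while $html =~ s{
        <
        (?! a \s (?: [^<>]* \s )? (?: name | id ) \s* = )  # not a named or id anchor
        (\w+)                                              # tag name
        (?: \s [^<>]* )?                                   # optional attributes
        >
        \s*
        </ \1 >                                            # matching end tag
        \n?
    }{}gix;
    return $html;
}

print strip_empty_keep_anchors('<div><a id="x"></a><b> </b></div>'), "\n";
# prints <div><a id="x"></a></div>
```

Because a kept anchor makes its parent non-empty, `<p><a name="top"></a></p>` is left untouched as a whole.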
Re: regex: deleting empty (x)html tags
by Abigail-II (Bishop) on Feb 14, 2004 at 23:39 UTC
by CrysC (Novice) on Feb 15, 2004 at 00:23 UTC
Re: regex: deleting empty (x)html tags
by exussum0 (Vicar) on Feb 14, 2004 at 22:52 UTC
by CrysC (Novice) on Feb 14, 2004 at 23:04 UTC
Re: regex: deleting empty (x)html tags
by jeffa (Bishop) on Feb 15, 2004 at 15:31 UTC
by CrysC (Novice) on Feb 15, 2004 at 17:46 UTC
by jeffa (Bishop) on Feb 15, 2004 at 18:01 UTC
by CrysC (Novice) on Feb 15, 2004 at 20:37 UTC
Re: regex: deleting empty (x)html tags
by CrysC (Novice) on Feb 15, 2004 at 03:16 UTC