in reply to regex: deleting empty (x)html tags
Well, after playing with it for a while, I came up with this code:
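    # Munge empty anchors that carry a name or id so the stripper skips them
    $self->{'prcssed_txt'} =~ s/<a(\s+[^<>]*(?:name|id)=[^<>]+>\s*<\/a>)/<<$1/g;
    # Strip empty tag pairs until none are left
    while ($self->{'prcssed_txt'} =~ s/<([^<>]+)(\s+[^<>]+)*>\s*<\/\1>\n?//) { }
    # Unmunge the protected anchors
    $self->{'prcssed_txt'} =~ s/<</<a/g;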
It more or less treats anchors with an id or name attribute as a special case: it munges them slightly so the empty-tag stripper doesn't get them, and then unmunges them.
Since they are a special case, I think this is a reasonable way of handling it. It avoids the complexity of either doing this while I'm parsing the HTML (not exactly a simple process even without that, since the spec calls for parsing very broken HTML) or adding a whole second pass through a parser.
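For anyone who wants to try the munge/strip/unmunge idea outside the object, here is a minimal standalone sketch. The sub name `strip_empty_tags` and the sample input are made up for illustration and are not part of the real code. It also assumes the intent is to match the attribute names `name` or `id`, so it uses the alternation `(?:name|id)`; a bracketed `[name|id]` would be a character class matching any single one of those letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): runs the three
# substitutions on a plain string instead of $self->{'prcssed_txt'}.
sub strip_empty_tags {
    my ($txt) = @_;

    # 1. Munge empty anchors with a name or id: "<a ...>" becomes "<< ...>".
    $txt =~ s/<a(\s+[^<>]*(?:name|id)=[^<>]+>\s*<\/a>)/<<$1/g;

    # 2. Strip one empty tag pair per iteration; looping also removes tags
    #    that only become empty once their empty children are gone.
    1 while $txt =~ s/<([^<>]+)(\s+[^<>]+)*>\s*<\/\1>\n?//;

    # 3. Unmunge the protected anchors.
    $txt =~ s/<</<a/g;

    return $txt;
}

my $html = qq{<div><p class="x"> </p></div>\n<a name="top"></a><b></b>text};
print strip_empty_tags($html), "\n";    # prints: <a name="top"></a>text
```

One caveat with this approach: step 3 turns every `<<` back into `<a`, so it assumes the text never contains a literal `<<` of its own.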