Howdy,
I’m in the process of writing what could be considered a poor man’s version of
Everything and have finally wandered around to the section dealing with submitted HTML (a.k.a. node editing).
While I’m fairly sure I can fiddle around with modules such as HTML:: to sift out any tags outside of the relatively safe ones (e.g. br, hr, p, b, strong and & chars), I am unsure how to proceed with the whole issue of unmatched tags.
I know there are some HTML tricks that’ll allow the later series of browsers to ‘overlook’ such nasties as unbalanced tags but I would prefer to keep it safe and produce nice clean HTML 1.+ type code.
My current thinking on the subject is going along the lines of creating a small hash to hold a ‘level count’ for each tag, adding or subtracting from the count through the parse process and then dumping a series of close tags for any tags that still appear ‘open’.
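A rough sketch of that level-count idea, written in Python purely for illustration (the same shape should port to a Perl hash driven by parser callbacks). The names `TagCounter`, `close_by_count` and `VOID_TAGS` are made up for this example, and the set of tags that never take a close tag is an assumption:

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: tags that never take a closing tag, so they are not counted.
VOID_TAGS = {"br", "hr"}


class TagCounter(HTMLParser):
    """Keeps a per-tag 'level count' of tags that are still open."""

    def __init__(self):
        super().__init__()
        self.open_counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag names already lowercased.
        if tag not in VOID_TAGS:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        # Stray close tags are ignored rather than driving the count negative.
        if self.open_counts[tag] > 0:
            self.open_counts[tag] -= 1


def close_by_count(html):
    """Append one close tag for every tag that is still counted as open."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    closers = "".join(
        f"</{tag}>" * count for tag, count in sorted(parser.open_counts.items())
    )
    return html + closers


print(close_by_count("<b>bold <strong>text"))
```

Because the counts carry no ordering, the close tags come out in alphabetical order here (`</b>` before `</strong>`), not in the reverse of the order the tags were opened.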
This approach, AFAIK, will work fine for the simpler tags, where overlapping is okay, but I’m worried about what happens if I ever decide to progress to more complex tags, such as table handling, where the order of closing is important.
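One common way around the ordering problem is to keep a stack of open tags instead of per-tag counts, so that leftovers can be closed innermost first. The following is a minimal sketch of that alternative, again in Python for illustration only. `TagStack`, `close_by_stack` and `VOID_TAGS` are hypothetical names, and the sketch only appends missing close tags at the end of the input; it does not repair overlaps in the middle of the text:

```python
from html.parser import HTMLParser

# Assumption: tags that never take a closing tag, so they are never pushed.
VOID_TAGS = {"br", "hr"}


class TagStack(HTMLParser):
    """Remembers open tags in the order they were opened."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore close tags that match nothing; otherwise pop back to the
        # most recent matching open tag, discarding anything opened inside it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def close_by_stack(html):
    """Append close tags for whatever is still open, innermost first."""
    parser = TagStack()
    parser.feed(html)
    parser.close()
    return html + "".join(f"</{tag}>" for tag in reversed(parser.stack))


print(close_by_stack("<table><tr><td>cell"))
```

With the stack, the table example closes as `</td></tr></table>`, which is the order a nested structure needs.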
Is there a super dooper HTML::Parse->CloseAllDemTagsProperly call that I’ve missed?
In the process of producing the PM site and the like, has somebody refined a hand-rolled routine to the point where it does everything short of writing the legal notice?
Thank you for your Infomercial time.