in reply to space before and after
As ikegami said in his first reply, browsers normally collapse consecutive white-space characters in HTML when rendering the text (content inside something like <pre> being the usual exception), so mucking with space characters in an HTML file is really unnecessary (from the point of view of someone reading the text in a browser).
If you think about HTML tags for a little bit, you'll notice that some of them (block-level ones like <p> <table> <blockquote> <div>, plus <br/> and so on) make the browser start a new line or add spacing of its own when rendering the text, while others (inline ones like <span> <a> <b> <em> <input> and so on) do not add or control spacing at all.
So, a process that blindly removes space characters adjacent to every tag is very likely to cause some damage to the text (from the point of view of someone trying to read it in a browser), because for the inline tags (span, a, b, etc.), the space(s) next to the tag might be the only thing separating the two words that surround it.
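To make the damage concrete, here is a minimal sketch using only core Perl. Both subs are hypothetical helpers written for this post, and visible_text() is only a crude stand-in for a browser: it treats every tag as inline, drops it, and collapses runs of white-space.

```perl
use strict;
use warnings;

# Hypothetical helper: the "blind" approach, which strips white-space
# on both sides of every tag, whatever kind of tag it is.
sub strip_space_around_tags {
    my ($html) = @_;
    $html =~ s/\s*(<[^>]+>)\s*/$1/g;
    return $html;
}

# Hypothetical helper: a rough approximation of the text a reader sees.
# It removes the tags, collapses white-space runs, and trims the ends.
sub visible_text {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;
    $html =~ s/\s+/ /g;
    $html =~ s/^ | $//g;
    return $html;
}

my $html = 'Some <span class="x">important</span> words';

print visible_text($html), "\n";                           # Some important words
print visible_text(strip_space_around_tags($html)), "\n";  # Someimportantwords
```

The spaces around the span were the only word separators, so once they are gone the three words run together.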
If you think you have some other important reason for doing this (unrelated to what browsers normally do), it would help if you explained it. Depending on why you really want to do this, it's likely that you'll need to use one of the HTML parsing modules (e.g. HTML::Parser), and you'll need to be fairly careful about deciding which spaces to remove and which to keep.
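As a rough sketch of what "being careful" could look like with HTML::Parser (version 3 API), the code below trims white-space only where a text chunk touches a tag from a chosen set. The %is_block list and the sub name trim_around_block_tags are assumptions made up for illustration; which tags belong in the list depends on what you are really trying to do. It also assumes plain tags like <p>, since by default HTML::Parser does not treat the slash in a tag written as <br/> specially.

```perl
use strict;
use warnings;
use HTML::Parser;

# Assumption: tags that create their own line breaks, so that
# white-space right next to them is safe to remove.
my %is_block = map { $_ => 1 }
    qw(p table tr td th blockquote ul ol li h1 h2 h3 div);

# Hypothetical helper: trim white-space only where text touches one
# of the tags listed above, and leave everything else alone.
sub trim_around_block_tags {
    my ($html) = @_;
    my @tokens;    # each token is [ kind, tag name, source text ]

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { push @tokens, [ 'tag',  $_[0], $_[1] ] }, 'tagname, text' ],
        end_h   => [ sub { push @tokens, [ 'tag',  $_[0], $_[1] ] }, 'tagname, text' ],
        text_h  => [ sub { push @tokens, [ 'text', '',    $_[0] ] }, 'text' ],
        # Comments, declarations and so on are passed through untouched.
        default_h => [
            sub { push @tokens, [ 'other', '', $_[0] ] if defined $_[0] && length $_[0] },
            'text',
        ],
    );
    $parser->unbroken_text(1);    # deliver each text run as a single chunk
    $parser->parse($html);
    $parser->eof;

    for my $i ( 0 .. $#tokens ) {
        next unless $tokens[$i][0] eq 'text';
        my $prev = $i > 0        ? $tokens[ $i - 1 ] : undef;
        my $next = $i < $#tokens ? $tokens[ $i + 1 ] : undef;
        $tokens[$i][2] =~ s/^\s+// if $prev && $prev->[0] eq 'tag' && $is_block{ $prev->[1] };
        $tokens[$i][2] =~ s/\s+$// if $next && $next->[0] eq 'tag' && $is_block{ $next->[1] };
    }
    return join '', map { $_->[2] } @tokens;
}

print trim_around_block_tags('<p> Hello <span>big</span> world </p>'), "\n";
```

With that input, the spaces just inside the <p> tags go away, while the spaces on either side of the span (the only thing separating the words) are kept.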
(updated to add a couple words that were missing)