space before and after

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: space before and after by graff (Chancellor) on Sep 03, 2009 at 08:13 UTC
First you said "... remove the space between and after the tags." Then later on you said "Remove the spaces in front of tags and after the tags." Those are two different things, and frankly neither of them is a good idea. As ikegami said in his first reply, browsers collapse consecutive white-space characters in HTML when rendering the text (outside of `<pre>` and the like), so mucking with space characters in an HTML file is really unnecessary from the point of view of someone reading the text in a browser. If you think about HTML tags for a bit, you'll notice that some of them (like `<p> <table> <blockquote> <br/>` and so on) are designed to control how browsers apply white-space when rendering (i.e. how they add spacing to enforce things like paragraph separation, line breaks and indenting), while others (inline tags like `<span> <a> <em> <b>` and so on) do not add or control spacing at all. So a process that blindly removes space characters adjacent to all tags is very likely to damage the text, again from the reader's point of view, because for those inline tags (span, a, em, etc.) the space next to the tag might be the only thing separating the two words that surround it. If you think you have some other important reason for doing this (unrelated to what browsers normally do), it would help if you explained it. Depending on why you really want to do this, it's likely that you'll need to use one of the HTML parsing modules (e.g. HTML::Parser), and you'll need to be fairly careful about deciding which spaces to remove and which to keep. (updated to add a couple words that were missing)
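To make the word-separation point concrete, here is a minimal sketch in core Perl. The helper names `strip_space_around_tags` and `visible_text` are made up for this illustration, the sample string is invented, and deleting tags is only a crude stand-in for what a browser would display.

```perl
use strict;
use warnings;

# Hypothetical helper: removes whitespace directly before and after every
# tag, with no regard for which element the tag belongs to.
sub strip_space_around_tags {
    my ($html) = @_;
    $html =~ s/\s+(?=<)//g;     # whitespace just before a "<"
    $html =~ s/(?<=>)\s+//g;    # whitespace just after a ">"
    return $html;
}

# Hypothetical helper: crude approximation of the visible text, obtained by
# deleting the tags. Adequate for this tiny sample, not for real HTML.
sub visible_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    return $html;
}

my $before = 'one <span>two</span> three';    # invented sample input
my $after  = strip_space_around_tags($before);

print visible_text($before), "\n";    # the three words are still separated
print visible_text($after),  "\n";    # the three words have run together
```

Because `<span>` is inline, the spaces next to it were the only word separators, so the second line printed has no spaces left in it.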
Re: space before and after by ikegami (Patriarch) on Sep 03, 2009 at 04:53 UTC
Did you mean to keep the space before the `<span>` tag? HTML collapses multiple spaces into one, so what's the point? It'll only serve to introduce errors.
Re^2: space before and after by Anonymous Monk on Sep 03, 2009 at 04:58 UTC
Remove the spaces in front of tags and after the tags.
Re^3: space before and after by ikegami (Patriarch) on Sep 03, 2009 at 05:30 UTC
So the output is wrong, then. `s/\s+(?=<)//g; s/(?<=>)\s+//g;` Assumes no unescaped "<" and ">" in flow or other content. Assumes no "<" and ">" in attribute values, comments or in CDATA sections. Assumes no NET tags (a poorly supported SGML construct). Disregards the fact that this changes the HTML to something that's not equivalent.
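A small sketch of how those two substitutions behave, first on input that meets the stated assumptions and then on input that breaks the first one. The wrapper name `trim_around_tags` and both sample strings are invented for illustration; only the two `s///` expressions come from the post.

```perl
use strict;
use warnings;

# Hypothetical wrapper: applies the two substitutions to a copy of the input.
sub trim_around_tags {
    my ($html) = @_;
    $html =~ s/\s+(?=<)//g;     # drop whitespace that is followed by "<"
    $html =~ s/(?<=>)\s+//g;    # drop whitespace that is preceded by ">"
    return $html;
}

# Input that satisfies the assumptions: "<" and ">" appear only in tags.
my $clean = trim_around_tags("<ul>\n  <li> item </li>\n</ul>");
print "$clean\n";

# Input that violates them: bare "<" and ">" in the text are treated as if
# they were tag delimiters, so the spaces next to them are removed as well.
my $mangled = trim_around_tags('if a < b and b > c');
print "$mangled\n";
```

The second call shows why the caveats matter: the regex has no notion of what is a tag and what is not, which is the kind of decision an HTML parser would have to make instead.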
Re^4: space before and after by Anonymous Monk on Sep 03, 2009 at 08:11 UTC
Re^5: space before and after by ikegami (Patriarch) on Sep 03, 2009 at 14:23 UTC
Re^5: space before and after by Anonymous Monk on Sep 03, 2009 at 08:45 UTC