Hola monks,
I'm about to make some changes to XML::Rules regarding whitespace handling and would like your suggestions. (Update: lots of ++s, no comments, so I guess the design and the docs are both perfect ;-) I'll implement it and write some tests; hopefully, if there is any other sensible level of space stripping, I'll find it along the way, so that I do not end up with stripspaces => 2.5 ;-))

The problem

Currently the module doesn't make any changes to the data it reads, and it's up to you what you do with it in the tag rules you specify. This is not a problem for things like <foo>   bar  </foo>: if you do not want the whitespace, you can remove it with a rule like this:

  foo => sub {for ($_[1]->{_content}) {s/^\s+//;s/\s+$//;} return $_[0] => $_[1]->{_content}}

and there actually is a predefined rule that does exactly that.
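A minimal standalone sketch of that trimming rule, written so that it can be called by hand without the parser. It assumes the calling convention used in the rule above (tag name in $_[0], attribute hash with _content in $_[1]); the name trim_content_rule is made up for this example and is not part of XML::Rules.

```perl
use strict;
use warnings;

# Hypothetical helper: the same idea as the inline rule, with the
# trailing pattern written as \s+$ so that a whole run of whitespace goes.
sub trim_content_rule {
    my ($tag, $attr) = @_;
    my $content = $attr->{_content};
    $content = '' unless defined $content;   # tags without any content
    for ($content) {
        s/^\s+//;   # leading whitespace
        s/\s+$//;   # trailing whitespace
    }
    return $tag => $content;
}

# Called by hand the way a parser would call it for <foo>   bar  </foo>
my ($name, $value) = trim_content_rule('foo', { _content => "   bar  " });
print "$name => '$value'\n";
```

Note that the trimming only happens once the closing tag has been seen, which is exactly the limitation discussed next.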
The problem is that this may be too late. Imagine you have a huge XML like this:

<root>
  <tag>... some subtags and content and whatnot ...</tag>
  <tag>... some subtags and content and whatnot ...</tag>
  <tag>... some subtags and content and whatnot ...</tag>
  ...
</root>
and you want to process each <tag>, do something with the data inside and forget it, similar to the way XML::Twig works. This works fine: you read a chunk, you process the chunk, you forget the chunk ... but the memory footprint still grows, because all the whitespace between the <tag>s gets accumulated into <root>'s _content. Only after </root> is parsed will the rule for <root> throw it away. Generally, in data-oriented XML, where every tag contains either characters or subtags but not both, you could not care less about the whitespace inside the tags that contain subtags. In document-oriented XML, on the other hand, you have to be more careful about it.
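To put a rough number on the accumulation, here is a toy simulation. It is not XML::Rules code and does not parse anything: simulate_root_content is a hypothetical function that only mimics the described behaviour, where each <tag> is processed and forgotten while the whitespace before it is appended to the parent's _content.

```perl
use strict;
use warnings;

# Hypothetical simulation of the effect described above, not the module's
# internals: the chunks are gone, the separators between them are kept.
sub simulate_root_content {
    my ($chunks, $separator) = @_;
    my $root_content = '';
    for my $i (1 .. $chunks) {
        $root_content .= $separator;   # whitespace preceding each <tag>
        # ... the rule for <tag> would run here and return nothing ...
    }
    return $root_content;
}

# A newline plus two spaces of indentation before each of 100_000 tags
my $content = simulate_root_content(100_000, "\n  ");
printf "root _content holds %d characters of whitespace\n", length $content;
```

The growth is linear in the number of child tags, so it only hurts with really big files, but those are exactly the files you want to process chunk by chunk.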

The solution

Currently I'm thinking about these two new options:

=head2 Whitespace handling

  stripspaces => 0 (default)
    All whitespace is preserved.

  stripspaces => 1
    Removes the whitespace-only content preceding a tag whose rule did
    not return anything in a way that would cause it to be added to the
    parent's _content, and the whitespace between such a tag and the
    closing parent tag. This means that no whitespace will be removed
    before <foo> in "<bar>blah <foo x="y"/></bar>", but (if the rule for
    <foo> returns nothing or an even-length list of things to be added
    to the parent's %$attr) both whitespaces will be removed in
    "<bar> <foo x="y"/> </bar>".

  stripspaces => 2
    Removes all whitespace before tags that do not add anything into the
    parent's _content, and before the closing tag.

  stripspaces => 3
    As stripspaces => 2, plus trims the _content of tags and doesn't
    include the _content for tags like "<foo x="y"></foo>" and
    "<foo x="y"> </foo>".

  normalisespaces => 0
    Does not do anything with multiple spaces in a row.

  normalisespaces => 1
    Collapses multiple spaces in a row into a single one.
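One way to read the proposed levels is as plain string operations. This is only a sketch of the intended semantics, not the implementation: strip_before_tag and normalise_spaces are hypothetical names, the text passed in stands for whatever sits directly before a tag whose rule adds nothing to the parent's _content, and I assume that "spaces" in normalisespaces means any whitespace character.

```perl
use strict;
use warnings;

# Hypothetical helper: what happens to the text directly before a tag
# that adds nothing to the parent's _content, for a given stripspaces level.
sub strip_before_tag {
    my ($text, $level) = @_;
    return $text if $level == 0;            # 0: keep everything
    if ($level == 1) {
        return $text =~ /\S/ ? $text : '';  # 1: drop only whitespace-only text
    }
    (my $stripped = $text) =~ s/\s+$//;     # 2 and 3: drop trailing whitespace
    return $stripped;
}

# Hypothetical helper for normalisespaces => 1.
# Assumption: any run of whitespace collapses to one space character.
sub normalise_spaces {
    my ($text) = @_;
    (my $out = $text) =~ s/\s+/ /g;
    return $out;
}

for my $level (0 .. 2) {
    printf "level %d: [%s] [%s]\n", $level,
        strip_before_tag('blah ', $level),   # the "<bar>blah <foo/>" case
        strip_before_tag('   ', $level);     # the "<bar> <foo/>" case
}
print '[', normalise_spaces("a  b \n c"), "]\n";
```

The extra trimming of a tag's own _content under stripspaces => 3 is not shown here; it would be the same two substitutions as in the foo rule near the top.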

I'll be grateful for suggestions regarding the documentation wording and examples, as well as for any other possibilities of whitespace handling you might find handy.