This is a follow-up to XML::Rules and whitespace handling. Unless there are bugs, of course, or someone convinces me to add more options, I am done with the changes proposed in that node. During the implementation I did decide to define the options a bit differently, though:

=head2 Whitespace handling

There are two options that affect the whitespace handling: stripspaces and
normalisespaces. The normalisespaces is a simple flag that controls whether
multiple spaces/tabs/newlines are collapsed into a single space or not. The
stripspaces is more complex, it's a bit-mask, an ORed combination of the
following options:

  0 - don't remove whitespace around tags
      (around tags means before the opening tag and after the closing tag,
      not in the tag's content!)
  1 - remove whitespace before tags whose rules did not return any text content
      (the rule specified for the tag caused the data of the tag to be ignored,
      processed them already or added them as attributes to parent's \%attr)
  2 - remove whitespace around tags whose rules did not return any text content
  3 - remove whitespace around all tags

  0 - remove only whitespace-only content
      (that is remove the whitespace around <foo/> in this case
      "<bar> <foo/> </bar>" but not this one "<bar>blah <foo/> blah</bar>")
  4 - remove trailing/leading whitespace
      (remove the whitespace in both cases above)

  0 - don't trim content
  8 - do trim content
      (That is for "<foo> blah </foo>" only pass to the rule {_content => 'blah'})

That is if you have a data oriented XML in which each tag contains either
text content or subtags, but not both, you want to use stripspaces => 3 or
stripspaces => 3|4. This will not only make sure you don't need to bother
with the whitespace-only _content of the tags with subtags, but will also
make sure you do not keep on wasting memory while parsing a huge XML and
processing the "twigs". Without that option the parent tag of the repeated
tag would keep on accumulating unneeded whitespace in its _content.

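To make the bit-mask easier to read, here is a minimal sketch in plain Perl that decodes a stripspaces value into the three groups listed above. It does not use XML::Rules at all; the constant names and the describe_stripspaces() helper are hypothetical and exist only for this illustration, since the documentation gives just the numeric values.

```perl
use strict;
use warnings;

# Hypothetical names for the numeric values from the POD above.
use constant {
    AROUND_NONE      => 0,   # don't remove whitespace around tags
    BEFORE_TEXTLESS  => 1,   # before tags whose rules returned no text
    AROUND_TEXTLESS  => 2,   # around tags whose rules returned no text
    AROUND_ALL       => 3,   # around all tags
    LEADING_TRAILING => 4,   # also strip leading/trailing whitespace
    TRIM_CONTENT     => 8,   # trim the content passed to the rule
};

# Illustration only: split a stripspaces value into its three groups.
sub describe_stripspaces {
    my ($mask) = @_;
    my @around = (
        "don't remove whitespace around tags",
        "remove whitespace before tags whose rules did not return any text content",
        "remove whitespace around tags whose rules did not return any text content",
        "remove whitespace around all tags",
    );
    return {
        around => $around[ $mask & 3 ],          # the low two bits pick 0..3
        scope  => ( $mask & 4 )
            ? 'trailing/leading whitespace'
            : 'whitespace-only content',
        trim   => ( $mask & 8 ) ? 1 : 0,
    };
}

# The combination recommended for data oriented XML: 3|4
my $info = describe_stripspaces( AROUND_ALL | LEADING_TRAILING );
print "$_: $info->{$_}\n" for sort keys %$info;
```

The number built this way is what would go into the option, as in the stripspaces => 3|4 mentioned in the documentation.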
I have not released the module on the unsuspecting public yet, even though the tests seem to pass. I have to admit I did not review the whole of the huge t\08-whitespace_more.t, so it may accept (or rather require) incorrect behaviour.

I would be most grateful if a few interested people downloaded the new version of the module from http://jenda.krynicky.cz/perl/test/XML-Rules-0.22.tar.gz (or via PPM from http://jenda.krynicky.cz/perl/test/XML-Rules.ppd), gave it a shot, and let me know if anything works differently from what they expected.


In reply to XML::Rules and whitespace handling II - DONE? by Jenda
