This thought came up as a result of an Anonymonk reply at Re: Re: Re: Re: What's the most efficient way to write out many lines of data?.

When unpacking fixed-length records containing ASCII data in fixed-width fields using the 'A' format, perl DWIMs rather nicely by stripping trailing spaces from each field ('Z' does the equivalent for null-terminated data), it being quite normal to right-pad shorter data elements within fixed-width fields.
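For illustration, a minimal sketch of that behaviour. The record layout (two 10-character fields) and the names in it are made up for the example:

```perl
use strict;
use warnings;

# Build a record of two 10-character fields, each right-padded with spaces.
my $record = pack 'A10 A10', 'Smith', 'John';

# 'A' strips the trailing padding from every field.
my @stripped = unpack 'A10 A10', $record;

# 'a' returns the fields exactly as stored, padding included.
my @verbatim = unpack 'a10 a10', $record;

print join('|', @stripped), "\n";    # Smith|John
print join('|', @verbatim), "\n";    # Smith     |John      
```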

This is great, as it saves an expensive call into the regex engine to do the stripping, which can really help performance on large datasets like the one in that thread.

The problem comes when the ASCII data in the fields is numeric. Numeric fields are traditionally right-justified and left-padded. Under these circumstances it would be nice if unpack would DWIM and strip the leading spaces for me.
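A small sketch of the problem case, assuming a hypothetical record with a 10-character left-justified name followed by an 8-character right-justified amount:

```perl
use strict;
use warnings;

# Name is right-padded to 10 chars, amount is left-padded to 8 chars.
my $record = sprintf '%-10s%8s', 'Widget', '42';

# 'A' trims the name, but the amount keeps its leading spaces.
my ($name, $amount) = unpack 'A10 A8', $record;

# The current workaround: one regex substitution per numeric field.
(my $trimmed = $amount) =~ s/^ +//;

print "[$name] [$amount] [$trimmed]\n";    # [Widget] [      42] [42]
```

The padded string still behaves as 42 in numeric context, so the leading spaces only matter when the field is used or written out as a string.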

My first thought was that if the first character is non-blank then trailing spaces would be stripped as now, but if the leading character is blank and the last character is not, then leading spaces would be stripped.
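That rule can be sketched in pure Perl. The helper name dwim_strip is hypothetical, it handles spaces only, and treating a field that is blank at both ends like the current 'A' behaviour is an assumption, since the rule above does not cover that case:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the rule to one raw field, as returned by
# the 'a' template (which strips nothing).
sub dwim_strip {
    my ($field) = @_;
    if ($field =~ /^ / and $field =~ /[^ ]$/) {
        $field =~ s/^ +//;    # leading blank, non-blank end: right-justified
    }
    else {
        $field =~ s/ +$//;    # otherwise strip trailing spaces, as 'A' does
    }
    return $field;
}

my $record = sprintf '%-10s%8s', 'Widget', '42';
my @fields = map { dwim_strip($_) } unpack 'a10 a8', $record;
print join('|', @fields), "\n";    # Widget|42
```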

I think, given the current behaviour, that this addition to the existing template chars 'A' and 'Z' would be acceptable for unpacking, but it does risk breaking existing applications. It also leaves the situation where unpacking would handle left- or right-padded fields, but packing would only handle right-padding.

So my current thought is that a new template char ('R' seems to be free and is somewhat mnemonic) should be added to allow for Right-justified fields. This would be identical to 'A' except that on packing it would left-pad with spaces to the specified width, and on unpacking it would strip leading spaces.
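Until such a template exists, the intended semantics can be approximated in plain Perl. This is only a sketch: pack_R and unpack_R are hypothetical names, 'R' is not something pack or unpack understands, and truncating over-long values to their first characters (as 'A' does when packing) is an assumption:

```perl
use strict;
use warnings;

# Hypothetical stand-in for packing with 'R': left-pad with spaces to
# $width, then keep the first $width characters (assumed to mirror 'A').
sub pack_R {
    my ($width, $value) = @_;
    return substr(sprintf('%*s', $width, $value), 0, $width);
}

# Hypothetical stand-in for unpacking with 'R': strip leading spaces.
sub unpack_R {
    my ($field) = @_;
    $field =~ s/^ +//;
    return $field;
}

my $packed = pack_R(8, 42);
my $value  = unpack_R($packed);
print "[$packed] [$value]\n";    # [      42] [42]
```

This version still pays for the regex on every field, which is exactly the cost a built-in template would avoid.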

This would still leave 'Z' with no equivalent for right-justifying with nulls rather than spaces, but I can't actually think of any time when this is used or would be useful?

Is this a good idea? Is there a better way to implement this?

I intend to try and work up a patch to achieve this as that is the preferred method of making "feature requests", but I'm still floundering in my experiments with modifying the Perl sources, so it might take me a while. If anyone else feels like doing this and submitting the patch, I won't object.

Also, if anyone can think of anything that this might affect or has insights on a good regression test I'd be interested to hear them.

As soon as I manage to locate a copy of 5.8.1-RC2 I'll have a go.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to RFC: A (minor) new perl feature. by BrowserUk
