This thought came up as a result of an Anonymonk reply at Re: Re: Re: Re: What's the most efficient way to write out many lines of data?.

When unpacking fixed-length records containing ASCII data in fixed-width fields using the 'A' format, perl DWIMs rather nicely by stripping trailing spaces from each field ('Z' does the equivalent job for null-terminated data by dropping everything from the first null onward). It is quite normal to right-pad shorter data elements within fixed-width fields.
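As a minimal sketch of the behaviour described above (the record layout and field values are made up for illustration), 'A' drops the right-padding without any help from a regex:

```perl
use strict;
use warnings;

# A 20-byte record: a 10-char name and a 10-char city, both right-padded.
# The layout and the values are invented for this example.
my $record = sprintf '%-10s%-10s', 'Smith', 'Leeds';

# 'A10' takes 10 bytes and strips trailing spaces (and nulls).
my @fields = unpack 'A10 A10', $record;

print join('|', @fields), "\n";    # Smith|Leeds
```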

This is great, as it saves an expensive call into the regex engine to do this, which can really help performance on large datasets like the one in that thread.

The problem comes when the ASCII data in the fields is numeric; such fields are traditionally right-justified and left-padded. Under these circumstances it would be nice if unpack would DWIM and strip the leading spaces for me.
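To illustrate the problem (again with an invented record layout), a left-padded numeric field comes back from 'A' with its leading spaces intact, and the usual workaround is a separate regex pass over the field:

```perl
use strict;
use warnings;

# A 6-char item code (right-padded) followed by an 8-char quantity
# (left-padded). Layout and values are assumptions for this example.
my $record = sprintf '%-6s%8s', 'AB12', '100';

my ($code, $qty) = unpack 'A6 A8', $record;
print "[$code][$qty]\n";    # [AB12][     100]

# The workaround: strip the leading spaces with a regex afterwards.
(my $trimmed = $qty) =~ s/^ +//;
print "[$trimmed]\n";       # [100]
```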

My first thought was that if the first character is non-blank then trailing spaces would be stripped as now, but if the leading character is blank and the last character is not, then leading spaces would be stripped.

I think that, given the current behaviour, adding this to the existing template chars 'A' and 'Z' would be acceptable for unpacking, but it does risk breaking existing applications. It would also leave the situation where unpacking handles left- or right-padded fields, but packing handles only right-padding.

So my current thought is that a new template char ('R' seems to be free and is somewhat pneumonic) should be added to allow for Right-justified fields. This would be identical to 'A' except that on packing it would left-pad with spaces to the specified width, and on unpacking it would strip leading spaces.
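A rough pure-Perl sketch of what the proposed format would do for a single field follows. The helper names pack_R and unpack_R are hypothetical and exist nowhere in Perl; the truncation of over-long values is only a guess modelled on how 'A' behaves when packing.

```perl
use strict;
use warnings;

# Hypothetical helpers emulating the proposed 'R' format for one field.
# Neither function exists in Perl; the names are made up for illustration.

# Pack: left-pad with spaces to $width. Over-long values are cut to
# $width, mirroring 'A' on pack (which end to cut is an assumption).
sub pack_R {
    my ($width, $value) = @_;
    return sprintf '%*s', $width, substr($value, 0, $width);
}

# Unpack: take $width bytes verbatim, then strip leading spaces.
sub unpack_R {
    my ($width, $data) = @_;
    my $field = unpack "a$width", $data;
    $field =~ s/^ +//;
    return $field;
}

my $packed = pack_R(8, '103.12');
print "[$packed]\n";                       # [  103.12]
print "[", unpack_R(8, $packed), "]\n";    # [103.12]
```

A real implementation inside pack/unpack would avoid the sprintf and regex calls, which is where the performance benefit is expected to come from.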

This would still leave 'Z' with no equivalent for right-justifying with nulls rather than spaces, but I can't actually think of any time when this is used or would be useful.

Is this a good idea? Is there a better way to implement this?

I intend to try and work up a patch to achieve this as that is the preferred method of making "feature requests", but I'm still floundering in my experiments with modifying the Perl sources, so it might take me a while. If anyone else feels like doing this and submitting the patch, I won't object.

Also, if anyone can think of anything that this might affect or has insights on a good regression test I'd be interested to hear them.

As soon as I manage to locate a copy of 5.8.1-RC2 I'll have a go.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: RFC: A (minor) new perl feature.
by adrianh (Chancellor) on Jul 11, 2003 at 17:56 UTC

    I don't think it is a good idea since it's going to break any code in the real world that's currently treating leading spaces as significant. - Ignore me, managed to miss the paragraph about the "R" flag. D'oh!

    Is there a common need to trim leading spaces on numbers? Just using the scalar in a numeric context will do that for you.

    my $foo = " 100";
    print "!", $foo, "!\n";
    print "!", 0+$foo, "!\n";
    __END__
    # produces
    ! 100!
    !100!

      I'll skip the bit about "code in the real world that treats leading spaces as significant" as hooky and getting what it deserves. (Yes. It's a joke:)

      In the application that started the thought, the OP is reformatting data from fixed-width fields to CSV. Leading spaces on numerics probably won't do any harm, but if nothing else they consume disc space. That aside, if you have ever processed legacy data from COBOL or FORTRAN apps, then it's not uncommon for financial data to be right-justified to align decimal points, but with the data prefixed by a currency symbol.

        $103.12
      $10312.00

      It would be easier to deal with if the leading spaces were stripped.

      It's also not that uncommon to right-justify string fields.

      I don't expect that the feature would set the world on fire, but it seems like it would be as useful as several other fairly obscure features available with pack: uuencoding, checksum calculation, BER compressed integers, c/a formats etc.

      All of these things are fairly trivial to do yourself in perl, but if you have the need for them and the need for them to operate efficiently, having them run at the C level rather than in perl is a godsend.

      From my preliminary investigation, the additional code required is fairly minimal. The biggest cost would probably be the extra documentation I think :) Perhaps the biggest benefit would be the ability to efficiently pack large arrays of data (numbers or text), right-justified for passing on to systems and software that expect this.




        I'll skip the bit about "code in the real world that treats leading spaces as significant" as hooky and getting what it deserves. (Yes. It's a joke:)

        Bah! Now the world knows I'm an idiot who cannot read ;-)

        The examples you gave sound sane. Go for it.

Re: RFC: A (minor) new perl feature.
by demerphq (Chancellor) on Jul 12, 2003 at 17:14 UTC

    You probably actually want to do this against bleadperl and not 5.8.1-RC2. If the pumpkings think it's worth being applied backward to RC2 (I'm betting they won't), then they will do so, but in blead you have a much better chance of getting it applied. But it does sound like an interesting idea.

    Good luck.


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
Re: RFC: A (minor) new perl feature.
by diotalevi (Canon) on Jul 12, 2003 at 00:22 UTC

    pneumonic
    Pick one: euphonic, mnemonic.

      Pick one: euphonic, mnemonic.

      No need to be sardonic. 'Tis just a node, 'tis not canonic! Unless you have something nice to say, be aphonic. It's better to be harmonic (or even catatonic) than chronically anionic.

      You know he meant "mnemotechnic"...

      -sauoq
      "My two cents aren't worth a dime.";
      

      Rather than mnemonic, I guess that is (an) oldmoanic.




      pneumonic:
          "relating to the lungs. Relating to, affected by, or similar to pneumonia."

      What's wrong with that then? <grin>

      --
      Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho