rovf has asked for the wisdom of the Perl Monks concerning the following question:

Assume I have a string $s with length greater than $ml. I would like to create an array containing the successive pieces of $s, each piece with length $ml (the last piece possibly shorter). For example:

$s='abcdefg'; $ml=3; @result=('abc','def','g');
Restrictions: Must run under perl 5.8 without additional modules (CPAN) installed.

The first thing that came to my mind was to use split to turn the string into an array of characters,

@sc=split(//,$s);
and then, using a loop, use array slices to extract the individual parts of the string, i.e. inside a loop something like:
push @result,join('',@sc[$i*$ml .. ($i+1)*$ml-1]);
But even if we ignore for a moment the problem that the last slice would be out-of-bounds (which is trivial to solve), this solution is terribly ugly. Is there a better way to do it? Maybe with a regexp containing .{1,$ml}, but I don't see how to use this to build up my resulting array (I would need kind of a "generator"). Any ideas?
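For reference, the split-and-slice idea described above can be completed as follows. This is only a sketch of that approach, not a recommendation: the names `$start` and `$end` are introduced here for illustration, and the last slice is clamped to the end of the array. It uses nothing beyond core Perl 5.8.

```perl
use strict;
use warnings;

my $s  = 'abcdefg';
my $ml = 3;

my @sc = split //, $s;    # one array element per character
my @result;
for (my $start = 0; $start < @sc; $start += $ml) {
    my $end = $start + $ml - 1;
    $end = $#sc if $end > $#sc;    # clamp the last slice to the array bounds
    push @result, join('', @sc[$start .. $end]);
}

print join(',', @result), "\n";    # abc,def,g
```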

-- 
Ronald Fischer <ynnor@mm.st>

Re: chopping a string into slices - is there a more elegant way to do it?
by JavaFan (Canon) on Nov 04, 2008 at 12:44 UTC
    //g is a perfect generator.
    $ml = 3;
    @a = "abcdefg" =~ /.{1,$ml}/g;
    say "@a";
    __END__
    abc def g
      /.{1,$ml}/sg if newlines are possible.
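A small sketch of why the /s modifier matters here: without it, `.` does not match a newline, so the newline is silently dropped from the pieces. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $ml = 3;
my $s  = "ab\ncdefg";

my @without_s = $s =~ /.{1,$ml}/g;     # '.' stops at the newline and skips it
my @with_s    = $s =~ /.{1,$ml}/sg;    # '.' also matches "\n"

# Rejoining the pieces shows whether any character was lost.
print "without /s: ", length(join('', @without_s)), " of ", length($s), " characters kept\n";    # 7 of 8
print "with /s:    ", length(join('', @with_s)),    " of ", length($s), " characters kept\n";    # 8 of 8
```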

      Thanks a lot to all of you for all the solutions provided. I think I will go for the //g version - I had completely forgotten that I can use the g modifier also with pattern matching, not only with pattern substitution. Really neat and compact solution!

      -- 
      Ronald Fischer <ynnor@mm.st>
      talking about elegance

      IMHO additional parentheses improve the readability, emphasizing that the match-op is evaluated in a list-context. *

      @a = ( "abcdefg" =~ /.{1,$ml}/g ) ;
      or alternatively an extra newline
      @a =
          "abcdefg" =~ /.{1,$ml}/g ;
      well maybe a matter of taste ...

      * UPDATE: well, the parentheses are not what provides the list context, but they emphasize the order of evaluation, making it IMHO more readable.

        IMHO additional parentheses improve the readability, emphasizing that the match-op is evaluated in a list-context.
        That's a really bad reason to use parentheses, because they do NOT provide list context. Compare:
        $a = ("abcdefg" =~ /.{1,$ml}/g);
        Here parentheses are used, but the match is in scalar context.
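The point can be checked with a short sketch. The context of the match is decided by the left-hand side of the assignment, not by parentheses around the right-hand side. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my $ml = 3;

my @list    = ( "abcdefg" =~ /.{1,$ml}/g );    # array on the left: list context
my $scalar  = ( "abcdefg" =~ /.{1,$ml}/g );    # scalar on the left: scalar context
my ($first) =   "abcdefg" =~ /.{1,$ml}/g;      # parentheses on the LEFT do give list context

print "list:   @list\n";      # abc def g
print "scalar: $scalar\n";    # 1 (the match succeeded)
print "first:  $first\n";     # abc
```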
Re: chopping a string into slices - is there a more elegant way to do it?
by BrowserUk (Patriarch) on Nov 04, 2008 at 13:00 UTC

    If you're running a recent version of Perl (>5.8.0 if memory serves), then I'd go with unpack per ccn, but use the parens template syntax:

    print for unpack '(a3)*', 'the quick brown fox jumps over the lazy dog';;
    the qu ick br own fo x j ump s o ver th e l azy do g
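Because the template is an ordinary string, the piece length can be interpolated instead of hard-coded. A minimal sketch using the OP's variable names, assuming a Perl recent enough to support the `(...)` group syntax and a string that consists of single-byte characters:

```perl
use strict;
use warnings;

my $s  = 'abcdefg';
my $ml = 3;

# "(a$ml)*" becomes "(a3)*": repeat 'a3' until the string is exhausted.
my @result = unpack "(a$ml)*", $s;

print join(',', @result), "\n";    # abc,def,g
```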

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I wouldn't use unpack unless you mean to work with bytes (encoded characters?) instead of characters.

      use utf8;
      use open ':std', ':locale';
      my $s = "fête";
      print("$_\n") for unpack '(a3)*', $s;

      prints 5 characters from the 4 given.

        unless you mean to work with bytes (encoded characters?)

        I don't "mean to work with" anything--ask the OP. I simply sought to enhance ccn's suggestion by pointing out the (..)* template syntax.

        And wtf do you think you mean by (encoded characters?). All characters are encoded. It's just a case of how they are encoded. Luckily for me, at least 80% (probably closer to 98%) of the data I deal with, characters == bytes. So, following the 80/20 rule, I only expend effort to deal with the 2% when it is required.

        And, as your snippet demonstrates, in order to get strings into perl that are anything other than byte-encoded, the programmer has to take (one or more) extraordinary steps. And if the OP is aware enough of his data to know that s/he needs to take those extraordinary steps, then they are

        • probably aware enough to recognise immediately that the (ccn's) suggestion, to use unpack is not going to work for their situation;
        • Or: not aware enough to recognise the limitation, but will quickly discover it when they test the suggestion out in their environment;
        • Or: will quickly discover the problem, when they start getting garbage output from their program.

        If the unpack solution does not work for them in their environment, whichever way they become aware of it, it will either be a split second of thought for them to dismiss it as a possibility; or a really, really, good lesson for them to learn.

        You seem to believe that you can either:

        • predict from the OP's question, every intimate detail of their operating environment and so can tailor your responses to their questions such that no matter how naive they are, your solutions will always work, in their environment.
        • Or: always produce responses (code) that will always function perfectly regardless of the OP's operating environment.

        And that attitude is either:

        • Arrogance beyond countenance.
        • Stupidity beyond belief.

        To summarise: It is impossible to predict the full details of the OP's operating environment, and it will take (has taken!) far more than 20 questions to interrogate it. So, in a forum such as this, where 20 questions could take anything from 2 days, to 2 weeks, to two times the twelfth of never to ask/be answered; I prefer to grant the OP the assumption of intelligence; rather than assume their stupidity.

        Any presumption on your part that you can produce code, in response to the OP's questions, that will operate perfectly in their environment without their having exercised any modicum of locality awareness or data knowledge, is totally unfounded. And if there is even a 1% chance that the code you produce will not operate in their environment without their testing it, then all your efforts and pedantry are wasted; and no better than my "assume the OP knows what they are doing" responses.


Re: chopping a string into slices - is there a more elegant way to do it?
by ccn (Vicar) on Nov 04, 2008 at 12:48 UTC

    That's a job for unpack

    $s = 'abcdefg'; $ml = 3; @result = unpack 'A3A3A3', $s;
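Two things are worth keeping in mind with this template. First, 'A3A3A3' is fixed at three pieces, so it covers at most nine characters. Second, the 'A' format strips trailing spaces (and nulls) from each piece, whereas 'a' returns the data verbatim. A sketch of the second point, with a sample string chosen purely for illustration:

```perl
use strict;
use warnings;

my $s = 'ab   cd';    # 'ab', three spaces, 'cd'

my @stripped = unpack 'A3A3A3', $s;    # 'A' strips trailing spaces from each piece
my @verbatim = unpack 'a3a3a3', $s;    # 'a' returns each piece unchanged

print join('|', @stripped), "\n";    # ab|  c|d
print join('|', @verbatim), "\n";    # ab |  c|d
```

If the pieces must rejoin to the original string, 'a' is the safer choice.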
Re: chopping a string into slices - is there a more elegant way to do it?
by dragonchild (Archbishop) on Nov 04, 2008 at 14:25 UTC
    You've received help with your actual problem. But, I will point out that the "No CPAN modules" restriction is a restriction only in your mind. The code is freely available - just cut'n'paste the subroutines you care about into your code. Attribute the original author if you have a conscience (and can do so in your shop) and move on. There have been several times where I've had to do that in my career.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: chopping a string into slices - is there a more elegant way to do it?
by gw1500se (Beadle) on Nov 04, 2008 at 14:30 UTC
    I'm not a regexp expert, but if you can come up with the regexp for getting $ml characters at a time, then you can set the array with:
    my @result=$s=~/<some regexpr>/g;
    Is that what you were looking for?