tfoertsch has asked for the wisdom of the Perl Monks concerning the following question:

I have a line of words that are delimited by single spaces. Now I want to break it up into a list of lines, each no longer than, say, 20 characters. This code almost does it, but it eats up the last line, the one that does not match.
@x=/(.{1,20}) /g;
How can I get this last chunk?
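A minimal sketch reproducing the problem, using a made-up sample sentence (not from the post): the pattern requires a literal space after every chunk, but the last chunk is followed by the end of the string instead, so it can never match and is silently dropped.

```perl
use strict;
use warnings;

# Sample input (illustrative only): words separated by single spaces.
$_ = "now is the time for all good men to come to the aid of their country";

# The original pattern: 1 to 20 characters, then a mandatory space.
my @x = /(.{1,20}) /g;

print "$_\n" for @x;
# Prints three chunks; "country" is missing because nothing follows it
# except the end of the string, so the literal " " cannot match there.
```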

Thanks,
Torsten

Re: regexp question
by almut (Canon) on Dec 29, 2006 at 11:44 UTC

    Alternatively, you could use the word boundary marker

    /(.{1,20})\b/g

    In contrast to using \s*, this would prevent individual words from being split across lines, which I guess was the idea behind the space you used...
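A small sketch of the word-boundary version on an illustrative sentence (my own, not from the thread). The last chunk is kept because \b also matches at the end of the string after a word character, but the chunks can start or end with the separating space. A string ending in punctuation behaves differently: there is no word boundary after a final ".", so that character is lost.

```perl
use strict;
use warnings;

my $string = "now is the time for all good men to come to the aid of their country";

# Up to 20 characters, ending at a word boundary.
my @x = $string =~ /(.{1,20})\b/g;

# Brackets make leading/trailing spaces visible.
print "[$_]\n" for @x;

# No boundary follows a trailing ".", so the final "." is dropped.
my @y = "and on." =~ /(.{1,20})\b/g;
print "[$_]\n" for @y;
```

If the stray spaces matter, trimming afterwards (e.g. `s/^\s+|\s+$//g for @x;`) is one option.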

      This does what I wanted:
      my @x=/(\S.{0,19})(?=\s|$)/g;

      Thanks to all
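A quick check of this final pattern in the same list-context form as the original question, on an illustrative sentence (not from the thread). Because the lookahead `(?=\s|$)` consumes nothing and accepts end-of-string, the last chunk is captured, and the leading `\S` keeps chunks from starting with a space.

```perl
use strict;
use warnings;

$_ = "now is the time for all good men to come to the aid of their country";

#   \S        - a chunk must start with a non-space character
#   .{0,19}   - followed by up to 19 more characters (20 in total)
#   (?=\s|$)  - and must end just before whitespace or end of string
my @x = /(\S.{0,19})(?=\s|$)/g;

print "$_\n" for @x;
```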
Re: regexp question
by virtualsue (Vicar) on Dec 29, 2006 at 11:16 UTC
    Try replacing the space character in your match with "\s*". The '*' says to match a whitespace character 0 or more times.
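A sketch of what `\s*` does, on a hypothetical input with one long word. The last chunk is now captured, but since `\s*` may match nothing, the greedy `.{1,20}` simply takes the next 20 characters each time, so chunks are cut at fixed positions, even in the middle of a word, and may keep a trailing space.

```perl
use strict;
use warnings;

# Hypothetical sample input containing a 34-character word.
my $string = "tiny supercalifragilisticexpialidocious word";

my @x = $string =~ /(.{1,20})\s*/g;

# Brackets make the trailing space on the second chunk visible.
print "[$_]\n" for @x;
```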
      That would also split words longer than 20 characters, because \s* is then allowed to match "" (nothing).

      If I had to keep this limitation I would do:

      my @a = grep { /^.{1,20}$/ } split " ", $string;
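A short illustration of what this filter returns, using a hypothetical input: one word per element (not lines of up to 20 characters), with any word longer than 20 characters silently discarded. Joining the remaining words into lines would still need a separate step.

```perl
use strict;
use warnings;

my $string = "tiny supercalifragilisticexpialidocious word";

# split " " splits on runs of whitespace; grep keeps words of 1-20 chars.
my @a = grep { /^.{1,20}$/ } split " ", $string;

print "$_\n" for @a;
```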
Re: regexp question
by swampyankee (Parson) on Dec 29, 2006 at 16:52 UTC

    Have you looked at Text::Autoformat?

    My tendency would be to use split, splitting on the word separator, but that's more a style preference than a substantive issue.
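One way the split-based approach could look, as a sketch only: `wrap_words` is a hypothetical helper (not from the thread or any module) that splits on whitespace and greedily packs words into lines of at most `$width` characters. As an assumption of this sketch, words longer than `$width` are first cut into `$width`-sized pieces, so the limit holds and no characters are lost.

```perl
use strict;
use warnings;

# Hypothetical helper: greedy word packing into lines of <= $width chars.
sub wrap_words {
    my ($text, $width) = @_;
    my @lines;
    my $line = '';
    # Split into words, cutting any overlong word into $width-sized pieces.
    for my $word (map { /(.{1,$width})/g } split ' ', $text) {
        if ($line eq '') {
            $line = $word;
        }
        elsif (length($line) + 1 + length($word) <= $width) {
            $line .= " $word";    # still fits: append with one space
        }
        else {
            push @lines, $line;   # full: start a new line
            $line = $word;
        }
    }
    push @lines, $line if $line ne '';
    return @lines;
}

my @lines = wrap_words("tiny supercalifragilisticexpialidocious word", 20);
print "$_\n" for @lines;
```

Note the design choice: the tail of a cut word can share a line with the next word ("expialidocious word"); keeping long words whole on their own line instead would trade the length limit for readability.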

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: regexp question
by siva kumar (Pilgrim) on Dec 29, 2006 at 12:37 UTC
    You can try this
    grep { push(@arr, substr($_, 0, 19)) } split(" ", $string);
    print join("\n", @arr);
      I may have missed something, but...
      $string = "now is the time for all good men to come to the aid of their country while the gratuitously extralongwordwithmorethantwentycharactersexdtendson and on.";
      grep { push(@arr, substr($_, 0, 19)) } split(" ", $string);
      print join("\n", @arr);
      prints:
      now
      is
      the
      time
      for
      all
      good
      men
      to
      come
      to
      the
      aid
      of
      their
      country
      while
      the
      gratuitously
      extralongwordwithmo ###1
      and
      on.
      
      whereas tfoertsch's "this works for me" regex, run like this,
      while ( $string =~ /(\S.{0,19})(?=\s|$)/g ) {  # NB: "=~" here, rather than a simple "=" in original.
          push(@arr, $1);
      }
      for my $arr (@arr) {
          print $arr . "\n";
      }
      prints:
      now is the time for
      all good men to come
      to the aid of their
      country while the
      gratuitously
      charactersexdtendson  ###2
      and on.
      

      Note that tfoertsch's output does most of what's specified in the OP, BUT both ###1 and ###2 truncate the "extra long word": ###1 keeps only its first 19 characters, and ###2 keeps only its last 20. That's a problem only if the source data can't be relied upon to use words of more ordinary length. In non-technical English this isn't likely to be a problem, but I wouldn't want to bet on it in German or any of the Germanic/Low Countries/Scandinavian languages.

      Update, in light of the estimable swampyankee's reply: this may be one of the cases swampyankee had in mind when opting for split.