regexp question

tfoertsch has asked for the wisdom of the Perl Monks concerning the following question:

Re: regexp question by almut (Canon) on Dec 29, 2006 at 11:44 UTC
Alternatively, you could use the word boundary marker: `/(.{1,20})\b/g` In contrast to using `\s*`, this would prevent individual words from being split across lines, which I guess was the idea behind the space you used...
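A minimal sketch of how the `\b` variant behaves. The sample string is an assumption, since the original input is not shown in the thread. Note that the chunks are cut at word boundaries but may keep a trailing (or leading) space, because `.` happily consumes whitespace:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen so that both chunks are exactly 20 characters.
my $text = "now is the time for all good men to come";

# .{1,20} grabs up to 20 characters greedily; \b then requires the cut
# to fall on a word boundary, so no word is split in the middle.
my @chunks = $text =~ /(.{1,20})\b/g;

# Brackets make any leading or trailing whitespace visible.
print "[$_]\n" for @chunks;
```

On this input the first chunk ends with a space, so some trimming would likely be needed before printing.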
Re^2: regexp question by tfoertsch (Beadle) on Dec 29, 2006 at 11:55 UTC
This does what I wanted: `my @x=/(\S.{0,19})(?=\s|$)/g;` Thanks to all.
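For readers following along, here is a short sketch of that pattern applied to a named variable rather than `$_`. The sample text is an assumption for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample input; every word is shorter than 20 characters.
my $text = "now is the time for all good men to come to the aid of their country";

# \S        each chunk starts on a non-whitespace character
# .{0,19}   followed by up to 19 more characters (20 in total), greedy
# (?=\s|$)  and the chunk must be followed by whitespace or end of string
my @lines = $text =~ /(\S.{0,19})(?=\s|$)/g;

print "$_\n" for @lines;
```

The lookahead is what forces the engine to backtrack to the last space that fits, so lines end on whole words as long as no single word is longer than 20 characters.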
Re: regexp question by virtualsue (Vicar) on Dec 29, 2006 at 11:16 UTC
Try replacing the space character in your match with "\s*". The '*' says to match a whitespace character 0 or more times.
Re^2: regexp question by deibyz (Hermit) on Dec 29, 2006 at 11:46 UTC
That would also match words of more than 20 characters by splitting them, since `\s*` can match the empty string. If I had to keep this limitation, I would do: `my @a = grep { /^.{1,20}$/ } split " ", $string`
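A small sketch of what that filter does, using a made-up input string (an assumption, not data from the thread). It returns individual words of at most 20 characters and silently drops longer ones, so it does not by itself assemble lines:

```perl
use strict;
use warnings;

# Hypothetical sample input containing one word longer than 20 characters.
my $string = "short words pass but extralongwordwithmorethantwentycharacters is dropped";

# split " " breaks on runs of whitespace; grep keeps only the words
# whose whole length is between 1 and 20 characters.
my @a = grep { /^.{1,20}$/ } split " ", $string;

print join("\n", @a), "\n";
```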
Re: regexp question by swampyankee (Parson) on Dec 29, 2006 at 16:52 UTC
Have you looked at Text::Autoformat? My tendency would be to use split, splitting on the word separator, but that's more a style preference than a substantive issue. emc "At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation." Igor Sikorsky, reported in AOPA Pilot magazine, February 2003.
Re: regexp question by siva kumar (Pilgrim) on Dec 29, 2006 at 12:37 UTC
You can try this: `grep { push(@arr,substr($_,0,19)) } split (" ",$string); print join("\n",@arr);`
Re^2: regexp question by ww (Archbishop) on Dec 29, 2006 at 14:31 UTC
I may have missed something, but...

`$string = "now is the time for all good men to come to the aid of their country while the gratuitously extralongwordwithmorethantwentycharactersexdtendson and on."; grep { push(@arr,substr($_,0,19)) } split (" ",$string); print join("\n",@arr);`

prints: now is the time for all good men to come to the aid of their country while the gratuitously extralongwordwithmo ###1 and on.

while tfoertsch's "this works for me" version (note the `=~` here, rather than the simple `=` in the original)

`while ( $string =~ /(\S.{0,19})(?=\s|$)/g ) { push(@arr, $1); } for my $arr (@arr) { print $arr . "\n"; }`

prints: now is the time for all good men to come to the aid of their country while the gratuitously charactersexdtendson ###2 and on.

Note that tfoertsch's output does most of what's specified in the OP, BUT both ###1 and ###2 truncate the "extra long word": the first keeps only its head and the second only its tail. That's a problem only if the source data can't be relied upon to use words of more ordinary length. In non-technical English, this isn't likely to be a problem, but I wouldn't want to bet on this auf Deutsch or any of the Germanic/Low Countries/Scandinavian languages.

Update, in light of the estimable swampyankee's comment below: This may be an example of one of the cases swampyankee had in mind when opting for `split`.
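One way the `split` approach could avoid truncating overlong words is sketched below. `wrap_words` is a hypothetical helper written for this illustration, not code from the thread; it assumes an overlong word should simply be kept whole on its own line:

```perl
use strict;
use warnings;

# Hypothetical helper: greedily packs whitespace-separated words into lines
# of at most $width characters. A word longer than $width is never cut;
# it gets a line of its own.
sub wrap_words {
    my ($text, $width) = @_;
    my @lines;
    my $current = '';
    for my $word (split ' ', $text) {
        if ($current eq '') {
            $current = $word;                 # first word on a line, even if overlong
        }
        elsif (length($current) + 1 + length($word) <= $width) {
            $current .= " $word";             # word still fits on the current line
        }
        else {
            push @lines, $current;            # line is full, start a new one
            $current = $word;
        }
    }
    push @lines, $current if $current ne '';
    return @lines;
}

my $string = "the gratuitously extralongwordwithmorethantwentycharactersexdtendson and on.";
print "$_\n" for wrap_words($string, 20);
```

Whether an overlong word should be kept whole, hyphenated, or rejected depends on the data; this sketch only shows the "keep whole" choice.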