

in reply to Splitting strings into words when there are no separators

Your employer (or whoever is setting the rules) should consider relaxing the requirement so that names like http://green-brick.foobar and http://foo.bar/green-brick are allowed. A hyphen is perfectly valid in a server name, and it would save you a lot of trouble.

However, there are ways to do what you ask. Here's a very naive approach:

#!/usr/bin/perl -l

use strict;
use warnings;

# build word list in %words
# (adjust the path for your system, e.g. /usr/share/dict/words)
my %words;
open my $dict, '<', '/usr/dict/words'
    or die "Can't open /usr/dict/words: $!";
chomp, $words{lc $_} = 1 while <$dict>;  # UPDATE: added 'lc'
close $dict;

my $str = "perfumesmellslikecheese";

$str =~ m{
    ^                             # anchor to beginning of string
    (?{ [ ] })                    # start $^R as an empty array ref
    (?:                           # match this block <<
        (\w{2,})                  # capture 2 or more letters to $1
        (?(?{ $words{lc $1} })    # if lowercase '$1' is in %words...
            (?{ [ @{$^R}, $1 ] }) # add this word to the current list
            |                     # otherwise...
            (?!)                  # fail (force \w{2,} to backtrack)
        )
    )+                            # >> one or more times
    $                             # anchor to end of string
    (?{ print "@{$^R}" })         # print the words (with spaces)
    (?!)                          # fail (cause everything to backtrack)
}x;

You can make the engine a great deal smarter by making it adjust dynamically -- for example, by making it possible to match only things you KNOW to be words, instead of capturing any \w{2,} and checking it against the dictionary afterwards.
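
As a rough sketch of that idea (not the regex-only version, and not necessarily how I'd finally write it), here is the same search done with plain recursion instead of the regex engine. It first works out, for every offset in the string, which known words start there, so the search never tries anything that isn't a word. The name split_into_words, the $min_len parameter, and the tiny inline word list are all made up for illustration; in practice you'd pass in the %words hash built from your dictionary file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): return every way to split
# $str into words that appear as lowercase keys in %$words.
sub split_into_words {
    my ($str, $words, $min_len) = @_;
    $min_len = 2 unless defined $min_len;    # same minimum as \w{2,}

    my $lc  = lc $str;
    my $len = length $lc;
    return () unless $len;

    # For each start offset, record the lengths of the known words that
    # begin there. The search below only ever follows these.
    my @starts;
    for my $pos (0 .. $len - 1) {
        $starts[$pos] = [
            grep { exists $words->{ substr($lc, $pos, $_) } }
            $min_len .. $len - $pos
        ];
    }

    my @results;
    my $walk;
    $walk = sub {
        my ($pos, @so_far) = @_;
        if ($pos == $len) {                  # consumed the whole string
            push @results, "@so_far";
            return;
        }
        # Try each known word starting at $pos, shortest first.
        $walk->($pos + $_, @so_far, substr($str, $pos, $_))
            for @{ $starts[$pos] };
    };
    $walk->(0);
    undef $walk;                             # break the self-reference

    return @results;
}

# Tiny stand-in word list (an assumption, just for the demo).
my %demo_words = map { $_ => 1 } qw(per fume perfume smells like cheese);

print "$_\n" for split_into_words("perfumesmellslikecheese", \%demo_words);
# prints:
#   per fume smells like cheese
#   perfume smells like cheese
```

The lookup table costs one pass over all substrings up front, but after that a dead end is detected as soon as no known word starts at the current offset.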

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart