Regular Expression - split string by lower/upper case

MiamiGenome has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

Could one of the Perl elders kindly provide some insight into this regex question?

I have an array of user-input terms which may be "concatenated" or not (see @terms). I would like the array to look like @separated_terms. Basically, just cleave each element wherever an uppercase letter follows a lowercase one.

# Input
@terms = qw/Genetics Genomics phylogeny allele            
            ChromosomeLocusLink geneExpression       
            RasSignalTransductionPathway/;

foreach my $words (@terms) {
        my @wordlist = $words =~ /(?:(.+?[a-z])([A-Z].+))+/g;
}

# Desired Result

@separated_terms = qw/Genetics Genomics phylogeny 
                      allele Chromosome Locus Link 
                      gene Expression Ras Signal
                      Transduction Pathway/;

Here are my attempts, using @wordlist to hold the substrings (several variations omitted):

# does not work - removes last lowercase and first uppercase letter at each boundary
#       my @wordlist = split /[a-z][A-Z]/, $words;

# does not work - only separates the last term from the list
        my @wordlist = $words =~ /(?:(.+?[a-z])([A-Z].+))+/g;


Replies are listed 'Best First'.
Re: Regular Expression - split string by lower/upper case by davidrw (Prior) on Apr 14, 2006 at 18:25 UTC
It's because of the greedy `.+` you have after `[A-Z]`: it consumes everything up to the end of the string, so the later boundaries are never seen. Just tweaking that to be `[^A-Z]` instead of `.` will work, and gives what you want with that test data. An alternative is to do a substitution, then split on whitespace.

use strict;
use warnings;

while (<DATA>) {
    # split regex
    print join ":", grep length, split /(?:(.+?[a-z])([A-Z][^A-Z]+))/g, $_;

    # substitution then split method
    s/(?<=[a-z])(?=[A-Z])/ /g;
    print join(":", split ' ', $_), "\n";
}
__DATA__
Genetics
Genomics
phylogeny
allele
ChromosomeLocusLink
geneExpression
RasSignalTransductionPathway
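
The zero-width boundary used in the substitution can also be handed straight to split, which skips the intermediate whitespace step. This is only a minimal sketch: `split_camel` is a hypothetical helper name that does not appear in the thread, and it assumes plain ASCII letters. A run of capitals such as "DNAPolymerase" would be left in one piece, since no lowercase letter precedes the "P".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one term at every
# position that has a lowercase letter before it and an uppercase letter
# after it. Both assertions are zero-width, so no characters are consumed.
sub split_camel {
    my ($term) = @_;
    return split /(?<=[a-z])(?=[A-Z])/, $term;
}

my @terms = qw/Genetics Genomics phylogeny allele
               ChromosomeLocusLink geneExpression
               RasSignalTransductionPathway/;

# Flatten the per-term pieces into one list, as in @separated_terms.
my @separated_terms = map { split_camel($_) } @terms;
print join(' ', @separated_terms), "\n";
```

With the terms from the question, this should print the thirteen words of the desired @separated_terms on one line.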
Re^2: Regular Expression - split string by lower/upper case by johngg (Canon) on Apr 14, 2006 at 20:08 UTC
I was going to do a substitution along the lines of `s/([a-z])([A-Z])/$1 $2/g;` but your look-behind/look-ahead is much neater and probably quicker. Something else new I have learned today. Thank you, JohnGG
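
For comparison, the two substitutions can be run side by side. This is a sketch only: both subroutine names are hypothetical, and it makes no claim about relative speed, which would need measuring (for instance with the core Benchmark module). The capture version consumes the two letters around each boundary and writes them back with a space between them; the lookaround version consumes nothing and inserts the space at the boundary itself.

```perl
use strict;
use warnings;

# Hypothetical name: capture the letters either side of the boundary
# and put them back with a space in between.
sub space_with_captures {
    my ($s) = @_;
    $s =~ s/([a-z])([A-Z])/$1 $2/g;
    return $s;
}

# Hypothetical name: match the empty position between a lowercase and an
# uppercase letter and replace it with a space.
sub space_with_lookaround {
    my ($s) = @_;
    $s =~ s/(?<=[a-z])(?=[A-Z])/ /g;
    return $s;
}

for my $term (qw/geneExpression RasSignalTransductionPathway/) {
    print space_with_captures($term),   "\n";
    print space_with_lookaround($term), "\n";
}
```

Both forms give the same string for this kind of input, because two lowercase-to-uppercase boundaries can never overlap.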
Re: Regular Expression - split string by lower/upper case by MiamiGenome (Sexton) on Apr 14, 2006 at 18:51 UTC
THANK YOU!! I used the substitution with positive lookahead and positive lookbehind. Worked like a charm -- not surprising, this is the Perl Monks! Cheers!