Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am having problems with a regex. I want to split the following string:
JohnSmith
by capitals, giving John and Smith. Using
m/[A-Z]/;
cuts off the capitals. Thanks

Replies are listed 'Best First'.
Re: split by a capital followed by
by Roy Johnson (Monsignor) on Dec 19, 2005 at 17:33 UTC
    Use a lookahead:
    my @names = split /(?=[A-Z])/;
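    You might expect an empty first item, because the pattern also matches at the very beginning of the string, but split never produces an empty leading field for a zero-width match there, so this already returns John and Smith. If you want the pattern itself to rule out a match at the start, add a lookbehind: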
    my @names = split /(?<=.)(?=[A-Z])/;
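    To check how the two patterns behave, here is a minimal sketch. It assumes the string is held in a variable called $string (the snippets above split $_ instead), and it prints the number of pieces followed by the pieces themselves.

```perl
use strict;
use warnings;

# $string is an illustrative variable; the snippets above operate on $_.
my $string = 'JohnSmith';

# Lookahead only: split at every position that is followed by a capital.
my @plain = split /(?=[A-Z])/, $string;

# Lookbehind added: the split point must also have a character before it.
my @guarded = split /(?<=.)(?=[A-Z])/, $string;

print scalar(@plain),   ": @plain\n";
print scalar(@guarded), ": @guarded\n";
```

    Both lines should report two pieces, John and Smith, since a zero-width match at the start of the string does not create an empty leading field.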

    Caution: Contents may have been coded under pressure.
Re: split by a capital followed by
by halley (Prior) on Dec 19, 2005 at 17:34 UTC
    How to ask questions the smart way.

    I assume you mean you're using split(m/[A-Z]/, $string), but you didn't make that very clear.

    The split() pattern defines what to remove between the chunks you want to keep. Using a m// pattern with captures defines what to keep amongst the rest of the string.

    This method catches a list of capitalized chunks, and is the more natural to me:

    my $name = 'JohnJacobJingleheimer-Schmidt';
    my @chunks = ($name =~ m/([A-Z][a-z]*)/g);
    print $_,$/ for @chunks;

    By using a couple of zero-width assertions, you could use split() instead, but it will behave differently when the name contains punctuation, such as the hyphen here:

    my $name = 'JohnJacobJingleheimer-Schmidt';
    my @chunks = split(m/(?<=[a-z])(?=[A-Z])/, $name);
    print $_,$/ for @chunks;
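    The difference on the hyphenated name can be seen by running both approaches side by side. This is only a sketch; the array names @matched and @pieces are chosen here for illustration.

```perl
use strict;
use warnings;

my $name = 'JohnJacobJingleheimer-Schmidt';

# Matching keeps only what the pattern describes, so the hyphen is dropped.
my @matched = ($name =~ m/([A-Z][a-z]*)/g);

# Splitting keeps everything between split points, so the hyphen stays.
# There is no split before "S" because "-" is not a lowercase letter.
my @pieces = split /(?<=[a-z])(?=[A-Z])/, $name;

print join('|', @matched), "\n";   # John|Jacob|Jingleheimer|Schmidt
print join('|', @pieces),  "\n";   # John|Jacob|Jingleheimer-Schmidt
```

    Which result is "right" depends on whether you treat Jingleheimer-Schmidt as one name or two.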

    --
    [ e d @ h a l l e y . c c ]

Re: split by a capital followed by
by eric256 (Parson) on Dec 19, 2005 at 17:32 UTC

    Instead of using split you can just use a regular regex.

    my $test = "JohnSmith";
    my ($first, $last) = $test =~ /([A-Z][a-z]+)([A-Z][a-z]*)/;
    print "First: $first, Last: $last\n";
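    This pattern expects exactly two capitalized parts, so it may be worth checking that the match succeeded before using the captures. A hedged sketch follows: split_two is a hypothetical helper name, and the ^ and $ anchors are an added assumption so that inputs with more than two parts are rejected rather than silently truncated.

```perl
use strict;
use warnings;

# split_two() is a hypothetical helper, not part of the reply above.
sub split_two {
    my ($text) = @_;
    # In list context a failed match returns an empty list.
    # The ^ and $ anchors are an assumption added for this sketch.
    my @parts = $text =~ /^([A-Z][a-z]+)([A-Z][a-z]*)$/;
    return @parts;
}

for my $test ('JohnSmith', 'johnsmith') {
    # List assignment in a condition is true only if something was returned.
    if (my ($first, $last) = split_two($test)) {
        print "First: $first, Last: $last\n";
    }
    else {
        print "No match for '$test'\n";
    }
}
```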


    ___________
    Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
Re: split by a capital followed by
by Hue-Bond (Priest) on Dec 19, 2005 at 17:36 UTC

    This may be buggy since I'm not a genius with lookaround assertions but here it goes:

    my $c='JohnSmithSplitMe';
    my @c=split /(?<=[a-z])(?=[A-Z])/, $c;
    print "-@c-\n";
    __END__
    -John Smith Split Me-

    --
    David Serrano

      thanks for the solutions