Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

...but I'm tying myself in knots trying to find a nice way to split a string like ThisString or SomeOtherString into (this, string) or (some, other, string) respectively, using the split function. Can anyone enlighten me? Thanks, sb

20030910 Edit by jeffa: Changed title from 'I know this is really lame... [290180]'

Re: Splitting on uppercase letters
by antirice (Priest) on Sep 09, 2003 at 22:11 UTC

    The secret is in the regular expression you use to split upon. I'm thinking zero-width positive look-ahead assertion but that's just me. Documentation is available at perlre and perlretut. If you're just starting out, perhaps you should check out perlrequick for an overview. But to answer your question:

    #!/usr/bin/perl -wl
    $,=$";
    print split/(?=[A-Z])/,"ThisString";
    print split/(?=[A-Z])/,"SomeOtherString";
    __DATA__
    output:
    This String
    Some Other String

    Hope this helps.

    Updated: Just realized that I forgot to set $,. If not set as it is above, then the output would not have had spaces.
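A minimal sketch of why the look-ahead works, using only core Perl. The helper name split_camel is hypothetical and does not appear in the thread; the explicit `$\` assignment stands in for the -l switch used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split before every
# ASCII uppercase letter without consuming it.
sub split_camel {
    my ($string) = @_;
    # (?=[A-Z]) matches the empty position just before an uppercase
    # letter, so the letter stays with the field that follows it.
    # A zero-width match at the very start produces no empty field.
    return split /(?=[A-Z])/, $string;
}

{
    # $, is the separator print puts between list items; $" is the
    # separator used when interpolating "@array" (a space by default).
    local $, = $";
    local $\ = "\n";    # newline after each print, as -l arranges
    print split_camel("ThisString");
    print split_camel("SomeOtherString");
}
```

Without the `$,` assignment the fields would be printed back to back, which is the omission the update above refers to.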

    antirice
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1

      antirice's advice is a great way of doing it. But there's one problem. The original post suggested that the outcome from "ThisString" should be "this" and "string" (note that the uppercase letters at the split points have become lowercase).

      Not wanting to abandon antirice's efficient advice, I would suggest just doing this:

      # as in antirice's script, this assumes -l and $,=$" are in effect
      print map { lc } split /(?=[A-Z])/, "ThisString";
      print map { lc } split /(?=[A-Z])/, "SomeOtherString";
      __DATA__
      output:
      this string
      some other string

      Dave

      "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

Re: Splitting on uppercase letters
by Mr. Muskrat (Canon) on Sep 09, 2003 at 22:11 UTC

    The problem is that while you can split on /[A-Z]/, the output is not what you want (i.e. split /[A-Z]/, "SomeOtherString" will give you '','ome','ther','tring'), because the uppercase letters are consumed as delimiters. Instead, first insert a space (or some other character) in front of each uppercase letter, then split on that new character.

    my $string = 'SomeOtherString';
    $string =~ s/([[:upper:]])/ $1/g;
    my @words = split ' ', $string;
    # @words = ('Some', 'Other', 'String');

    Update: Or if you are up to it, follow antirice's advice about zero-width positive look-ahead assertions.
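A small side-by-side sketch of the two behaviours described in this reply, using only core Perl. The variable names @consumed, $spaced and @words are illustrative choices, not taken from the post.

```perl
use strict;
use warnings;

my $string = 'SomeOtherString';

# Splitting on the letters themselves consumes them as delimiters
# and leaves an empty leading field.
my @consumed = split /[A-Z]/, $string;

# Inserting a space first moves the delimiter in front of each
# letter. split ' ' is the special whitespace mode, which also
# skips the leading space created before the first capital.
(my $spaced = $string) =~ s/([[:upper:]])/ $1/g;
my @words = split ' ', $spaced;

print join(',', map { "'$_'" } @consumed), "\n";  # '','ome','ther','tring'
print join(',', map { "'$_'" } @words), "\n";     # 'Some','Other','String'
```

Note that a literal `split / /` (a regex, not the string ' ') would keep the empty leading field, so the choice of `' '` matters here.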

Re: Splitting on uppercase letters
by Cody Pendant (Prior) on Sep 09, 2003 at 22:50 UTC
    $camel_case_string = 'SomeOtherString';
    while ($camel_case_string =~ m/([A-Z][a-z]*)/g) {
        push(@camel_case_words, $1)
    }
    print "@camel_case_words";

    Update: good point, Anonymous Monk. Changed the '+' to '*'.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print

      Test: $camel_case_string = 'ThisTooIsAString';

      Why loop when you can just do it in one step?

      $camel_case_string = 'SomeOtherString';
      @camel_case_words = $camel_case_string =~ /([A-Z][a-z]*)/g;
      print "@camel_case_words";

      You seem to be looping for no reason.

      Anonymously yours,
      Anonymous Monk
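A short sketch of what the test string exposes, assuming (as the update above implies) that the earlier pattern was `[A-Z][a-z]+`. The variable names are illustrative. Both matches use the list-context form suggested in this reply.

```perl
use strict;
use warnings;

my $test = 'ThisTooIsAString';

# With '+', a capital must be followed by at least one lowercase
# letter, so the one-letter word "A" is skipped entirely.
my @with_plus = $test =~ /([A-Z][a-z]+)/g;

# With '*', a lone capital is a valid word.
my @with_star = $test =~ /([A-Z][a-z]*)/g;

print "@with_plus\n";   # This Too Is String
print "@with_star\n";   # This Too Is A String
```

In list context a /g match returns every capture at once, which is why no while loop or push is needed.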

Re: Splitting on uppercase letters
by mirod (Canon) on Sep 09, 2003 at 22:23 UTC

    Here are 2 solutions:

    #!/usr/bin/perl -w
    use strict;

    my @data= map { chomp; $_ } <DATA>;

    my @methods=( '@strings= map { lc } split /(?=[A-Z])/, $_',
                  'while( m{([A-Z][^A-Z]*)}g) { push @strings, lc $1; }',
                );

    foreach my $method (@methods)
      { print "using $method:\n";
        foreach (@data)
          { my @strings;
            eval( $method);
            print " $_ => /", join( "/, /", @strings), "/\n";
          }
      }

    __DATA__
    ThisString
    SomeOtherString

Re: Splitting on uppercase letters
by ajdelore (Pilgrim) on Sep 09, 2003 at 22:20 UTC

    Updated: This solution is pretty lame, and also fails the test case pointed out by Anonymous Monk. Use antirice's solution.

    Well, I really doubt that this is the nicest way, but it does use split. (Mind you, it needs a regex substitution first...)

    use strict;

    my @strings = qw (ThisString SomeOtherString);

    foreach (@strings) {
        # mark each lowercase-to-uppercase boundary with a colon
        s/([a-z])([A-Z])/"$1:$2"/eg;
        my @tokens = split /:/;
        print join "\n", @tokens;
        print "\n";
    }

    __END__

    # Output:
    This
    String
    Some
    Other
    String

    </ajdelore>
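
A brief sketch of why the boundary-marking approach fails on Anonymous Monk's test string while the look-ahead split does not. It uses only core Perl; the variable names are illustrative, and the substitution is written without /e for brevity.

```perl
use strict;
use warnings;

my $test = 'ThisTooIsAString';

# Marking only lowercase-then-uppercase boundaries misses "AS",
# where two capitals sit next to each other.
(my $marked = $test) =~ s/([a-z])([A-Z])/$1:$2/g;
my @by_boundary = split /:/, $marked;

# The look-ahead split breaks before every capital, whatever
# character precedes it.
my @by_lookahead = split /(?=[A-Z])/, $test;

print "@by_boundary\n";    # This Too Is AString
print "@by_lookahead\n";   # This Too Is A String
```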