Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am pretty bad with regexes and have the following code:
my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
for ($foo) {
    for (split /[a-z][A-Z]/) {
        print "$_ \n";
    }
}
The result is obviously not what I want: the split removes the last letter of each piece and the initial capital of the next one. How do I accomplish the split without removing the letters? Thanks.
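A quick way to see what goes wrong: `split` discards whatever its pattern matches, and `[a-z][A-Z]` matches two real characters at every boundary ("dF", "rP", "sS"). The sketch below reuses the string from the question and simply prints each field in brackets so that the missing letters are visible.

```perl
use strict;
use warnings;

my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";

# The pattern consumes two characters at each boundary,
# and split throws the consumed text away.
my @parts = split /[a-z][A-Z]/, $foo;

print "[$_]\n" for @parts;
# [Hello Worl]
# [oo Ba]
# [erl Monk]
# [lash Dot]
```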

Retitled by davido from 'regex'.

Re: Split without removing characters
by Tanktalus (Canon) on Jun 17, 2005 at 19:49 UTC
    $ perl -e 'my $foo = "Hello WorldFoo BarPerl MonksSlash Dot"; my @x = +split /(?<=[a-z])(?=[A-Z])/, $foo; print map("[$_]",@x),$/'
    [Hello World][Foo Bar][Perl Monks][Slash Dot]

    Using look-behind and look-ahead assertions, you end up matching a zero-width point between two characters as your split point, so no characters are removed.
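To make "zero-width" concrete, here is a small sketch that runs the same look-around pattern with `m//g` and reports where each match starts and how long it is, using the `@-` and `@+` match-offset arrays. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
my (@offsets, @lengths);

# $-[0] and $+[0] are the start and end offsets of the whole match.
# For a pure look-around pattern they are equal: the match is a position.
while ($foo =~ /(?<=[a-z])(?=[A-Z])/g) {
    push @offsets, $-[0];
    push @lengths, $+[0] - $-[0];
    printf "boundary at offset %d, match length %d\n", $-[0], $+[0] - $-[0];
}
# boundary at offset 11, match length 0
# boundary at offset 18, match length 0
# boundary at offset 28, match length 0
```

Perl's regex engine will not accept a second zero-length match at the same position, so the loop advances instead of spinning forever.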

    (PS - "bad at regexps"? I find look-behind and look-ahead are relatively rarely used, so it's just a matter of experience - and my experience on this has solely been answering questions here ;->)

Re: Split without removing characters
by cmeyer (Pilgrim) on Jun 17, 2005 at 19:54 UTC

    Try using the zero-width look-behind and look-ahead matches:

    my @stuff = split /(?<=[a-z])(?=[A-Z])/;   # splits $_, as inside the question's for ($foo) loop

    This means that the regex matches at a point where there is a lowercase letter immediately preceding and an uppercase letter immediately following, but neither of these letters is consumed by the match. More info: perlre
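Both halves of the pattern matter. The sketch below compares a look-ahead-only split with the combined look-behind/look-ahead split on the same string: without the look-behind, a capital after a space also counts as a split point.

```perl
use strict;
use warnings;

my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";

# Look-ahead alone: split before every capital, including "World" after the space.
my @ahead_only = split /(?=[A-Z])/, $foo;

# Look-behind plus look-ahead: split only between a lowercase letter and a capital.
my @both = split /(?<=[a-z])(?=[A-Z])/, $foo;

print join("|", @ahead_only), "\n";   # Hello |World|Foo |Bar|Perl |Monks|Slash |Dot
print join("|", @both), "\n";         # Hello World|Foo Bar|Perl Monks|Slash Dot
```

A zero-length match at the very start of the string does not produce an empty leading field, which is why `@ahead_only` starts with "Hello ".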

    -Colin.

    WHITEPAGES.COM | INC

Re: Split without removing characters
by GrandFather (Saint) on Jun 17, 2005 at 23:50 UTC

    Another way to do it:

    my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
    print join(" \n", split(/(?<=[a-z]| )(?=[A-Z])/, $foo));
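One thing worth checking before choosing this variant: the `| ` alternative in the look-behind also allows a split where a space is followed by a capital. As a result, each word (with its trailing space) lands in its own field rather than in the word pairs the question asked for. A small sketch that prints the fields:

```perl
use strict;
use warnings;

my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";

# (?<=[a-z]| ) accepts either a lowercase letter or a space before the split point.
my @fields = split /(?<=[a-z]| )(?=[A-Z])/, $foo;

print "[$_]\n" for @fields;
# [Hello ] [World] [Foo ] [Bar] [Perl ] [Monks] [Slash ] [Dot], one per line
```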

    Perl is Huffman encoded by design.
Re: Split without removing characters
by djohnston (Monk) on Jun 17, 2005 at 21:16 UTC
    Here's how I'd go about it, although it doesn't make use of split:
    my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
    print "$_ \n" for ($foo =~ /([A-Z][a-z]+)/g);
    __OUTPUT__
    Hello
    World
    Foo
    Bar
    Perl
    Monks
    Slash
    Dot
Re: Split without removing characters
by Transient (Hermit) on Jun 17, 2005 at 19:53 UTC
    Another way:
    my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
    for ($foo) {
        s/([a-z])([A-Z])/$1|$2/g;
        for (split /\|/) {
            print "$_ \n";
        }
    }

      I don't care for this approach, because it fails as soon as $foo comes with embedded pipe characters. Then you have to think about escaping existing pipe characters, and so on, until you've reinvented CSV.
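To see the failure concretely, here is a sketch with a hypothetical input that already contains a pipe. The insert-a-marker-then-split approach splits on the existing pipe as well, while the zero-width look-around split never adds anything to the string and is unaffected.

```perl
use strict;
use warnings;

# Hypothetical input: "Pipe|Test" is meant to stay together.
my $foo = "Hello WorldPipe|Test";

# Marker approach: "Hello World|Pipe|Test", then split on every pipe.
(my $marked = $foo) =~ s/([a-z])([A-Z])/$1|$2/g;
my @pipe_split = split /\|/, $marked;

# Zero-width approach: only the lowercase/uppercase boundary is a split point.
my @lookaround = split /(?<=[a-z])(?=[A-Z])/, $foo;

print scalar(@pipe_split), " fields: ", join(", ", @pipe_split), "\n";  # 3 fields: Hello World, Pipe, Test
print scalar(@lookaround), " fields: ", join(", ", @lookaround), "\n";  # 2 fields: Hello World, Pipe|Test
```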

      -Colin.


        Nevertheless, given a set of known data, or a one-time event, this will work just as well. Not that I'm particularly a fan of it either, but it is another way to do it.

        Whilst that is a good point, it would have been helpful to add that a separator other than the pipe character could be substituted, which makes a collision with text already in $foo much less likely. In fact you could use a complex string that you feel is almost guaranteed not to occur!

        e.g.

        my $sepstring = '7c6xb1%$#!@#$!@';
        my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
        for ($foo) {
            s/([a-z])([A-Z])/$1$sepstring$2/g;
            for (split /\Q$sepstring\E/) {
                print "$_ \n";
            }
        }
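The `\Q...\E` around the separator in that split is essential, because the sample separator contains regex metacharacters such as `$`. A small check, assuming the same separator string and a shorter input: with quoting, the separator is matched literally; without it, the interpolated `$` acts as an end-of-string anchor, the pattern can never match, and the whole string comes back as a single field.

```perl
use strict;
use warnings;

my $sepstring = '7c6xb1%$#!@#$!@';
(my $marked = "Hello WorldFoo Bar") =~ s/([a-z])([A-Z])/$1$sepstring$2/g;

# Quoted: every character of the separator is taken literally.
my @quoted = split /\Q$sepstring\E/, $marked;

# Unquoted: the '$' inside the value is treated as an anchor, so no match.
my @unquoted = split /$sepstring/, $marked;

print scalar(@quoted), " vs ", scalar(@unquoted), " fields\n";   # 2 vs 1 fields
```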
Re: Split without removing characters
by Eimi Metamorphoumai (Deacon) on Jun 17, 2005 at 20:22 UTC
    Another approach:
    my $foo = "Hello WorldFoo BarPerl MonksSlash Dot";
    for ($foo) {
        for (/[A-Z][a-z]+/g) {
            print "$_ \n";
        }
    }
      That prints each capitalized word on its own line rather than in pairs:
      __OUTPUT__
      Hello
      World
      Foo
      Bar
      Perl
      Monks
      Slash
      Dot
      Grouping the words, however, would work:
      for ($foo) {
          for (/[A-Z][a-z]+(?:\s[A-Z][a-z]+)+/g) {
              print "$_ \n";
          }
      }
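One detail of that grouping regex: the `+` after `(?:...)` requires at least one further space-separated capitalized word, so a capitalized word with no partner is silently skipped. The sketch below uses a hypothetical input ending in a lone "Perl" and compares `+` with `*`. Which one is right depends on the data, so treat the `*` variant as an option to consider, not a correction.

```perl
use strict;
use warnings;

# Hypothetical input whose last capitalized word has no partner.
my $foo = "Hello WorldFoo BarPerl";

# (?:...)+ demands at least one more word, so the trailing "Perl" is dropped.
my @pairs_only = $foo =~ /[A-Z][a-z]+(?:\s[A-Z][a-z]+)+/g;

# (?:...)* also accepts a single word on its own.
my @any_length = $foo =~ /[A-Z][a-z]+(?:\s[A-Z][a-z]+)*/g;

print join("|", @pairs_only), "\n";   # Hello World|Foo Bar
print join("|", @any_length), "\n";   # Hello World|Foo Bar|Perl
```

Because the group is non-capturing, `m//g` in list context returns the whole matches, which is what the loop above iterates over.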
Re: Split without removing characters
by Anonymous Monk on Jun 17, 2005 at 20:28 UTC
    Thank you guys. Appreciate it.