Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

O most wise monks, I know I could write a loop and look at each character to do this, but it just doesn't seem terribly elegant - I know there's a way to do this with a regex, but the exact method escapes me.

Given a string, say "011122xx3x344444", how would I split it into an array with each element of the array consisting of like adjacent characters? With the example above, I'd get an array like this: qw(0 111 22 xx 3 x 3 44444).

Thanks in advance!

Replies are listed 'Best First'.
Re: Regex question
by Anonymous Monk on May 20, 2009 at 14:15 UTC
    this is somewhere in faq
    C:\>perl -le"print $1 while q!011122xx3x344444! =~ /((.)\2*)/g"
    0
    111
    22
    xx
    3
    x
    3
    44444
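The one-liner above prints each run as it is found. Since the question asks for an array, the same regex can be used to collect the matches instead. A minimal sketch, assuming a helper named `split_runs` (the name is made up for illustration) and adding the /s modifier so that `.` also matches a newline:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into runs of identical adjacent characters.
sub split_runs {
    my ($str) = @_;
    my @runs;
    # (.) captures one character as group 2; \2* extends the match over its repeats.
    # The outer group ($1) is the whole run. /s lets . match a newline as well.
    while ($str =~ /((.)\2*)/gs) {
        push @runs, $1;
    }
    return @runs;
}

my @runs = split_runs('011122xx3x344444');
print join(' ', @runs), "\n";   # prints: 0 111 22 xx 3 x 3 44444
```

The outer parentheses matter: without them, $1 would hold only the first character of each run rather than the run itself.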
Re: Regex question
by ww (Archbishop) on May 20, 2009 at 17:49 UTC
    "...doesn't seem terribly elegant..."

    Certainly AnonyMonk's one-liner is more elegant, for many values of "elegant," but there's a certain elegance (IMO) to a step-by-step listing for future readers who may find the regex above a bit intimidating or even confusing.

    So, inelegant though it may be (especially, "gentle, future-readers," the global variable declarations which you would do well to avoid in any substantial project), TIMTOWTDI:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # group adjacent-identical-chars to array elements; OP wanted "elegant"
    # rather than char-by-char like this

    my $str = '011122xx3x344444';
    my @new_arr;
    my $last_seen = '';
    my $found_char = '';
    my $arr_element;

    while ($str =~ /([A-Z,0-9])/ig) {
        $found_char = $1;
        if ( $last_seen =~ /$found_char/ ) {
            # same character as before: extend the current run
            $arr_element .= $found_char;
        } else {
            # new character: store the finished run, if there is one yet
            # (testing $found_char for truth would misfire on '0')
            if ( defined $arr_element ) {
                push @new_arr, $arr_element;
            }
            $arr_element = $found_char;
            $last_seen = $found_char;
        }
        next;
    }
    push @new_arr, $arr_element;   # store the final run

    print Dumper @new_arr;

    which outputs:

    $VAR1 = '0';
    $VAR2 = '111';
    $VAR3 = '22';
    $VAR4 = 'xx';
    $VAR5 = '3';
    $VAR6 = 'x';
    $VAR7 = '3';
    $VAR8 = '44444';

      So much of your code can be simplified by changing
      while ($str =~ /([A-Z,0-9])/ig) {
      to
      while ($str =~ /(([A-Z,0-9])\2*)/ig) {
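To make the suggestion concrete: once the regex captures a whole run, the bookkeeping variables are no longer needed and the loop body reduces to a single push. A sketch under that reading, with the character class kept exactly as posted (comma included) and `group_runs` as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper name; the character class is kept as posted,
# so characters outside it are skipped rather than grouped.
sub group_runs {
    my ($str) = @_;
    my @new_arr;
    while ($str =~ /(([A-Z,0-9])\2*)/ig) {
        push @new_arr, $1;   # $1 is the whole run, $2 its first character
    }
    return @new_arr;
}

print join(' ', group_runs('011122xx3x344444')), "\n";   # prints: 0 111 22 xx 3 x 3 44444
```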

      The posted code is beyond not elegant. It's in the realm of needless complexity.

      Update: Forgot to mention:

      Besides that, I don't know why you imposed a limit on which characters can be processed. The OP didn't mention anything about ignoring characters which are neither unaccented latin letters, roman (latin?) digits, nor a comma. (Did you even mean to put that comma there?)

        1. Stuck to the char set provided by OP. (BTW, I think that while "I," "V," "X," "L," "C," and "M" fill the bill as "Roman numerals," "0" .. "9" are Arabic.)

        2. No. Dumb mistake!
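The practical effect of the character class shows up only on input that contains something outside it. A small sketch, assuming a made-up helper `runs_with` and a made-up sample string with hyphens and commas, compares the posted class against a plain `.`:

```perl
use strict;
use warnings;

# Hypothetical helper: collect runs using whatever single-character pattern is passed in.
sub runs_with {
    my ($char_re, $str) = @_;
    my @runs;
    push @runs, $1 while $str =~ /(($char_re)\2*)/g;
    return @runs;
}

my $sample = 'aa--b,,c';   # made-up input, not from the thread

# The posted class silently skips the hyphens but keeps the commas.
print join('|', runs_with(qr/[A-Z,0-9]/i, $sample)), "\n";   # prints: aa|b|,,|c

# A plain dot groups every character, which is what the question describes.
print join('|', runs_with(qr/./s, $sample)), "\n";           # prints: aa|--|b|,,|c
```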

      I understand and agree with the sentiment but the code, to me at least, is obviously on the wrong side of the line. Take these two pseudo-codes.

      1:
      # Buy the least expensive gallon of 2% milk at the store.

      2:
      # Drive to the store.
      # Go to the dairy aisle.
      # Check the milk types.
      # Check the milk prices.
      # Compare the prices to the types.
      # Select the type 2% where the prices is <= other prices.
      # Head to the checkout.
      # Pay for the milk.
      # Drive home.

      The more explicit version can become much harder to follow than the higher level version.

        Well, I can't entirely agree. The code in my prior post is intended to be instructional, in the vein of "crawl before you walk; walk before you run." Given that, perhaps it should have been ("may be," if I get around to it) extensively commented.

        1. Your pseudo-code 2 doesn't match what amounts to a spec in Pseudo-code 1.
          • Pseudo-code 1 doesn't say anything about GOing to the store, so knock out lines 5 and 13
          • Pseudo-code 1 says, unambiguously, that what you want is
            1. 2% milk (no need to survey the types).
            2. the least expensive item satisfying the prior criterion. *
        2. Hence, there is no need for line 9, "compare prices to the types"
        3. Line 10 contradicts the spec; see my line 3 below.
        4. And to carry on with the absurdities, your lines 2, 11, and 12 are implicit in "buy"

        * We can regard "gallon" as ambiguous, as that could mean "any combination of containers of 2% milk which aggregate to a gallon," but it might also mean that you have a specific reason for wanting the milk in a "one gallon container."

        That leaves:

        2: =~s/Select the type 2% where the prices is <= other prices./Buy the least expensive 2% milk/;

        Perhaps next time you go to the store to buy apples you should watch out for the oranges. Of course, I need to watch out too... for the absurdities to which my logic leads me.  ;-)