Regex question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex question by Anonymous Monk on May 20, 2009 at 14:15 UTC
this is somewhere in faq

    C:\>perl -le"print $1 while q!011122xx3x344444! =~ /((.)\2*)/g"
    0
    111
    22
    xx
    3
    x
    3
    44444
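For readers who find the one-liner dense, here is a minimal sketch of the same regex as a script. The sub name `runs` is hypothetical and not from the thread. The inner `(.)` captures one character as group 2, `\2*` consumes any further copies of it, and the outer parentheses capture the whole run as `$1`. Note that `.` does not match a newline unless `/s` is added.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a string into runs of adjacent identical characters.
# "runs" is a hypothetical helper name used only for this sketch.
sub runs {
    my ($str) = @_;
    my @runs;

    # (.)  : any single character, captured as group 2
    # \2*  : zero or more repeats of that same character
    # The outer parentheses capture the whole run as $1.
    while ( $str =~ /((.)\2*)/g ) {
        push @runs, $1;
    }
    return @runs;
}

print join( ' ', runs('011122xx3x344444') ), "\n";
# prints: 0 111 22 xx 3 x 3 44444
```

Each pass of the `/g` loop resumes where the previous match ended, so the runs come back in order without overlap.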
Re: Regex question by ww (Archbishop) on May 20, 2009 at 17:49 UTC
"...doesn't seem terribly elegant..." Certainly AnnonyMonk's one-liner is more elegant, for many values of "elegant," but there's a certain elegance (IMO) to a step-by-step listing for future readers who may find the regex above a bit intimidating or even confusing. So, inelegant though it may be (especially, "gentle, future-readers," the global variable declarations which you would do well to avoid in any substantial project), TIMTOWTDI: #!/usr/bin/perl use strict; use warnings; use Data::Dumper; # group adjacent-identical-chars to array elements; OP wanted "elegant +" rather than char-by-char like this my $str = '011122xx3x344444'; my @new_arr; my $last_seen = ''; my $found_char = ''; my $arr_element; while ($str =~ /([A-Z,0-9])/ig) { $found_char = $1; if ( $last_seen =~ /$found_char/ ) { $arr_element .= $found_char; } else { if ($found_char ) { push @new_arr, $arr_element; } $arr_element = $found_char; $last_seen = $found_char; } next; } push @new_arr, $arr_element; print Dumper @new_arr;<c> <p>which outputs:</p> <c>$VAR1 = '0'; $VAR2 = '111'; $VAR3 = '22'; $VAR4 = 'xx'; $VAR5 = '3'; $VAR6 = 'x'; $VAR7 = '3'; $VAR8 = '44444'; [download]	[reply] [d/l]
Re^2: Regex question by ikegami (Patriarch) on May 20, 2009 at 19:43 UTC
So much of your code can be simplified by changing `while ($str =~ /([A-Z,0-9])/ig) {` to `while ($str =~ /(([A-Z,0-9])\2*)/ig) {`

The posted code is beyond not elegant. It's in the realm of needless complexity.

Update: Forgot to mention: Besides that, I don't know why you imposed a limit on which characters can be processed. The OP didn't mention anything about ignoring characters which are neither unaccented latin alpha, ~~roman~~latin? digits nor a comma. (Did you even mean to put that comma there?)
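A sketch of how the loop collapses once each match covers a whole run. This assumes the back-reference carries a `*` quantifier (`\2*`), since a bare `\2` would match only pairs of identical characters. The character class, comma included, is kept from the code under discussion. With `/i` in effect, the back-reference also matches case-insensitively, so `aA` would count as one run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '011122xx3x344444';
my @new_arr;

# Assumption: \2* is the intended back-reference, so one match is one run.
# The character class is unchanged from the step-by-step version.
while ( $str =~ /(([A-Z,0-9])\2*)/ig ) {
    push @new_arr, $1;    # $1 holds the whole run of identical characters
}

print join( ' ', @new_arr ), "\n";
# prints: 0 111 22 xx 3 x 3 44444
```

The bookkeeping variables `$last_seen`, `$found_char` and `$arr_element` are no longer needed, because the regex engine tracks the run itself.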
Re^3: Regex question by ww (Archbishop) on May 20, 2009 at 20:29 UTC
1. Stuck to the char set provided by OP. (BTW, I think that while "I," "V," "X," "L," "C," and "M" fill the bill as "Roman numerals," "0" .. "9" are Arabic.)
2. No. Dumb mistake!
Re^2: Regex question by Your Mother (Archbishop) on May 20, 2009 at 19:01 UTC
I understand and agree with the sentiment but the code, to me at least, is obviously on the wrong side of the line. Take these two pseudo-codes.

    1:
    # Buy the least expensive gallon of 2% milk at the store.

    2:
    # Drive to the store.
    # Go to the dairy aisle.
    # Check the milk types.
    # Check the milk prices.
    # Compare the prices to the types.
    # Select the type 2% where the prices is <= other prices.
    # Head to the checkout.
    # Pay for the milk.
    # Drive home.

The more explicit version can become much harder to follow than the higher level version.
Re^3: Regex question by ww (Archbishop) on May 20, 2009 at 20:17 UTC
Well, I can't entirely agree. The code in my prior post is intended to be instructional, in the vein of "crawl before you walk; walk before you run." Given that, perhaps it should have been ("may be," if I get around to it) extensively commented.

Your pseudo-code 2 doesn't match what amounts to a spec in Pseudo-code 1.

- Pseudo-code 1 doesn't say anything about GOing to the store, so knock out lines 5 and 13.
- Pseudo-code 1 says, unambiguously, that what you want is
  - 2% milk (no need to survey the types).
  - the least expensive item satisfying the prior criterion. *

  Hence, there is no need for line 9, "compare prices to the types."
- Line 10 contradicts the spec; see my line 3 below.
- And to carry on with the absurdities, your lines 2, 11, and 12 are implicit in "buy."

\* We can regard "gallon" as ambiguous, as that could mean "any combination of containers of 2% milk which aggregate to a gallon," but it might also mean that you have a specific reason for wanting the milk in a "one gallon container."

That leaves:

    2:
    =~s/Select the type 2% where the prices is <= other prices./Buy the least expensive 2% milk/;

Perhaps next time you go to the store to buy apples you should watch out for the oranges.

Of course, I need to watch out too... for the absurdities to which my logic leads me. ;-)
Re^4: Regex question by Your Mother (Archbishop) on May 21, 2009 at 01:09 UTC