leocharre has asked for the wisdom of the Perl Monks concerning the following question:

Ok ok.. loosen your belts.. rolls your eyes, crack your knuckles or maybe close your eyes and randomly punch the keyboard - here's a regex question!

I have a string as such: 423RY75Y69827EC67592C78657N965R

It has to be turned into 423R_Y75Y_69827E_C67592C_78657N965R

What's the idea? The idea is, first and foremost.. if any digits are sandwiched between two letters, then put an underscore to the sides of this group... However, treat N characters as we would digits.

#!/usr/bin/perl -w use strict; use Smart::Comments; my $string = '423RY75Y69827EC67592C78657N965R345U299M'; my $correct = '423R_Y75Y_69827E_C67592C_78657N965R_345U_299M'; $string=~s/([A-MO-Za-mo-z])([\dN]+)\1/_$1$2$1_/g; ### $string # is now '423R_Y75Y_69827E_C67592C_78657N965R345U299M' # now i have troubles.. # the idea is to match right to left, and if after (the last digit # or N char), place an underscore, unless of course, that char is the # char we started with! $string=~s/([^\dN\3])([\dN]+)([A-MO-Za-mo-z])/$1_$2$3/g; ### $string # is now '423R_Y_75Y__69827E_C_67592C__78657N965R345U_299M' # wow, seems to have done the reverse of what i wanted!

I'm having real trickies here . . it's like reverse memory or something that the regex has to do.. ow..

  • Comment on match if last char and first char are different excluding certain chars
  • Download Code

Replies are listed 'Best First'.
Re: match if last char and first char are different excluding certain chars
by johngg (Canon) on Nov 02, 2006 at 16:26 UTC
    As others have pointed out, your description of the problem is a little difficult to understand. However, this code produces the same results as the examples you give so perhaps it is what you want :)

    use strict; use warnings; my @tests = ( { str => q{423RY75Y69827EC67592C78657N965R}, ans => q{423R_Y75Y_69827E_C67592C_78657N965R} }, { str => q{423RY75Y69827EC67592C78657N965R345U299M}, ans => q{423R_Y75Y_69827E_C67592C_78657N965R_345U_299M} }); my $rxDigit = qr{[N0-9]}; my $rxNonDigit = qr{[A-MO-Z]}; foreach my $rhTest (@tests) { (my $myAns = $rhTest->{str}) =~ s{((?:$rxDigit)+$rxNonDigit)(?!\z)}{$1_}g; print qq{\n$rhTest->{str}\n}, $myAns eq $rhTest->{ans} ? qq{OK\n} : qq{Not OK\n}, qq{$myAns\n}, qq{$rhTest->{ans}\n}; }

    Here is the output.

    423RY75Y69827EC67592C78657N965R OK 423R_Y75Y_69827E_C67592C_78657N965R 423R_Y75Y_69827E_C67592C_78657N965R 423RY75Y69827EC67592C78657N965R345U299M OK 423R_Y75Y_69827E_C67592C_78657N965R_345U_299M 423R_Y75Y_69827E_C67592C_78657N965R_345U_299M

    I hope this is of use.

    Cheers,

    JohnGG

Re: match if last char and first char are different excluding certain chars
by bart (Canon) on Nov 02, 2006 at 16:55 UTC
    Ignoring your requirements and just looking at your example, I derive a different set of rules:
    1. Treat an "N" as if it was a digit
    2. If there's a group of letters (minus "N") followed by a digit (or "N") somewhere down the road, then put a underscore after the first letter.
    In code:
    #! perl -wl my $string = '423RY75Y69827EC67592C78657N965R345U299M'; my $correct = '423R_Y75Y_69827E_C67592C_78657N965R_345U_299M'; $string =~ s/([A-MO-Z])([A-MO-Z]*[\dN]+)/$1_$2/g; print $string eq $correct' ? 'OK' : 'NOT OK';
    This prints:
    OK
    

    Are you sure you've got your requirements right?

Re: match if last char and first char are different excluding certain chars
by davido (Cardinal) on Nov 02, 2006 at 16:20 UTC

    I think you need to clarify your requirement before you can get a smart answer. You said that you want to put an underscore to the sides, but your example is not consistent with that requirement, and that ruins everything.

    One thing I'll mention though. It sounds like non-digit, non-N characters are akin to "balanced text" characters. Perhaps you would get farther with Text::Balanced, using it to pull out the substrings which you can then attach underscores to. Only use the regexp to determine if the first character is a non-digit, and different from the last character which also must be a non-digit.


    Dave

Re: match if last char and first char are different excluding certain chars (Updated)
by BrowserUk (Patriarch) on Nov 02, 2006 at 15:51 UTC

    Update: Ignore this. I didn't look at the example (or comments) in the code.

    Got a few more examples?

    > $in = '423RY75Y69827EC67592C78657N965R';; > ( $out = $in ) =~ s[((\D)[0-9N]+\2)][_$1_]g;; > print $out;; 423R_Y75Y_69827E_C67592C_78657N965R

    Update2: This appears to do it, but it may be dependant upon specifics of the single example?

    > $in = '423RY75Y69827EC67592C78657N965R345U299M';; > ( $out = $in ) =~ s[ ( ( \D ) [0-9N]+ \2 ) | ( [0-9N]+? [A-MO-Z] (?=\d) ) ]{ defined $1 ? "_$1_" : "$3_" }egx; print $out;; 423R_Y75Y_69827E_C67592C_78657N965R_345U_299M

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: match if last char and first char are different excluding certain chars
by jwkrahn (Abbot) on Nov 02, 2006 at 16:29 UTC
    $ perl -le' my $string = q/423RY75Y69827EC67592C78657N965R345U299M/; print $string; $string =~ s/(([A-MO-Z])[\dN]+\2)/_$1_/ig; print $string; substr( $string, 1 + rindex $string, q/_/ ) =~ s/(?<=[^\dN\3])([\dN]+[ +A-MO-Z])/_$1/ig; print $string; ' 423RY75Y69827EC67592C78657N965R345U299M 423R_Y75Y_69827E_C67592C_78657N965R345U299M 423R_Y75Y_69827E_C67592C_78657N965R_345U_299M
Re: match if last char and first char are different excluding certain chars (^.^)
by tye (Sage) on Nov 02, 2006 at 18:12 UTC
    s/([^[:^alpha:]nN])([\dnN]+\1)/_$1$2_/g; s/__/_/g;

    Update: Ah, you had more specs hidden in your code block (and roughly that solution, except without the double-^ trick that I like). You can't use things like \3 in a character class, even if you don't try to use them out-of-order. I'll take a stab later. Update:

    my $string = '423RY75Y69827EC67592C78657N965R345U299M'; my $correct= '423R_Y75Y_69827E_C67592C_78657N965R_345U_299M'; for( $string ) { s/([^[:^alpha:]nN])([\dnN]+\1)/_$1$2_/g; s/(?<=([^\dnN_]))?([\dnN]+)([^\dnN_])/ $2 . $3 . ( $1 eq $3 ? '' : '_' ) /ge; s/_(_|$)/$1/g; } print "Okay\n" if $string eq $correct;

    Add "defined $1 &&" or "no warnings 'uninitialized'" to avoid a warning, if you need to.

    - tye        

      That's pretty wild. It actually works too. When I started perl, this is the kind of code I would look at and think to myself.. 'what th...'. It's hurting me a little bit right now. I have to zone to see it.

      Nice code, man.

Re: match if last char and first char are different excluding certain chars
by Anonymous Monk on Nov 02, 2006 at 15:55 UTC
    I can't make heads or tails of your descriptions and examples. If groups of digits are supposed to be surrounded with underscores (taking in account that N is a digit as well), then why does $correct end in R_345U_299M? I would expect it to end in R_345_U_299_M. Also, the group 69827 doesn't get underscores.

    I can make a substitution that matches your descriptions, but that produces quite a different string of $string than $correct is.

    my $L = '[A-MO-Za-mo-z]'; my $D = '[0-9N]'; $_ = '423RY75Y69827EC67592C78657N965R345U299M'; s/($L?)($D+)\1/_${1}${2}${1}_/g; print $_, "\n"; __END__ _423_R_Y75Y__69827_E_C67592C__78657N965_R_345_U_299_M