PerlMonks  

Perl alternation regex looks ok to me?

by misterperl (Pilgrim)
on Sep 18, 2018 at 13:51 UTC

misterperl has asked for the wisdom of the Perl Monks concerning the following question:

I've been a Perl programmer since about 1997 and I know a lot about regexes. Not a guru perhaps, but at least a master :) But this I don't get.

I had a string like XT3USI, and I wanted to test whether it began with any character, followed by T, followed by 2 or S. So I used /^.T2|S/

which looked perfect. Reading from L to R, "begins with a char, then a T, then a (2 or an S)". But instead it acted like I had used: /^.T(2|.*S)/

2|S, in my mind, should have alternated chars 2 or S, not 2 or (0-many other chars) followed by S.

I replaced it with /^.(T2|TS)/, which bugs me, but it works.

Advice is, as always, appreciated. I use it all the time, but maybe all these years I misunderstood alternation.

Re: Perl alternation regex looks ok to me?
by toolic (Bishop) on Sep 18, 2018 at 13:59 UTC

    Tip #9 from the Basic debugging checklist: YAPE::Regex::Explain

    The regular expression:

    (?-imsx:^.T2|S)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      .                        any character except \n
    ----------------------------------------------------------------------
      T2                       'T2'
    ----------------------------------------------------------------------
     |                        OR
    ----------------------------------------------------------------------
      S                        'S'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
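A quick way to confirm that reading is to run the pattern against a few strings. The sketch below compares the original pattern with a version in which the implicit grouping is written out as (?:^.T2)|(?:S); the sample strings are chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The original pattern: | has the lowest precedence, so this means
# (^.T2) OR (S anywhere in the string), not ^.T followed by (2 or S).
my $original = qr/^.T2|S/;

# The same pattern with the implicit grouping written out explicitly.
my $explicit = qr/(?:^.T2)|(?:S)/;

# Sample strings chosen for illustration only.
for my $str (qw(XT3USI AT2 AT3 QQS)) {
    my $m1 = $str =~ $original ? 'match' : 'no match';
    my $m2 = $str =~ $explicit ? 'match' : 'no match';
    printf "%-7s original: %-8s explicit: %s\n", $str, $m1, $m2;
}
```

Both columns agree for every string. XT3USI and QQS match only because they contain an S somewhere, AT2 matches through the ^.T2 branch, and AT3, which has neither, is the only string rejected.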
    You should use parentheses to group the alternation:
    /^.T(2|S)/
    Or, if there is no need to capture:
    /^.T(?:2|S)/
    Or, if each alternative is a single character, use a character class:
    /^.T[2S]/
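The following sketch runs the grouped forms from this reply, plus the original poster's /^.(T2|TS)/ workaround, side by side to show that they accept and reject the same strings. The hash keys and sample strings are illustrative names chosen here, not anything from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Four ways to say "any character, then T, then 2 or S" at the start
# of the string. The parentheses (or the character class) limit the
# scope of the alternation to the single position after the T.
my %patterns = (
    capturing     => qr/^.T(2|S)/,     # captures the 2 or S in $1
    non_capturing => qr/^.T(?:2|S)/,   # groups without capturing
    char_class    => qr/^.T[2S]/,      # one character from the set {2, S}
    workaround    => qr/^.(T2|TS)/,    # the original poster's workaround
);

# Sample strings chosen for illustration only.
for my $str (qw(XT3USI AT2 BTS AT3 QQS)) {
    for my $name (sort keys %patterns) {
        my $result = $str =~ $patterns{$name} ? 'match' : 'no match';
        printf "%-7s %-14s %s\n", $str, $name, $result;
    }
}

# With the capturing version, $1 holds the character that matched.
if ('BTS' =~ $patterns{capturing}) {
    print "captured: $1\n";
}
```

All four patterns now reject XT3USI (its third character is 3) and QQS (its second character is not T), while AT2 and BTS are accepted. The character class is the simplest choice when every alternative is one character; the group is needed once the alternatives are longer strings.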
Re: Perl alternation regex looks ok to me?
by hippo (Bishop) on Sep 18, 2018 at 14:19 UTC
    2|S, in my mind, should have alternated chars 2 or S, not 2 or (0-many other chars) followed by S.

    In isolation, yes. But you have combined it with other elements without limiting the scope of the alternation, so the | splits the entire pattern into ^.T2 on one side and S on the other.

    If you just want to alternate between single characters, it is far easier to use a character class, e.g.: /^.T[2S]/

    HTH.

      Thanks, I guess a character class might work out.
Re: Perl alternation regex looks ok to me?
by Laurent_R (Canon) on Sep 18, 2018 at 16:04 UTC
    You've been given good solutions already, but, just to explain further, note that alternation has very low precedence in regexes, so that, for example, /blue|green/ is understood as an alternative between the two colors (either blue or green), and not as something like /blu(e|g)reen/ (i.e. either bluereen or blugreen).
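A small sketch of that precedence point. Anchors are added here (an assumption for the demo) so that only whole-string matches count; unanchored, /blue|green/ would also match inside "bluereen". Note that the anchored version itself needs a group, because ^blue|green$ would split at the | in exactly the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Alternation has the lowest precedence, so blue|green means
# "blue" OR "green". The group keeps the anchors outside the alternation.
my $whole_words = qr/^(?:blue|green)$/;

# Parentheses restrict the alternation to the single letter between
# "blu" and "reen".
my $grouped = qr/^blu(?:e|g)reen$/;

for my $word (qw(blue green bluereen blugreen)) {
    printf "%-9s whole_words: %-3s grouped: %s\n",
        $word,
        ($word =~ $whole_words ? 'yes' : 'no'),
        ($word =~ $grouped     ? 'yes' : 'no');
}
```

blue and green match only the first pattern; bluereen and blugreen match only the second.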

Node Type: perlquestion [id://1222581]
Approved by toolic
Front-paged by Corion