Re: really large regex misbehaving

Your regex matches "/" as one valid match and then "*x*/" as a second one, so "/*x*/" never comes back as a single token.

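# $REx holds the regex from the original post
@strings = (
  'x',
  '/*x*/',
  '"x"',
  '"\"',
);

for (@strings) {
  # \G anchors each attempt where the previous match ended
  while (/\G$REx/g) {
    print "$_ => '$1'\n";
  }
  print "\n";
}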
__END__
x => 'x'

/*x*/ => '/'
/*x*/ => '*x*/'

"x" => '"x"'
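
The same split can be reproduced with a much smaller pattern. This is only a sketch, not your $REx: $no_comment and $with_comment are hypothetical stand-in patterns, and tokens() is a helper made up for the illustration. It shows how a tokenizer with no branch for a whole comment settles for "/" followed by "*x*/", and how trying a comment branch first returns one token.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real regex (assumptions, not $REx):
# one with no branch for a whole /* ... */ comment, one that tries it first.
my $no_comment   = qr{ ( / | [^/\s]+ /? ) }x;
my $with_comment = qr{ ( /\* .*? \*/ | / | [^/\s]+ /? ) }xs;

# Collect every token matched from the start of $str, each match
# beginning exactly where the previous one ended (\G).
sub tokens {
    my ($re, $str) = @_;
    my @tokens;
    while ($str =~ /\G$re/g) {
        push @tokens, $1;
    }
    return @tokens;
}

print join(' | ', tokens($no_comment,   '/*x*/')), "\n";   # / | *x*/
print join(' | ', tokens($with_comment, '/*x*/')), "\n";   # /*x*/
```

Whether reordering alone would fix the real regex depends on details of $REx that this sketch doesn't model.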

See? Oh, and this is a good use for YAPE::Regex::Explain. Here's the output of its explain method, which spells out what each piece of your regex is doing.
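
For anyone who wants to produce a listing like the one below themselves, here is a minimal sketch. It assumes YAPE::Regex::Explain is installed from CPAN (it is not a core module). explain_regex() is a hypothetical wrapper, and the short pattern passed to it is just a stand-in for the real regex.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns the explanation text, or undef if
# YAPE::Regex::Explain is not installed or cannot parse the pattern.
sub explain_regex {
    my ($pattern) = @_;
    my $text = eval {
        require YAPE::Regex::Explain;
        YAPE::Regex::Explain->new($pattern)->explain;
    };
    return $text;
}

# A short stand-in pattern (an assumption, not the regex from this thread).
my $text = explain_regex('(?:\?/|/)(?:\?/)*');
print defined $text ? $text : "could not explain the pattern\n";
```

The pattern is passed as a plain string here. The wrapper swallows any error, so a missing module and an unparsable pattern look the same to the caller.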

Warning: it is very long.

Edited 2001-05-22 by Ovid

(?x-ims:               # group, but do not capture (disregarding
                       # whitespace and comments) (case-sensitive)
                       # (with ^ and $ matching normally) (with . not
                       # matching \n):

  (                      # group and capture to \1:

    (                      # group and capture to \2:

      (                      # group and capture to \3:

        \?+                    # '?' (1 or more times (matching the
                               # most amount possible))

       |                      # OR

        (                      # group and capture to \4:

          \?                     # '?'

          \/                     # '/'

         |                      # OR

          \/                     # '/'

        )                      # end of \4

        (                      # group and capture to \5 (0 or more
                               # times (matching the most amount
                               # possible)):

          \?                     # '?'

          \/                     # '/'

        )*                     # end of \5 (NOTE: because you're
                               # using a quantifier on this capture,
                               # only the LAST repetition of the
                               # captured pattern will be stored in
                               # \5)

        \?*                    # '?' (0 or more times (matching the
                               # most amount possible))

      )                      # end of \3

     |                      # OR

      (                      # group and capture to \6:

        [^\'\"\/\s\?]          # any character except: '\'', '\"',
                               # '\/', whitespace (\n, \r, \t, \f,
                               # and " "), '\?'

       |                      # OR

        (                      # group and capture to \7:

          (                      # group and capture to \8:

            \?                     # '?'

           |                      # OR

            \/                     # '/'

          )                      # end of \8

          \?+                    # '?' (1 or more times (matching the
                                 # most amount possible))

          [^\s\?\"]              # any character except: whitespace
                                 # (\n, \r, \t, \f, and " "), '\?',
                                 # '\"'

        )                      # end of \7

       |                      # OR

        (                      # group and capture to \9:

          (                      # group and capture to \10:

            \?                     # '?'

            \/                     # '/'

           |                      # OR

            \/                     # '/'

          )                      # end of \10

          (                      # group and capture to \11 (0 or
                                 # more times (matching the most
                                 # amount possible)):

            \?                     # '?'

            \/                     # '/'

          )*                     # end of \11 (NOTE: because you're
                                 # using a quantifier on this
                                 # capture, only the LAST repetition
                                 # of the captured pattern will be
                                 # stored in \11)

          (                      # group and capture to \12:

            [^\'\"\/\s             # any character except: '\'',
            \?\*]                  # '\"', '\/', whitespace (\n, \r,
                                   # \t, \f, and " "), '\?', '\*'

           |                      # OR

            \?                     # '?'

            (                      # group and capture to \13:

              \?+                    # '?' (1 or more times (matching
                                     # the most amount possible))

              [^\?\"\s               # any character except: '\?',
              ]                      # '\"', whitespace (\n, \r, \t,
                                     # \f, and " ")

             |                      # OR

              [^\'\"\/               # any character except: '\'',
              \s\?]                  # '\"', '\/', whitespace (\n,
                                     # \r, \t, \f, and " "), '\?'

            )                      # end of \13

          )                      # end of \12

        )                      # end of \9

      )                      # end of \6

      (                      # group and capture to \14 (0 or more
                             # times (matching the most amount
                             # possible)):

        [^\'\"\/\s\?]          # any character except: '\'', '\"',
                               # '\/', whitespace (\n, \r, \t, \f,
                               # and " "), '\?'

       |                      # OR

        \?                     # '?'

        \?+                    # '?' (1 or more times (matching the
                               # most amount possible))

        [^\?\"\s]              # any character except: '\?', '\"',
                               # whitespace (\n, \r, \t, \f, and " ")

       |                      # OR

        (                      # group and capture to \15:

          \/                     # '/'

         |                      # OR

          \?                     # '?'

          \/                     # '/'

        )                      # end of \15

        (                      # group and capture to \16 (0 or more
                               # times (matching the most amount
                               # possible)):

          \?                     # '?'

          \/                     # '/'

        )*                     # end of \16 (NOTE: because you're
                               # using a quantifier on this capture,
                               # only the LAST repetition of the
                               # captured pattern will be stored in
                               # \16)

        (                      # group and capture to \17:

          [^\'\"\/\s\?           # any character except: '\'', '\"',
          \*]                    # '\/', whitespace (\n, \r, \t, \f,
                                 # and " "), '\?', '\*'

         |                      # OR

          \?                     # '?'

          (                      # group and capture to \18:

            \?+                    # '?' (1 or more times (matching
                                   # the most amount possible))

            [^\?\"\s]              # any character except: '\?',
                                   # '\"', whitespace (\n, \r, \t,
                                   # \f, and " ")

           |                      # OR

            [^\'\"\/\s             # any character except: '\'',
            \?]                    # '\"', '\/', whitespace (\n, \r,
                                   # \t, \f, and " "), '\?'

          )                      # end of \18

        )                      # end of \17

      )*                     # end of \14 (NOTE: because you're using
                             # a quantifier on this capture, only the
                             # LAST repetition of the captured
                             # pattern will be stored in \14)

      (                      # group and capture to \19:

        (                      # group and capture to \20 (optional
                               # (matching the most amount
                               # possible)):

          \?+                    # '?' (1 or more times (matching the
                                 # most amount possible))

         |                      # OR

          (                      # group and capture to \21:

            \/                     # '/'

           |                      # OR

            \?                     # '?'

            \/                     # '/'

          )                      # end of \21

          (                      # group and capture to \22 (0 or
                                 # more times (matching the most
                                 # amount possible)):

            \?                     # '?'

            \/                     # '/'

          )*                     # end of \22 (NOTE: because you're
                                 # using a quantifier on this
                                 # capture, only the LAST repetition
                                 # of the captured pattern will be
                                 # stored in \22)

          \?*                    # '?' (0 or more times (matching the
                                 # most amount possible))

        )?                     # end of \20 (NOTE: because you're
                               # using a quantifier on this capture,
                               # only the LAST repetition of the
                               # captured pattern will be stored in
                               # \20)

      )                      # end of \19

    )                      # end of \2

   |                      # OR

    (                      # group and capture to \23:

      (                      # group and capture to \24:

        \'                     # '''

        (                      # group and capture to \25 (0 or more
                               # times (matching the most amount
                               # possible)):

          [^\'?\\]               # any character except: '\'', '?',
                                 # '\\'

         |                      # OR

          \\                     # '\'

          .                      # any character except \n

         |                      # OR

          \?                     # '?'

          (                      # group and capture to \26:

            \?+                    # '?' (1 or more times (matching
                                   # the most amount possible))

            (                      # group and capture to \27:

              \/                     # '/'

              .                      # any character except \n

             |                      # OR

              [^?\/]                 # any character except: '?',
                                     # '\/'

            )                      # end of \27

          )                      # end of \26

        )*                     # end of \25 (NOTE: because you're
                               # using a quantifier on this capture,
                               # only the LAST repetition of the
                               # captured pattern will be stored in
                               # \25)

        (                      # group and capture to \28:

          \'                     # '''

         |                      # OR

          \?                     # '?'

          \'                     # '''

        )                      # end of \28

      )                      # end of \24

     |                      # OR

      (                      # group and capture to \29:

        \"                     # '"'

        (                      # group and capture to \30 (0 or more
                               # times (matching the most amount
                               # possible)):

          [^\"\\?]               # any character except: '\"', '\\',
                                 # '?'

         |                      # OR

          \\                     # '\'

          .                      # any character except \n

         |                      # OR

          \?                     # '?'

          \?+                    # '?' (1 or more times (matching the
                                 # most amount possible))

          (                      # group and capture to \31:

            \/                     # '/'

            .                      # any character except \n

           |                      # OR

            [^\?\"\/]              # any character except: '\?',
                                   # '\"', '\/'

          )                      # end of \31

        )*                     # end of \30 (NOTE: because you're
                               # using a quantifier on this capture,
                               # only the LAST repetition of the
                               # captured pattern will be stored in
                               # \30)

        (                      # group and capture to \32:

          \"                     # '"'

         |                      # OR

          \?+                    # '?' (1 or more times (matching the
                                 # most amount possible))

          \"                     # '"'

        )                      # end of \32

      )                      # end of \29

    )                      # end of \23

  )                      # end of \1

)                      # end of grouping
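
The NOTE that keeps recurring in the listing above (a quantified capture keeps only its LAST repetition) is easy to check with a toy pattern. The two helpers below are hypothetical and use a made-up character class instead of anything from the real regex.

```perl
use strict;
use warnings;

# Quantifier on the capture group itself: group 1 is overwritten on
# every repetition, so only the last one survives.
sub quantified_capture {
    my ($str) = @_;
    my ($last) = $str =~ /^([a-z])*$/;
    return $last;
}

# Quantifier inside the capture group: group 1 holds every repetition.
sub all_repetitions {
    my ($str) = @_;
    my ($all) = $str =~ /^((?:[a-z])*)$/;
    return $all;
}

print quantified_capture('abc'), "\n";   # c
print all_repetitions('abc'), "\n";      # abc
```

So if you ever need the contents of something like \5 or \14, wrap the repetition in an outer group and make the inner one non-capturing.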

japhy -- Perl and Regex Hacker
