
As an optimization, you can get a faster pattern match by not putting your spaces into character classes. That just slows the engine down.
I smell premature optimization...

You're correct. However, there is not enough information from the OP to determine if that's a good idea.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Re^3: regex problem
by diotalevi (Canon) on Feb 27, 2006 at 23:20 UTC

    Someone recently estimated to me in a BOTE calculation that [ ] is 10x slower. That is, it hamstrings the screamingly fast Boyer-Moore literal string matching part of the regex engine: /xxx[ ]xxx/ compiles down to exact("xxx"), anyof(" "), exact("xxx"), where /xxx xxx/ would have been a single exact("xxx xxx").

    You've changed the regex from one for a seven-character literal to one that contains two three-character literals. That changes how the engine matches, and it changes how quickly the BM part of the engine can either discard the string entirely or find suitable candidates. It also causes more overhead because three ops have to be executed instead of just one.
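If you want to look at the ops yourself, the core re pragma can dump the compiled program. This is only a minimal sketch: the variable names are made up for illustration, the dump goes to STDERR, and the exact node names vary by perl version (a newer perl may well fold a one-character class back into a literal, so don't assume you will see an ANYOF node).

```perl
use strict;
use warnings;

# Lexically scoped: every regex compiled (and matched) below this line
# reports its compiled program on STDERR.
use re 'debug';

# Hypothetical names, purely for illustration.
my $literal = qr/xxx xxx/;      # one seven-character literal
my $class   = qr/xxx[ ]xxx/;    # the same text, with the space in a class

# Both patterns accept the same strings; only the compiled form may differ.
print "literal matches\n" if 'aaa xxx xxx bbb' =~ $literal;
print "class matches\n"   if 'aaa xxx xxx bbb' =~ $class;
```

Compare the two dumps: the thing to look for is whether the second pattern is one literal node or several nodes.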

    I'm asserting that [ ] is pretty to look at but has serious consequences for the regexp's performance. I wouldn't ordinarily choose the pretty version in that case. I rarely ever use /x, though, so I don't need to escape my spaces.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      My mistake. I mentally transferred Grandfather's \s+ into the OP, instead of registering the OP's [ ], causing you consternation and me a red face.

      My apologies.

      BTW, what's a "BOTE"?

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Back Of The Envelope. In this case it was an off-the-cuff guess from someone who hacks on Perl's regexp engine. That person isn't getting named because he probably didn't expect to get quoted. It's clearly more overhead and it definitely hamstrings an important optimization. Exactly how much slower it makes the engine on various data for various patterns... eh.
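Rather than trusting a guess, the difference can be measured with the core Benchmark module. The sketch below is only that: the haystack is invented test data, the names are hypothetical, and the numbers will depend heavily on the perl version, the input, and the pattern, so treat any result as specific to your setup.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented haystack: many near misses, with one real match at the very end.
my $text = ('lorem ipsum xxx-xxx ' x 1000) . 'xxx xxx';

# Hypothetical names for the two spellings of the same pattern.
my $literal = qr/xxx xxx/;
my $class   = qr/xxx[ ]xxx/;

# A negative count tells Benchmark to run each sub for at least that
# many CPU seconds, then print a comparison table of the rates.
cmpthese(-1, {
    literal => sub { my $hit = $text =~ $literal },
    class   => sub { my $hit = $text =~ $class },
});
```

Try it with your own data as well; a haystack with no near misses, or one where the match sits at the front, may tell a different story.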

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊