Re: Re: regexp hang?

(tye)Re: regexp hang? by tye (Sage) on Aug 22, 2002 at 05:54 UTC
No, that doesn't match "200 words"; you'd want `\W+`, not `\W*`, for that. If you have at least 200 words, it will match rather quickly. Otherwise it can take a very long time backtracking, with `\W*` matching zero-length bits in the middle of words as it works back until it can match 200 partial words. - tye (but my friends call me "Tye")
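To make the "partial words" point concrete, here is a minimal sketch that uses a count of 3 instead of 200. The sample string and the variable names are made up for illustration; only the `\W*` versus `\W+` contrast comes from the post above.

```perl
use strict;
use warnings;

# Only two words, but the patterns below ask for three "words".
my $text = 'one two';

# \W* may match nothing, so after backtracking the engine can
# split "two" into "tw" + "o" and count that as two words.
my $loose = $text =~ /^((?:\w+\W*){3})/ ? $1 : undef;

# \W+ demands at least one separator after every word,
# so there is no way to find three units here.
my $strict_match = $text =~ /^((?:\w+\W+){3})/ ? $1 : undef;

print defined $loose
    ? "with \\W*: matched '$loose'\n"
    : "with \\W*: no match\n";
print defined $strict_match
    ? "with \\W+: matched '$strict_match'\n"
    : "with \\W+: no match\n";
```

With a count of 200 and a long text that falls short of 200 words, the same splitting is presumably what the engine tries over and over before giving up or settling on partial words.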
Re^3: regexp hang? by Aristotle (Chancellor) on Aug 22, 2002 at 05:42 UTC
~~This is odd. I have no immediate idea; you avoided what the camel book demonstrates as the `aa` pitfall, I think, since you ask for delimiting `\W` characters. I'm not sure, but I think~~ forcing the `\W` to match using `+`, rather than accepting zero matches using `*`, would fix things. I also propose you anchor the pattern. You can also catch a free speed bonus by not capturing the inner brackets (note that this puts the rest into `$2` rather than `$3`). `s/^((?:\w+\W+){200})(.*)/$1/si` Update: right, tye++ confirms my intuition. Makeshifts last the longest.
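A small sketch of the anchored, non-capturing form, scaled down to 3 words so the result is easy to check by eye. The sample text and variable names are assumptions for illustration, and the `/i` flag is left off here because the pattern contains no literal letters.

```perl
use strict;
use warnings;

my $text = 'alpha beta gamma delta epsilon';
my $rest;

# Group 1: the first three words, each followed by its separator(s).
# Group 2: everything after them (this would be $3 if the inner
# group were capturing instead of (?:...)).
if ($text =~ s/^((?:\w+\W+){3})(.*)/$1/s) {
    $rest = $2;
}

print "kept: '$text'\n";
print "rest: '$rest'\n" if defined $rest;
```

Note that the kept part still carries the separator that follows the third word.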
Re(3): regexp hang? by Arien (Pilgrim) on Aug 22, 2002 at 06:02 UTC
tye has already told you why you could see slowness caused by backtracking. Also, I would write "separate `$string` into two parts: the first 200 words (`$intro`) and the rest (`$rest`)" like this: `($rest = $string) =~ s/\A((?:\w+\W+){200})//s and $intro = $1;` (Assuming your string starts with a word character and you don't mind the extra non-word character(s) after the 200th word.) - Arien Edit: If you don't care about changing the value of `$string`, you could obviously leave out the copy to `$rest`.
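The split idiom above can be tried out with a smaller count. This is only a sketch: the count of 3, the sample strings, and the `$short_*` variables are invented for the example.

```perl
use strict;
use warnings;

my $string = 'alpha beta gamma delta epsilon';
my ($intro, $rest);

# Copy $string into $rest, strip the first three words from the copy,
# and only set $intro if the substitution actually succeeded.
($rest = $string) =~ s/\A((?:\w+\W+){3})//s and $intro = $1;

print "intro: '$intro'\n" if defined $intro;
print "rest:  '$rest'\n";

# With fewer than three complete word+separator units, nothing is
# removed: the copy keeps the whole string and the intro stays undef.
my ($short_intro, $short_rest);
($short_rest = 'one two') =~ s/\A((?:\w+\W+){3})//s and $short_intro = $1;

print "short intro is ", (defined $short_intro ? "'$short_intro'" : 'undef'), "\n";
print "short rest:  '$short_rest'\n";
```

Because of the `and`, a caller can test `defined $intro` to tell whether the text really had enough words.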