comment on

The issue is chronicled in Mastering Regular Expressions (The Owls Book) by Jeoffrey Friedl. I still only have the first edition of MRE, but I dug it out to refresh my memory on the issue.

According to Friedl, /i causes a temporary copy to be made of the entire target string. This is in addition to any copies made by the dirty variables like $&. The /i copy is done prior to any match occurring, whereas $& only makes a copy after a successful match.

After the copy is made, the RE engine takes a second pass over the string, converting upper case letters to lower case, even if the original was already all lowercase.

As the RE itself is compiled, it too has all of its literals converted to lowercase.

So you have extra copies being made, target string lowercase conversion, a lowercasing of the RE itself during the compilation phase of the RE, and that all adds up to, as Friedl puts it, "one of the most gratuitous inefficiencies in all of Perl." (He is a RE guy though.)

In Friedl's testing, he found worst case scenarios that took over four orders of magnitude with regards to using the /i modifier on strings of 1192395 bytes (1.2MB). He calculated that due to the "needless copying", Perl shuffled more than 647,585MB around inside his CPU.

That was a worst case test. In what he considered more of a real-world test, he found that m/\b[wW][hH][iI][lL][eE]\b/g resulted in a testrun of fifty times faster than m/\bwhile\b/gi on a huge string.

His conclusion was, "don't use /i unless you really have to."

I've done a quick skim through the various perldelta documents and haven't seen any mention since 5.005 of an improved and more efficient /i modifier. That doesn't mean I couldn't have missed it; there's a lot to skim. Someone may correct me, but it looks like at least for now that part of the RE engine hasn't changed.

Dave

In reply to Re: Did the inefficiency of /i get fixed? by davido
in thread Did the inefficiency of /i get fixed? by Cody Pendant

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.