According to Friedl, /i causes a temporary copy to be made of the entire target string. This is in addition to any copies made by the dirty variables like $&. The /i copy is done prior to any match occurring, whereas $& only makes a copy after a successful match.
After the copy is made, the RE engine takes a second pass over the string, converting upper case letters to lower case, even if the original was already all lowercase.
As the RE itself is compiled, it too has all of its literals converted to lowercase.
So you have an extra copy of the target string, a lowercase conversion pass over that copy, and a lowercasing of the RE itself during its compilation phase. That all adds up to, as Friedl puts it, "one of the most gratuitous inefficiencies in all of Perl." (He is an RE guy though.)
In Friedl's testing, the worst-case scenarios he found ran more than four orders of magnitude slower with the /i modifier on a string of 1192395 bytes (1.2MB). He calculated that, because of the "needless copying", Perl shuffled more than 647,585MB around inside his CPU.
That was a worst-case test. In what he considered more of a real-world test, he found that m/\b[wW][hH][iI][lL][eE]\b/g ran fifty times faster than m/\bwhile\b/gi on a huge string.
His conclusion was, "don't use /i unless you really have to."
I've done a quick skim through the various perldelta documents and haven't seen any mention since 5.005 of an improved and more efficient /i modifier. That doesn't mean I couldn't have missed it; there's a lot to skim. Someone may correct me, but it looks like at least for now that part of the RE engine hasn't changed.
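If you want to check for yourself on whatever perl you have installed, something along these lines should do it. This is only a rough sketch modelled on Friedl's "real-world" comparison, not his actual test: the test string, the sub names count_classes and count_with_i, and the repeat count are all made up for illustration. It uses the core Benchmark module's cmpthese to compare the two patterns. I'm not quoting any numbers, because the result will depend on your Perl version and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a short snippet repeated many times to make a
# large target string. Each copy contains "while", "While" and "WHILE".
my $text = join ' ', ('while (1) { print "While waiting"; } WHILE done') x 20_000;

# Count matches using explicit character classes, no /i.
sub count_classes {
    my $n = () = $text =~ /\b[wW][hH][iI][lL][eE]\b/g;
    return $n;
}

# Count matches using the /i modifier.
sub count_with_i {
    my $n = () = $text =~ /\bwhile\b/gi;
    return $n;
}

# Sanity check: both patterns must find the same matches.
die "patterns disagree\n" unless count_classes() == count_with_i();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    classes => \&count_classes,
    with_i  => \&count_with_i,
});
```

If the two rows come out close to each other, the /i penalty Friedl measured is not showing up on your build, at least not for this kind of pattern and input.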
Dave
In reply to Re: Did the inefficiency of /i get fixed?
by davido
in thread Did the inefficiency of /i get fixed?
by Cody Pendant