Re: performance of length() in utf-8

It seems to me that bigger is the string, longer is the time to get its size, (exponentialy?).

Linearly. Perl_utf8_length is basically:

while (start_ptr < end_ptr) {
    start_ptr += UTF8SKIP(start_ptr);
    len++;
}
[download]

and UTF8SKIP just returns the number of bytes in a utf-8 encoded codepoint given the starting byte. So, basically, PL_utf8_length is O(N), where N is the number of codepoins in (SvUTF8) string.

Your LenTestA is exponential because it basically does

PL_utf8_length($chunk)                    # first iteration
PL_utf8_length($chunk . $chunk)           # second iteration
PL_utf8_length($chunk . $chunk . $chunk)  # third iteration
...and so on...
[download]

while LenTestB is

PL_utf8_length($chunk)  # 1st
PL_utf8_length($chunk)  # 2nd
PL_utf8_length($chunk)  # 3rd
...
[download]

Comment on Re: performance of length() in utf-8 Select or Download Code

Replies are listed 'Best First'.
Re^2: performance of length() in utf-8 by kennethk (Abbot) on Mar 03, 2016 at 20:38 UTC
Your LenTestA is exponential That would be quadratic growth. You have a series of operations that grows linearly, and your number of invocations grows linearly with line length. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.	[reply]
Re^3: performance of length() in utf-8 by Anonymous Monk on Mar 03, 2016 at 20:46 UTC
Maybe, maybe, I don't speak english very well. Basically, it's O(N**2).	[reply]
Re^4: performance of length() in utf-8 by kennethk (Abbot) on Mar 03, 2016 at 21:49 UTC
Yes, quadratic is a synonym for polynomial of second order. Quadratic_growth Exponential_growth -- also known as explosive or geometric growth #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.	[reply]
Re^4: performance of length() in utf-8 by SuicideJunkie (Vicar) on Mar 03, 2016 at 21:51 UTC
For reference, exponential would be similar to `O(2**N)`, which is terrible like the traveling spambot problem.	[reply] [d/l]