in reply to performance of length() in utf-8

It seems to me that bigger is the string, longer is the time to get its size, (exponentialy?).
Linearly. Perl_utf8_length is basically:
while (start_ptr < end_ptr) { start_ptr += UTF8SKIP(start_ptr); len++; }
and UTF8SKIP just returns the number of bytes in a utf-8 encoded codepoint given the starting byte. So, basically, PL_utf8_length is O(N), where N is the number of codepoins in (SvUTF8) string.

Your LenTestA is exponential because it basically does

PL_utf8_length($chunk) # first iteration PL_utf8_length($chunk . $chunk) # second iteration PL_utf8_length($chunk . $chunk . $chunk) # third iteration ...and so on...
while LenTestB is
PL_utf8_length($chunk) # 1st PL_utf8_length($chunk) # 2nd PL_utf8_length($chunk) # 3rd ...

Replies are listed 'Best First'.
Re^2: performance of length() in utf-8
by kennethk (Abbot) on Mar 03, 2016 at 20:38 UTC
    Your LenTestA is exponential
    That would be quadratic growth. You have a series of operations that grows linearly, and your number of invocations grows linearly with line length.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Maybe, maybe, I don't speak english very well. Basically, it's O(N**2).
        Yes, quadratic is a synonym for polynomial of second order.

        Quadratic_growth

        Exponential_growth -- also known as explosive or geometric growth


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        For reference, exponential would be similar to O(2**N), which is terrible like the traveling spambot problem.