in reply to Efficiently truncating a long string

A quick test shows that substr is pretty intelligent about the way it operates.

    # OS reports memory use 3336k
    my $s = ' ' x 1_000_000;
    # OS reports memory use 4320k
    $s = substr $s, 0, 999_999;
    # OS continues to report 4320k

If substr were acting as a copy operator, the memory would have to grow again to accommodate the copy. That it doesn't, even in this non-lvalue usage, tends to indicate that the code has the smarts to recognise when the destination of a substr assignment is the same as the source, and that it performs a simple in-situ adjustment to the length of the SV, which is about as fast as it is possible to be.
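
For anyone who would rather not rely on the OS memory figures, here is a rough sketch that looks at the scalar's own bookkeeping instead. It uses the core B module to read CUR (bytes in use) and LEN (bytes allocated) before and after the truncation. The helper name buf_info is made up for this example, and the LEN values you see will depend on your perl version and build, so treat the output as something to inspect rather than a guaranteed result.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of any module): return the used length (CUR)
# and the allocated buffer size (LEN) of the scalar that $ref points to.
sub buf_info {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ($sv->CUR, $sv->LEN);
}

my $s = ' ' x 1_000_000;
printf "before: CUR=%d LEN=%d\n", buf_info(\$s);

# The same non-lvalue truncation as in the test above.
$s = substr $s, 0, 999_999;
printf "after:  CUR=%d LEN=%d\n", buf_info(\$s);
```

If LEN is unchanged while CUR drops by one, that is at least consistent with the buffer being reused rather than reallocated, though it does not by itself prove that no temporary copy was made along the way.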


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: Re: Efficiently truncating a long string
by Anonymous Monk on Dec 19, 2003 at 13:53 UTC
    Not sure, but I think this happens because the OS uses copy-on-write memory pages, not because of substr.

      That's an interesting thought, but it doesn't appear to be the case.

          # 3332k
          $s = ' ' x 1_000_000;
          # 4316k
          $s = substr $s, 0, 999_000;
          # 4316k
          $s .= '?' x 2000;
          # 4316k

      Had that been the case, I would have expected to see memory growth when I appended to the copied scalar, but this doesn't happen. (On win32 anyway.)

      Conversely, if it were a copy-on-write phenomenon, then assigning the truncated substring to another scalar would likewise defer the copy until the new scalar was modified, which doesn't happen.

          # 3336k
          $s = ' ' x 1_000_000;
          # 4320k
          $t = substr $s, 0, 999_999;
          # 5308k

      Tracing through the sources, I can't see any explicit step taken in pp_substr or sv_setpvn to avoid copying when the source and target are the same. However, the address of the target is known to the code at this point, and a call is made to SvGROW to ensure that the target (in this case the same as the source) is large enough; it is here that any extra memory allocation would be performed. In this case, the target SV is the same as the source, and as the "growth" required is actually shrinkage, no allocation is necessary.
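
      To make the "grow only when needed" reasoning concrete, here is a toy model in Perl. It is not Perl's actual C code: needs_realloc is a hypothetical helper, and the allocated size of 1_000_001 bytes (the string plus a trailing NUL) is an assumption made purely for illustration.

      ```perl
      use strict;
      use warnings;

      # Hypothetical model of a grow check: a new allocation is only needed
      # when the requested size exceeds what the buffer already holds.
      sub needs_realloc {
          my ($allocated, $needed) = @_;
          return $needed > $allocated ? 1 : 0;
      }

      # Assumed allocation: 1_000_000 bytes of data plus one trailing NUL.
      my $allocated = 1_000_001;

      # Truncation to 999_999 bytes (plus NUL) fits in the existing buffer.
      print needs_realloc($allocated, 999_999 + 1)
          ? "realloc\n" : "reuse buffer\n";

      # Genuine growth to 2_000_000 bytes (plus NUL) does not fit.
      print needs_realloc($allocated, 2_000_000 + 1)
          ? "realloc\n" : "reuse buffer\n";
      ```

      Under those assumptions the truncation takes the "reuse buffer" branch, which matches the unchanged memory figure reported above.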

      The actual copy of the data is (eventually) performed using the C-library call memmove().

      This is the memcpy() look-alike that has the extra nous to deal with overlapping copies. In the case of a simple truncation, the logic (which I don't have access to, but can guess at) probably results in simply copying a single null byte to the insertion point.

      What actually happens is also dependent upon the C runtime used, but this is an obvious optimisation that probably exists in all versions of memmove().

