PerlMonks  

Re: Strings and numbers: losing memory and mind. (SV sizes)

by tye (Sage)
on Sep 28, 2007 at 05:33 UTC ( [id://641470] )


in reply to Strings and numbers: losing memory and mind.

The point wasn't clear to me immediately. Then I got it and it reminded me of a point ysth mentioned recently: Using an integer as a string usually makes for a considerably larger (in memory) scalar than using a string as a number. So your fix will cause a much bigger problem if you actually use your numbers as both numbers and strings (although, on testing, this doesn't appear to happen for integers). Just FYI.

A string, "1.2", likely gets stored in 4 bytes (plus the SV overhead), and then caching the numeric value adds another 16 bytes (or so). A number, 1.2, gets stored in 8 bytes (or so); caching the string value then causes a string buffer to be allocated that is large enough to hold any stringified number, and that (rather larger) buffer remains attached to the scalar (holding the stringified version of the numeric value).
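To make the caching concrete, here is a minimal sketch that uses the core B module to look at a scalar before and after it gets used "the other way". The helper name sv_summary is made up for this illustration, and the exact body types and buffer sizes are assumptions that vary with Perl version and build.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not from the thread): summarize a scalar's internals.
sub sv_summary {
    my ($ref)  = @_;
    my $sv     = B::svref_2object($ref);
    my $flags  = $sv->FLAGS;
    my $has_pv = $sv->can('LEN') ? 1 : 0;    # only PV-type bodies carry a string buffer
    return {
        type    => ref($sv),
        has_num => ( $flags & ( B::SVp_IOK() | B::SVp_NOK() ) ) ? 1 : 0,
        has_str => ( $flags & B::SVp_POK() ) ? 1 : 0,
        cur     => $has_pv ? $sv->CUR : 0,   # bytes of the buffer in use
        len     => $has_pv ? $sv->LEN : 0,   # bytes allocated for the buffer
    };
}

my $num    = 1.2;
my $before = sv_summary( \$num );
my $copy   = "$num";                         # string use caches "1.2" inside $num
my $num_as_str = sv_summary( \$num );

my $str = "1.2";
my $sum = $str + 0;                          # numeric use caches 1.2 inside $str
my $str_as_num = sv_summary( \$str );

for my $case ( [ 'number' => $before ],
               [ 'number used as string' => $num_as_str ],
               [ 'string used as number' => $str_as_num ] ) {
    my ( $label, $info ) = @$case;
    printf "%-22s type=%-8s CUR=%d LEN=%d\n",
        $label, $info->{type}, $info->{cur}, $info->{len};
}
```

Comparing the LEN values of the last two lines on your own build shows how much buffer each direction of caching leaves attached to the scalar.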

Note that these considerations usually don't matter. I'm even a little curious how much heap fragmentation played a role in your situation (since each SV has to be reallocated when the numeric value is cached, I assume).

I have never had to resort to such tricks and, when I've needed to reduce memory footprint, I've resorted to techniques that (I believe) actually have a more significant impact. Your tactic strikes me as something that is usually a waste to worry about before actually determining that it matters in the particular situation. A form of premature micro-optimization.

But I'm also glad to learn of these things, just in case I do run into cases where they point to the easiest way to get enough reduction in memory usage for some practical gain.

- tye        

Re^2: Strings and numbers: losing memory and mind. (SV sizes)
by almut (Canon) on Sep 28, 2007 at 13:28 UTC
    Using an integer as a string usually makes for a considerably larger (in memory) scalar than using a string as a number.

    How does this fit in with the size that Devel::Size reports? If you can trust it, a stringified number, and a numified string result in the same size (32 byte for the value 123, on a 32-bit Perl).

    use Devel::Size qw(size);
    use Devel::Peek;

    sub info {
        print Dump($_[0]);
        print "size = ",size($_[0])," ($_[1])\n\n";
    }

    $num = 123;  # or int("123")
    info($num, "integer");

    $num .= "";
    info($num, "integer stringified");

    $str = "123";
    info($str, "string");

    $str += 0;
    info($str, "string numified");

    $str += 45678900;
    info($str, "... with bigger integer");

    $str .= "";
    info($str, "... re-stringified");

    outputs something like

    SV = IV(0x816983c) at 0x8192124
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 123
    size = 16 (integer)

    SV = PVIV(0x8150b10) at 0x8192124
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 123
      PV = 0x81c9f18 "123"\0
      CUR = 3
      LEN = 4
    size = 32 (integer stringified)

    SV = PV(0x814fb90) at 0x81ca934
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x81950f0 "123"\0
      CUR = 3
      LEN = 4
    size = 28 (string)

    SV = PVIV(0x8150b20) at 0x81ca934
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 123
      PV = 0x81950f0 "123"\0
      CUR = 3
      LEN = 4
    size = 32 (string numified)

    SV = PVIV(0x8150b20) at 0x81ca934
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 45679023
      PV = 0x81950f0 "123"\0
      CUR = 3
      LEN = 4
    size = 32 (... with bigger integer)

    SV = PVIV(0x8150b20) at 0x81ca934
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 45679023
      PV = 0x81950f0 "45679023"\0
      CUR = 8
      LEN = 12
    size = 40 (... re-stringified)

    Devel::Peek shows a comparable resulting structure for "number stringified" and "string numified" (with respect to IV and PV usage). Also, one can observe that the overall size gets larger if you make the number bigger, and then re-stringify the variable...

    Anyhow, does your comment mean that Devel::Size is not reporting the size related to the entire PV buffer allocated for the cached stringified form, but rather its currently used part only (up to and including the \0)? — which would make it a less useful tool for determining real memory usage. Actually, the size that Devel::Size reports seems to be related to the LEN in the Devel::Peek dump (which itself you can observe to increment in steps of 4, if you play around a bit). Just wondering...
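One way to look at the used versus allocated part of the buffer without Devel::Size is to read CUR and LEN directly through the core B module. This is only a sketch: pv_cur_len is a made-up helper, and the allocation step it prints depends on the Perl version and the malloc in use, so no particular step size is claimed here.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return (used, allocated) bytes of a string scalar's buffer.
sub pv_cur_len {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ( $sv->CUR, $sv->LEN );
}

# Build strings of growing length and record CUR and LEN for each.
my @rows;
for my $n ( 1 .. 12 ) {
    my $s = 'x' x $n;
    push @rows, [ $n, pv_cur_len( \$s ) ];
}

printf "%2d chars: CUR=%2d LEN=%2d\n", @$_ for @rows;
```

If the size reported by Devel::Size follows LEN rather than CUR on your build, the LEN column should move in the same steps as the reported size.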

      Anyhow, does your comment mean that Devel::Size is not reporting the size related to the entire PV buffer allocated

      Wow, you are actually considering believing some second-hand hear-say over numbers output by a module in black-and-white? (:

      I just restated what ysth said. It made sense to me and I trust ysth, but I didn't do any experiments to validate the claims. Perhaps ysth will provide some details. It certainly could be a "problem" only on a different version of Perl than what you tested on, for example. Or it may have been a misinterpretation of some data on ysth's part; after all, it was a rather casual comment, so I may have erred in elevating it to the level of a node, or I may just have misinterpreted it. We'll see what others contribute.

      Thanks for testing it.

      Looking at some source code, using an NV instead of an IV likely makes the difference. Testing shows this is true on my version of Perl: it allocates 36 bytes for the string "1.1", roughly doubling the size of ($x=1.1).='' over ($y='1.1')+=0 (not a huge difference in most situations). The code appears to pre-construct the string and then allocate and copy just the required size for an IV or UV, but to allocate the buffer in the SV first when converting an NV. And based on ysth's comment, I wouldn't be surprised if the NV case has changed in some development version of Perl.
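The IV versus NV difference can be checked with a small sketch along these lines, again using the core B module. The helper stringified_len is hypothetical, and the numbers it prints are build-dependent; the only assumption made is that the NV buffer is at least as large as the IV one, which is what the source reading above suggests.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: allocated PV buffer size after a number was used as a string.
sub stringified_len {
    my ($value) = @_;                 # private copy, so the caller's scalar is untouched
    my $copy = "$value";              # string use caches the string form in $value
    my $sv   = B::svref_2object( \$value );
    return $sv->can('LEN') ? $sv->LEN : 0;
}

my $iv_len = stringified_len(123);    # integer (IV) case
my $nv_len = stringified_len(1.1);    # floating point (NV) case

print "IV 123 stringified: LEN=$iv_len\n";
print "NV 1.1 stringified: LEN=$nv_len\n";
```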

      - tye        

        Wow, you are actually considering believing some second-hand hear-say over numbers output by a module in black-and-white? (:

        Greatly do we respect our monks and greatly do we suspect our tools.

Re^2: Strings and numbers: losing memory and mind. (SV sizes)
by kyle (Abbot) on Sep 28, 2007 at 18:18 UTC

    Your tactic strikes me as something that is usually a waste to worry about before actually determining that it matters in the particular situation.

    I agree! Perhaps I should have prefaced my meditation by saying that this would not have been a problem I'd have needed to solve if the data set were not so large. As it was, reading 50 million strings took a little less than 5G of memory, and then it started eating up more during processing. Using 50 million ints instead took only about 2G of memory (and, of course, didn't grow). It's the difference between "just fits" and "won't work."
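As a rough sketch of the difference between storing strings and storing ints, the snippet below keeps "+ 0" copies of some fields and then checks the body type of both the copies and the originals with the core B module. The field values and the helper body_type are invented for illustration; the exact body types are what I would expect on a typical build, not a guarantee.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: name of the internal body type of a scalar.
sub body_type {
    my ($ref) = @_;
    return ref( B::svref_2object($ref) );
}

# Made-up input: numeric fields as they would arrive from a file, i.e. as strings.
my @fields = qw(10 20 30 40 50);

# Store plain integers: "+ 0" yields a fresh numeric scalar with no string buffer.
my @ints = map { $_ + 0 } @fields;

# Side effect of the numeric use above: each original string now also caches an integer.
print "stored int:   ", body_type( \$ints[0] ),   "\n";
print "source field: ", body_type( \$fields[0] ), "\n";
```

The second line is the growth during processing: a string that gets used as a number keeps its buffer and gains a numeric slot, while the stored ints never had a buffer to begin with.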

    Tracking this down was such a puzzle for me because I've really never had to worry about it before. Strings and numbers frolic freely together. Perl worries about the details, and I don't.