PerlMonks  

push to array without copying

by chris212 (Scribe)
on Nov 14, 2016 at 23:09 UTC (#1175917=perlquestion)

chris212 has asked for the wisdom of the Perl Monks concerning the following question:

my %hash = ('a'=>'test');
my @arr = ($hash{'a'});
print \$hash{'a'}."\n";
print \$arr[0]."\n";
How can I do something like that, except re-use the same data rather than create a copy (same reference, not just same value)? I assume that would be faster? I know I can put the reference in the array, but the values need to be in the array so they can be passed to printf. The size of the array and the keys in the hash will vary. This code will be iterated millions of times, so I want it as fast as possible. The hash won't be used once the array is created, so I'm not concerned about the hash values being modified when the array values are.

Replies are listed 'Best First'.
Re: push to array without copying
by BrowserUk (Patriarch) on Nov 14, 2016 at 23:34 UTC

    The down-side of using references is that dereferencing stuff in multiple places can a) get messy; b) actually end up costing more time than simply duplicating the data and avoiding the dereferencing, if the data items are small.

    The hash won't be used once the array is created, so I'm not concerned about the hash values being modified when the array values are.

    That observation is key here. If you don't need the values in the hash after you've copied them to the array, don't copy them, move them.

    That can be achieved by deleting the key/value pair from the hash and assigning the return from delete -- which is the value of the key being deleted -- to the array.

    Like this:

    my %hash = ('a'=>'test');
    my @arr = delete $hash{'a'};
    print $arr[0]."\n";

    This results in the key 'a' being removed from the hash, and its former value 'test' being transferred directly to $arr[0] without any copying.
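
    A minimal sketch of the same move done with push, as in the thread title; the variable names are illustrative only. That delete returns the removed value is documented behaviour, while reuse of the underlying string buffer is an implementation detail that this code does not rely on.

```perl
use strict;
use warnings;

my %hash = (a => 'test');
my @arr;

# delete removes the key and returns its value, which push appends.
push @arr, delete $hash{a};

print "moved value: $arr[0]\n";
print "key 'a' still in hash: ", (exists $hash{a} ? 'yes' : 'no'), "\n";
```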

    I assume that would be faster?

    It will be, provided: a) the size of the values is sufficient to make a noticeable difference -- a few hundred bytes would do it -- and b) it doesn't force you to do too much extra work in other places as a result.
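
    Whether the move pays off for a given value size is easy to measure. The following is a rough sketch using the core Benchmark module; the key set, the value size and the iteration count are arbitrary assumptions to be replaced with something resembling the real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @keys = ('a' .. 'j');   # assumed key set
my $size = 1_000;          # assumed bytes per value; vary this

# Each candidate rebuilds the hash so both start from the same state;
# that setup cost is included in both timings.
sub build { map { $_ => 'x' x $size } @keys }

sub copy_values { my %h = build(); my @arr = @h{@keys};        scalar @arr }
sub move_values { my %h = build(); my @arr = delete @h{@keys}; scalar @arr }

cmpthese(20_000, {
    copy => \&copy_values,
    move => \&move_values,
});
```

    Because rebuilding the hash is part of both timings, any difference shown understates the difference in the assignment step alone.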

    delete also works with hash slices, which makes this very convenient and efficient for doing multiple transfers simultaneously:

    %h = qw[ the quick black fiend jumps over the lazy god child ];;
    pp \%h;;
    { black => "fiend", god => "child", jumps => "over", the => "lazy" }
    print delete @h{ qw[ the black god ] };;
    lazy fiend child
    pp \%h;;
    { jumps => "over" }
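
    Applied to the original question, the slice form might look like the sketch below, which moves the values out of the hash in a chosen order and hands them to sprintf. The hash contents, key order and format string are made up for illustration.

```perl
use strict;
use warnings;

my %row   = (id => 42, name => 'widget', price => 9.5);  # illustrative data
my @order = qw(id name price);                           # desired column order

# Slice-delete returns the values in the order the keys are listed.
my @values = delete @row{@order};

my $line = sprintf "%-5s|%-10s|%6.2f", @values;
print "$line\n";
```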

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      But the array value doesn't seem to have the same reference address as the hash value. Are you sure that doesn't make a copy?
      my %hash = ('a'=>'test');
      print \$hash{'a'}."\n";
      my @arr = delete $hash{'a'};
      print \$arr[0]."\n";
        Are you sure that doesn't make a copy?

        Yes. Quite sure. I have two independent demonstrations for proof of that.

        1. Using Devel::Peek::Dump():

          This is not a Perl-level 'aliasing' effect; it is a C-level pointer swap, so you're looking at the wrong thing. It isn't the Perl reference values you should be considering, but rather the PV component of the two scalars as shown below.

          Note that whilst the two scalars have different heads and bodies (the two hex values on the first line of each dump), the address of the actual data on the fourth line of each dump is identical in both:

          use Devel::Peek;;
          $h{ XXX } = 'test';;
          Dump $h{ XXX };;
          SV = PV(0x2ab2a0) at 0x3dc99d8
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x3e28f98 "test"\0
            CUR = 4
            LEN = 8
          $a[0] = delete $h{ XXX };;
          Dump $a[ 0 ];;
          SV = PV(0x2ab2b0) at 0x3dc9978
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x3e28f98 "test"\0
            CUR = 4
            LEN = 8

          Contrast that with a normal assignment where all three hex values in the scalar Dump()s are different:

          use Devel::Peek;;
          $h{ XXX } = 'test';;
          Dump $h{ XXX };;
          SV = PV(0x11b2a0) at 0x3e199d8
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x3e78f98 "test"\0
            CUR = 4
            LEN = 8
          $a[0] = $h{ XXX };;
          Dump $a[ 0 ];;
          SV = PV(0x11b2b0) at 0x3e19978
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x3e78f68 "test"\0
            CUR = 4
            LEN = 8
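
          For anyone without a REPL to hand, the two sessions above can be combined into one script along these lines. This is only a sketch: Dump() writes to STDERR, the addresses will differ from run to run, and on newer perls with copy-on-write strings the flags and PV values may not match the 2016 output shown here.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my (%h, @moved, @copied);

# Case 1: move the value out of the hash with delete.
$h{XXX} = 'test';
Dump($h{XXX});
$moved[0] = delete $h{XXX};
Dump($moved[0]);

# Case 2: plain assignment, leaving the hash value in place.
$h{YYY} = 'test';
Dump($h{YYY});
$copied[0] = $h{YYY};
Dump($copied[0]);
```

          Compare the PV = 0x... line within each pair of dumps, not the addresses on the SV = line.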
        2. Using empirical observation.

          In the trace below mem is a function that returns the process memory utilisation in K bytes.

          • When the REPL session has just started, the memory used is just under 10MB.
          • After I create the hash key with a 100e6 byte value, the memory has grown to 107MB.
          • After I transfer that value to the array element, the memory has changed by a few KB, but is still basically the same 107MB.
          C:\test>p1
          Perl> print mem;;
          9,340 K
          Perl> $h{ XXX } = 'X' x 100e6;;
          Perl> print mem;;
          107,248 K
          Perl> $a[ 0 ] = delete $h{ XXX };;
          Perl> print mem;;
          107,308 K
          Perl>

          If the data had been copied, the footprint would have been over 200MB.

          (I cannot Dump() the scalars in this latter case because Dump() would spend a week trying to format a 100e6 byte string nicely, before dumping it to the console, which would take another week (or two:)!)
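
          The mem function used in that trace is not shown in the thread. A hypothetical stand-in, assuming a Linux system where /proc/self/status is available (the original session was run on Windows), could be sketched like this; mem_kb and report are invented names, and the string is smaller than the 100e6 bytes used above only to keep the run quick.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mem(): resident set size in KB, or undef
# where /proc/self/status is unavailable (i.e. anywhere but Linux).
sub mem_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

sub report { printf "%-14s %s K\n", $_[0], mem_kb() // 'n/a' }

my (%h, @a);
report('start');
$h{XXX} = 'X' x 20e6;          # 20 million bytes
report('after assign');
$a[0] = delete $h{XXX};        # move, not copy
report('after move');
```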


Re: push to array without copying (\@_)
by tye (Sage) on Nov 15, 2016 at 02:50 UTC

    There is one rather ugly trick for getting a reference to an array containing aliases:

    my %hash = ( a => 1, b => 2, c => 3, d => 4 );
    my $av = sub { \@_ }->( @hash{'d','b','c','a'} );
    print "@$av\n";
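
    A short sketch showing that the elements obtained this way really are aliases and not copies, and that they can be passed to printf directly. The format string and the new value 40 are arbitrary.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3, d => 4 );
my $av   = sub { \@_ }->( @hash{'d','b','c','a'} );

# Each array element is the hash's own scalar, not a copy.
print "same scalar\n" if \$av->[0] == \$hash{d};

# Writing through the array is therefore visible in the hash.
$av->[0] = 40;
print "hash{d} is now $hash{d}\n";

# The aliased values can be handed straight to printf.
printf "%s-%s-%s-%s\n", @$av;
```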

    But I'd likely go with BrowserUk's suggestion instead.

    - tye        

      > But I'd likely go with BrowserUk's suggestion instead

      True, but since I can't find this "aliasing" documented for delete, it has the disadvantage of being an implementation detail.

      Though I don't expect it to be changed ever, since it's a safe performance gain.

      edit

      And more importantly the code wouldn't break if this was changed, only slow down.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

Re: push to array without copying
by chris212 (Scribe) on Nov 15, 2016 at 21:20 UTC
    I found that "printf" was not the most efficient way to format fixed-width data. It was faster to append whitespace padding, or to truncate with a regex, as needed, probably because that modifies the variable in place and doesn't need to parse a format string. This allowed me to try using an array of references, but I saw no performance improvement. I also see no performance improvement from deleting the value from the hash when pushing it to an array.
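
    The code for that approach was not posted. One way it might look is sketched below, under the assumption that each field is padded or cut to a fixed width in place; fit_in_place is an invented name, and lvalue substr is used for the truncation where the post mentions a regex.

```perl
use strict;
use warnings;

# Pad or truncate the caller's variable in place; $_[0] aliases it.
sub fit_in_place {
    my $width = $_[1];
    my $len   = length $_[0];
    if    ($len < $width) { $_[0] .= ' ' x ($width - $len) }
    elsif ($len > $width) { substr($_[0], $width) = '' }
    return;
}

for my $value ('abc', 'abcdefghijkl') {
    my $field = $value;
    fit_in_place($field, 8);
    my $same = $field eq sprintf('%-8.8s', $value) ? 'matches' : 'differs from';
    print "[$field] $same sprintf\n";
}
```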

Node Type: perlquestion [id://1175917]
Approved by Paladin
Front-paged by LanX