Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^3: Rosetta Code: Long List is Long -- dualvar

by choroba (Cardinal)
on Dec 02, 2022 at 11:20 UTC ( [id://11148497]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Rosetta Code: Long List is Long -- dualvar
in thread Rosetta Code: Long List is Long

Interesting!

What really surprised me was how slow it became once I tried to sort the data just once:

sort { $b <=> $a || $a cmp $b } @data;
Before:
sort + output: 19 secs
After:
sort + output: 32 secs

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Replies are listed 'Best First'.
Re^4: Rosetta Code: Long List is Long -- dualvar
by eyepopslikeamosquito (Archbishop) on Dec 04, 2022 at 06:35 UTC

    What really surprised me was how slow it became once I tried to sort the data just once

    Agreed! Not sure if this is a bug or a feature but I found it very surprising! On my machine, when testing Discipulus' superb one liner just now:

    perl -lae "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{ +$a}||$a cmp $b}keys %r" big1.txt big2.txt big3.txt >oneliner.tmp

    I almost fell off my chair when the execution time dropped from 92 seconds to 38 seconds (with identical correct output) simply by changing:

    sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r
    to:
    sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r

    After picking myself up off the floor, I ran:

    perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" LISTOP (0x299c080) leave [1] OP (0x299c050) enter COP (0x299c0c0) nextstate LISTOP (0x299c120) sort OP (0x299c160) pushmark UNOP (0x2882ed0) null LISTOP (0x2882fb0) scope COP (0x2882ff0) null [195] UNOP (0x2883050) null LOGOP (0x2883088) or BINOP (0x28831e8) ncmp [5] UNOP (0x26610d0) null [150] UNOP_AUX (0x299c010) multideref OP (0x26611b8) null [7] UNOP (0x2883228) null [150] UNOP_AUX (0x299bfd0) multideref OP (0x2661098) null [7] BINOP (0x28830c8) scmp [8] UNOP (0x2883178) null [14] PADOP (0x28831b0) gvsv GV (0x18a8c0) +*a UNOP (0x2883108) null [14] PADOP (0x2883140) gvsv GV (0x2655320) + *b UNOP (0x2882f08) keys [11] UNOP (0x2882f40) rv2hv [10] PADOP (0x2882f78) gv GV (0x2654ae0) *r

    which is 25 OPs ... while:

    perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r" LISTOP (0x2940c18) leave [1] OP (0x2940c90) enter COP (0x2940bb8) nextstate LISTOP (0x2940b78) sort OP (0x2940c58) pushmark UNOP (0x2990018) null LISTOP (0x2940d38) scope COP (0x2940d78) null [195] BINOP (0x2940dd8) ncmp [5] UNOP (0x2644940) null [150] UNOP_AUX (0x298ff68) multideref OP (0x2644a28) null [7] UNOP (0x2940e18) null [150] UNOP_AUX (0x298ff28) multideref OP (0x2644908) null [7] LISTOP (0x298ffa8) sort OP (0x298ffe8) pushmark UNOP (0x2940ad0) keys [11] UNOP (0x2940b08) rv2hv [10] PADOP (0x2940b40) gv GV (0x26344d8) *r

    and (update):

    perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort keys %r" LISTOP (0x29e2518) leave [1] OP (0x29bda60) enter COP (0x29e2558) nextstate LISTOP (0x29e25b8) sort OP (0x29e25f8) pushmark UNOP (0x29e2628) null LISTOP (0x29e2778) scope COP (0x29e27b8) null [195] BINOP (0x29e2818) ncmp [5] UNOP (0xed31b0) null [150] UNOP_AUX (0x29bda20) multideref OP (0xed3298) null [7] UNOP (0x29e2858) null [150] UNOP_AUX (0x29bd9e0) multideref OP (0xed3178) null [7] LISTOP (0x29e2660) sort OP (0x29e26a0) pushmark UNOP (0x29e26d0) keys [8] UNOP (0x29e2708) rv2hv [7] PADOP (0x29e2740) gv GV (0xec4098) *r

    are just 20 OPs. I'm out of my depth here, so we might need a dave_the_m-class Perl internals guru to get to the bottom of this mystery. :)

    Update: I believe the second example above can be shortened from:

    sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r
    to simply:
    sort{$r{$b}<=>$r{$a}}sort keys %r
    ...also, for correctness, I think we should further add:
    use sort 'stable';
    at the top of the program because we are relying on a stable sort for this trick to work. Further update: Oops, this is not necessary, didn't read far enough in the sort docs, "The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability".

    Updated: fixed typo, added -e after -MO=Terse in examples above. Added comment about stable sort.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148497]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (5)
As of 2024-04-19 15:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found