in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

> How would a database compensate for this?

Databases use structures like B-trees for indexing and sorting, and they are pretty good at balancing these trees between RAM and disk. °

They optimize the trade-off between memory and time.

Perl is pretty bad at sorting linear data that sits partly in RAM and partly on disk.

(Well, you could tie an AoA to a file representing a table. Sorting that array would need little RAM, but it would incur a constant overhead on each access. This is certainly worth a shot, but, as I said, DBs already have this optimized.)
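To make the tie idea concrete, here is a minimal sketch using Tie::File (the file name and sample records are made up for illustration). Each element access goes through the tie layer's cache/disk machinery, which is exactly the constant per-access overhead mentioned above; sorting indices instead of rows keeps the copied data small:

```perl
use strict;
use warnings;
use Tie::File;

# toy table: write a few unsorted records to a file (illustration only)
my $file = 'table.txt';
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "$_\n" for qw(pear apple cherry banana);
close $fh;

# tie the file to an array; every $rows[$i] goes through
# Tie::File's cache/disk layer, i.e. constant overhead per access
tie my @rows, 'Tie::File', $file or die "Cannot tie $file: $!";

# sort indices, so the comparator fetches rows on demand from the tie
my @sorted = @rows[ sort { $rows[$a] cmp $rows[$b] } 0 .. $#rows ];

untie @rows;
unlink $file;
print "@sorted\n";   # apple banana cherry pear
```

This is the unoptimized version of what a DB index gives you for free: the data stays on disk, and you pay per access.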

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update

°) from WP B-tree


Re^4: efficient perl code to count, rank (merge sort)
by LanX (Saint) on Jul 18, 2021 at 12:36 UTC
    > (Well you could tie an AoA to a file representing a table. Sorting that array would go without much RAM but mean constant overhead with each access. This is certainly worth a shot, but as I said DBs have this already optimized.)

    I don't think this will work, because of the way merge sort is implemented in Perl.

    It first compares all pairs in breadth-first order, which would mean far too many disk operations and no sensible caching option:

      DB<7> sort { print "$a,$b |"; $a <=> $b } reverse 1..16
    16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 |15,13 |15,14 |11,9 |11,10 |13,9 |13,10 |13,11 |13,12 |7,5 |7,6 |3,1 |3,2 |5,1 |5,2 |5,3 |5,4 |9,1 |9,2 |9,3 |9,4 |9,5 |9,6 |9,7 |9,8 |
      DB<8>

    explained

    16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 |    # sort pairs
    15,13 |15,14 |11,9 |11,10 |                        # sort 1st and 2nd 4-tuple
    13,9 |13,10 |13,11 |13,12 |                        # sort 1st 8-tuple
    7,5 |7,6 |3,1 |3,2 |                               # sort 3rd and 4th 4-tuple
    5,1 |5,2 |5,3 |5,4 |                               # sort 2nd 8-tuple
    9,1 |9,2 |9,3 |9,4 |9,5 |9,6 |9,7 |9,8 |           # sort 16-tuple

    Caching/memoizing data read from disk would be far more efficient if the chunks were processed strictly depth first.
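    To sketch what such memoizing looks like (the "disk" here is just an in-memory array standing in for slow storage, and the read counter is mine): each record is fetched through a cache, so repeated comparisons of the same index hit RAM. Note this unbounded cache eventually holds everything; a *bounded* cache only pays off if the access order has locality, which depth-first chunks would provide and the breadth-first order above defeats.

```perl
use strict;
use warnings;

# @disk stands in for slow storage; $reads counts simulated disk reads
my @disk  = (42, 7, 99, 13, 56, 1);
my $reads = 0;
my %cache;

sub fetch {
    my ($i) = @_;
    $cache{$i} //= do { $reads++; $disk[$i] };   # read once, then serve from RAM
    return $cache{$i};
}

# sort indices by record value; the comparator only ever calls fetch()
my @order = sort { fetch($a) <=> fetch($b) } 0 .. $#disk;
print "order: @order, disk reads: $reads\n";   # 6 reads for 6 records
```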

    FWIW, WP lists some divide-and-conquer approaches for merge sort:

    https://en.wikipedia.org/wiki/Mergesort#Parallel_merge_sort
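    For contrast with the breadth-first trace above, a textbook top-down merge sort is depth first — each half is fully sorted before the other half is touched, which is the locality a disk-backed cache would need. A quick sketch (my own, not Perl's internal implementation):

```perl
use strict;
use warnings;

# depth-first (top-down) merge sort: recurse fully into the left half,
# then the right half, then merge — one chunk is "hot" at a time
sub merge_sort {
    my @a = @_;
    return @a if @a < 2;
    my $mid   = int(@a / 2);
    my @left  = merge_sort(@a[0 .. $mid - 1]);
    my @right = merge_sort(@a[$mid .. $#a]);
    my @out;
    push @out, ($left[0] <= $right[0] ? shift @left : shift @right)
        while @left && @right;
    return (@out, @left, @right);   # append whichever half has leftovers
}

print join(',', merge_sort(reverse 1 .. 16)), "\n";   # 1,2,...,16
```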

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery