in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

> How would a database compensate for this?

Databases use structures like B-trees for indexing and sorting, and they are pretty good at balancing these trees between RAM and disk. °

They optimize the trade-off between memory and time.

Perl is pretty bad at sorting linear data that sits partly in RAM and partly on disk.

(Well, you could tie an AoA to a file representing a table. Sorting that array would need little RAM, but it would incur a constant overhead on each access. This is certainly worth a shot, but, as I said, DBs already have this optimized.)
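To make the tie idea concrete, here is a minimal sketch using Tie::File (the file name and sample records are made up for illustration). Each element access goes through the tie layer's cache/disk machinery, which is exactly the constant per-access overhead mentioned above; sorting indices instead of rows keeps the copied data small:

```perl
use strict;
use warnings;
use Tie::File;

# toy table: write a few unsorted records to a file (illustration only)
my $file = 'table.txt';
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "$_\n" for qw(pear apple cherry banana);
close $fh;

# tie the file to an array; every $rows[$i] goes through
# Tie::File's cache/disk layer, i.e. constant overhead per access
tie my @rows, 'Tie::File', $file or die "Cannot tie $file: $!";

# sort indices, so the comparator fetches rows on demand from the tie
my @sorted = @rows[ sort { $rows[$a] cmp $rows[$b] } 0 .. $#rows ];

untie @rows;
unlink $file;
print "@sorted\n";   # apple banana cherry pear
```

This is the unoptimized version of what a DB index gives you for free: the data stays on disk, and you pay per access.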

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update

°) from WP B-tree


Re^4: efficient perl code to count, rank (merge sort)
by LanX (Saint) on Jul 18, 2021 at 12:36 UTC
    > (Well you could tie an AoA to a file representing a table. Sorting that array would go without much RAM but mean constant overhead with each access. This is certainly worth a shot, but as I said DBs have this already optimized.)

    I don't think this will work, because of the way merge sort is implemented in Perl.

    It first compares all pairs in breadth-first order, which would mean far too many disk operations and no sensible caching option:

      DB<7> sort { print "$a,$b |"; $a <=> $b } reverse 1..16
    16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 |15,13 |15,14 |11,9 |11,10 |13,9 |13,10 |13,11 |13,12 |7,5 |7,6 |3,1 |3,2 |5,1 |5,2 |5,3 |5,4 |9,1 |9,2 |9,3 |9,4 |9,5 |9,6 |9,7 |9,8 |
      DB<8>

    explained

    16,15 |14,13 |12,11 |10,9 |8,7 |6,5 |4,3 |2,1 |    # sort pairs
    15,13 |15,14 |11,9 |11,10 |                        # sort 1st and 2nd 4-tuple
    13,9 |13,10 |13,11 |13,12 |                        # sort 1st 8-tuple
    7,5 |7,6 |3,1 |3,2 |                               # sort 3rd and 4th 4-tuple
    5,1 |5,2 |5,3 |5,4 |                               # sort 2nd 8-tuple
    9,1 |9,2 |9,3 |9,4 |9,5 |9,6 |9,7 |9,8 |           # sort 16-tuple

    Caching/memoizing data read from disk would be far more efficient if the chunks were processed strictly depth first.
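    To sketch what such memoizing looks like (the "disk" here is just an in-memory array standing in for slow storage, and the read counter is mine): each record is fetched through a cache, so repeated comparisons of the same index hit RAM. Note this unbounded cache eventually holds everything; a *bounded* cache only pays off if the access order has locality, which depth-first chunks would provide and the breadth-first order above defeats.

```perl
use strict;
use warnings;

# @disk stands in for slow storage; $reads counts simulated disk reads
my @disk  = (42, 7, 99, 13, 56, 1);
my $reads = 0;
my %cache;

sub fetch {
    my ($i) = @_;
    $cache{$i} //= do { $reads++; $disk[$i] };   # read once, then serve from RAM
    return $cache{$i};
}

# sort indices by record value; the comparator only ever calls fetch()
my @order = sort { fetch($a) <=> fetch($b) } 0 .. $#disk;
print "order: @order, disk reads: $reads\n";   # 6 reads for 6 records
```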

    FWIW, WP lists some divide-and-conquer approaches for merge sort:

    https://en.wikipedia.org/wiki/Mergesort#Parallel_merge_sort
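    For contrast with the breadth-first trace above, a textbook top-down merge sort is depth first — each half is fully sorted before the other half is touched, which is the locality a disk-backed cache would need. A quick sketch (my own, not Perl's internal implementation):

```perl
use strict;
use warnings;

# depth-first (top-down) merge sort: recurse fully into the left half,
# then the right half, then merge — one chunk is "hot" at a time
sub merge_sort {
    my @a = @_;
    return @a if @a < 2;
    my $mid   = int(@a / 2);
    my @left  = merge_sort(@a[0 .. $mid - 1]);
    my @right = merge_sort(@a[$mid .. $#a]);
    my @out;
    push @out, ($left[0] <= $right[0] ? shift @left : shift @right)
        while @left && @right;
    return (@out, @left, @right);   # append whichever half has leftovers
}

print join(',', merge_sort(reverse 1 .. 16)), "\n";   # 1,2,...,16
```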

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery