http://qs1969.pair.com?node_id=11138685


in reply to Anyone with XS experience willing to create a high performance data type for Perl?

What are your actual requirements? Do you need a balanced tree, on-disk storage, or something else? For example, could you just use a Perl hash and track the min/max key values during insertion? Or just process a sorted list of the hash keys from time to time? On my laptop, the following trivial code takes 1.2s to create the hash and a further 1.2s to find the current smallest and largest keys.
my %data;
for (1..2_000_000) {
    $data{$_} = "some random text $_";
}
my ($min, $max) = (sort { $a <=> $b } keys %data)[0,-1];
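
The other option mentioned above, tracking the min/max keys during insertion, could look something like the sketch below. The helper name `insert_tracked` and the sample keys are made up for illustration, and it assumes numeric keys that are only ever added: once you delete the current smallest or largest key, the tracked values go stale and need a rescan.

```perl
use strict;
use warnings;

my %data;
my ($min, $max);

# Hypothetical helper: store the pair and update the running min/max.
# Costs two numeric comparisons per insert instead of a full sort later.
sub insert_tracked {
    my ($key, $value) = @_;
    $data{$key} = $value;
    $min = $key if !defined $min || $key < $min;
    $max = $key if !defined $max || $key > $max;
    return;
}

# Keys deliberately out of order to show that insertion order doesn't matter.
insert_tracked($_, "some random text $_") for (5, 3, 9, 1, 7);

print "min=$min max=$max\n";
```

This avoids the sort entirely, but it only answers "what is the smallest/largest key"; it does not give you ordered iteration the way a tree would.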

Dave.

Re^2: Anyone with XS experience willing to create a high performance data type for Perl?
by beautyfulman (Sexton) on Nov 10, 2021 at 16:59 UTC
        So again, tell us what you're trying to achieve.

        It is exceedingly unlikely that the outcome of this thread is that someone will take it upon themselves to put a fast btree implementation on CPAN which meets all your requirements. What might reasonably be an outcome is that someone will advise you of a way of achieving your requirement in a reasonably speedy way, e.g. by using CPAN module X or perl technique Y or interesting algorithm Z. But only if we know what you want.

        Dave.

          > Ideally need a balanced tree like Red Black or AVL, the Perl hash takes 4 seconds

          Not surprisingly, the Perl hash, at 4 seconds, is a lot faster than the (pure Perl) Tree::RB, at 33 seconds, when running your (unpublished) benchmark.

          Is there any functional reason why you can't just use a hash?

          Since you mentioned AVL, have you tried AVLTree from CPAN? It uses an XS wrapper around Julienne Walker's AVL Tree C library, so it should be a lot faster than Tree::RB.


          Disclaimer: I have no experience using any of these, just did a quick google for you.

            > It is exceedingly unlikely that the outcome of this thread is that someone will take it upon themselves to put a fast btree implementation on CPAN which meets all your requirements.

            Seriously! I mean you'd need someone with XS expertise who also happens to know about Red/Black trees, and who reads this forum. And unless you're paying them it would need to be a weird masochist who actually enjoys XS, or maybe someone who finds systems-level work cathartic after a long multi-week stretch of web-app tedium. Or maybe if they just really love the Red/Black algorithm and had their own Red/Black implementation in C that they've been porting from project to project since college. Maybe if they had finished a big refactor of the code a few months ago but hadn't gotten to use it in a project yet, that could be enough motivation.

            Congratulations, the stars aligned.

            Here's your module. Tree-RB-XS-0.00_01.tar.gz
            (not visible on metacpan.org yet)

            You get the distinct privilege of reporting the first bugs in a recently-refactored pointer-math-heavy C library wrapped with some creative new XS ideas.

            Update: I added a new KEY_TYPE_BSTR that copies the keys from Perl into plain buffers and compares them with memcmp. It's a good percentage faster, at the expense of incorrect Unicode sorting, so it's useful if your strings are ASCII.
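
To illustrate why a plain byte comparison can disagree with Perl's own string ordering, here is a pure-Perl sketch. It is not the module's code: `raw_buffer` is a hypothetical helper that approximates what a C-level memcmp would see, on the assumption that the buffer holds the string's bytes as Perl stores them internally (one byte per character for non-UTF-8-flagged strings, UTF-8 bytes otherwise).

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes Perl holds internally for a string.
# $s is a copy, so encoding it in place does not touch the caller's value.
sub raw_buffer {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}

my $e_acute  = "\xE9";     # U+00E9, stored internally as the single byte E9
my $a_macron = "\x{100}";  # U+0100, stored internally as UTF-8 bytes C4 80

# Perl's cmp compares by code point: 0xE9 sorts before 0x100.
my $by_codepoint = $e_acute cmp $a_macron;

# Comparing the stored bytes: E9 sorts after C4, so the order flips.
my $by_bytes = raw_buffer($e_acute) cmp raw_buffer($a_macron);

print "code point order: $by_codepoint, byte order: $by_bytes\n";
```

For pure ASCII keys the two orderings always agree, which is why the byte-string key type is safe there.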

            Update: I finished off most of the Tree::RB API, polished up the documentation, and gave it an official release. It also now has custom XS compare functions to choose from, such as CMP_NUMSPLIT, which can handle the fairly common scenario of a mixture of numbers and strings in the key.
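
For readers unfamiliar with the "mixture of numbers and strings" problem, the general idea can be sketched in pure Perl as below. This is only an illustration of the technique (split each key into digit and non-digit runs, compare digit runs numerically and everything else as strings); the function name `numsplit_cmp` is made up here, and the module's actual XS comparison may differ in its details.

```perl
use strict;
use warnings;

# Illustrative comparator: split keys into runs of digits and non-digits,
# then compare run by run.
sub numsplit_cmp {
    my ($x, $y) = @_;
    my @xs = $x =~ /(\d+|\D+)/g;
    my @ys = $y =~ /(\d+|\D+)/g;
    while (@xs && @ys) {
        my ($p, $q) = (shift @xs, shift @ys);
        # Numeric compare only when both runs are digits; string compare otherwise.
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? ($p <=> $q) : ($p cmp $q);
        return $c if $c;
    }
    # All shared runs were equal: the key with runs left over sorts last.
    return @xs <=> @ys;
}

my @keys   = qw(item10 item9 item100 item1);
my @sorted = sort { numsplit_cmp($a, $b) } @keys;

print "@sorted\n";
```

A plain string sort would put item10 and item100 ahead of item9; the run-by-run comparison keeps the numeric parts in numeric order.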