Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Anyone with XS experience willing to create a high performance data type for Perl?

by beautyfulman (Sexton)
on Nov 10, 2021 at 16:59 UTC ( [id://11138694]=note: print w/replies, xml ) Need Help??


in reply to Re: Anyone with XS experience willing to create a high performance data type for Perl?
in thread Anyone with XS experience willing to create a high performance data type for Perl?

Replies are listed 'Best First'.
Re^3: Anyone with XS experience willing to create a high performance data type for Perl?
by dave_the_m (Monsignor) on Nov 10, 2021 at 17:58 UTC
    So again, tell us what you're trying to achieve.

    It is exceedingly unlikely that the outcome of this thread is that someone will take it upon themselves to put a fast btree implementation on CPAN which meets all your requirements. What might reasonably be an outcome is that someone will advise you of a way of achieving your requirement in a reasonably speedy way, e.g. by using CPAN module X or perl technique Y or interesting algorithm Z. But only if we know what you want.

    Dave.

          I will try one last time. What I mean by "tell us what you want to achieve" is a description like the following. Talking about inserts and deletes is describing an implementation of what you're trying to achieve. At the moment we have ABSOLUTLY NO IDEA what your problem is, because you haven't even hinted at it yet.

          So here's a compelely hypothetical example just to give you an idea of what sort of description I'm after.

          "The program must process a large set of alphanumeric identifier tags, each mapped to a name. The program runs entirely in memory; it doesn't store permanent state to a file or database. The program reads in identifier / name pairs from a file or network socket; on startup there will be an initial "surge" of about a million records, then new records will appear at a slower rate. The program keeps the identifier/name pairs in memory. From time to time, queries will be received on a second socket; this may involve looking up a particular identifier, or returning a range of identifier values (the identifiers have batch numbers incorporated into them and sometimes you want to return all identifiers from that batch). Once an hour, the program will delete all identifiers whose batch number is more than 100 behind the most recent batch number."

          Dave.

          > I can switch languages if i need to. If you know of something Perl that can do it as fast as Ruby/C rbtree gem i will be very glad to use Perl.

          I have to admit your overall approach feels completely alien to me. :) Is this a personal hobby project? Or for work? - if so, how many programmers at your company?

          I don't switch languages lightly because it's not practical to maintain sharp skills across many different programming languages at the same time. In any case, I'd need to get approval from the other programmers in my team before doing so.

          And I don't introduce dependencies lightly. What if your dependent module (or any of its dependencies) has a security vulnerability? What if its author abandons it? What if its license restricts you? How quickly can you isolate/troubleshoot a bug in its code? What if you later need to port your software to a platform (or a different Perl version) that your chosen 3rd party library does not support?

          I tend to take a much more conservative approach. I'd start by solving the problem, in an easy to understand and maintain style, in plain Perl. If that was fast enough, problem solved. If not, I'd benchmark it to find where the bottleneck is and consider the options, including introducing third-party dependencies. Since both Perl and C++ are approved languages at work, I'd even consider writing the whole thing in plain portable ANSI C++ (again without third-party dependencies). Note, btw, that C++ std::map is usually implemented using red-black trees.

          For some background on where I'm coming from see: Why Create Coding Standards and Perform Code Reviews?

    Re^3: Anyone with XS experience willing to create a high performance data type for Perl?
    by eyepopslikeamosquito (Archbishop) on Nov 11, 2021 at 23:22 UTC

      > Ideally need a balanced tree like Red Black or AVL, the Perl hash takes 4 seconds

      Not surprisingly, the Perl hash, at 4 seconds, is a lot faster than the (pure Perl) Tree::RB, at 33 seconds, when running your (unpublished) benchmark.

      Is there any functional reason why you can't just use a hash?

      Since you mentioned AVL, have you tried AVLTree from CPAN? It uses a XS wrapper around Julienne Walker's AVL Tree C library, so should be a lot faster than Tree::RB.

      References

      Disclaimer: I have no experience using any of these, just did a quick google for you.

      Re^3: Anyone with XS experience willing to create a high performance data type for Perl?
      by NERDVANA (Deacon) on Nov 12, 2021 at 20:41 UTC
        It is exceedingly unlikely that the outcome of this thread is that someone will take it upon themselves to put a fast btree implementation on CPAN which meets all your requirements.

        Seriously! I mean you'd need someone with XS expertise who also happens to know about Red/Black trees, and who reads this forum. And unless you're paying them it would need to be a weird masochist who actually enjoys XS, or maybe someone who finds systems-level work cathartic after a long multi-week stretch of web-app tedium. Or maybe if they just really love the Red/Black algorithm and had their own Red/Black implementation in C that they've been porting from project to project since college. Maybe if they had finished a big refactor of the code a few months ago but hadn't gotten to use it in a project yet, that could be enough motivation.

        Congratulations, the stars aligned.

        Here's your module. Tree-RB-XS-0.00_01.tar.gz
        (not visible on metacpan.org yet)

        You get the distinct privilege of reporting the first bugs in a recently-refactored pointer-math-heavy C library wrapped with some creative new XS ideas.

        Update: I added a new KEY_TYPE_BSTR that copies the keys from Perl into plain buffers, and uses memcmp on them. It's a good percentage faster at the expense of incorrect unicode sorting. It's useful if your strings are ASCII.

        Update: I finished off most of the Tree::RB API, polished up the documentation, and gave it an official release. It also now has custom XS compare functions to choose from, such as CMP_NUMSPLIT, which can handle the fairly common scenario of a mixture of numbers and strings in the key.

      Log In?
      Username:
      Password:

      What's my password?
      Create A New User
      Domain Nodelet?
      Node Status?
      node history
      Node Type: note [id://11138694]
      help
      Chatterbox?
      and the web crawler heard nothing...

      How do I use this?Last hourOther CB clients
      Other Users?
      Others imbibing at the Monastery: (4)
      As of 2024-03-29 00:49 GMT
      Sections?
      Information?
      Find Nodes?
      Leftovers?
        Voting Booth?

        No recent polls found