Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Anyone with XS experience willing to create a high performance data type for Perl?

by talexb (Chancellor)
on Nov 10, 2021 at 01:38 UTC ( #11138658=note: print w/replies, xml ) Need Help??


in reply to Anyone with XS experience willing to create a high performance data type for Perl?

    For Perl kind of performance Tree:RB is ultra fast but look at Ruby's standard red and black tree, it is several orders of magnitude faster. The Ruby gem rbtree is just C code bind for dict.c, you can download it here https://rubygems.org/gems/rbtree Inside the gem file there is dict.c and dict.h, as well as ruby.c which is the wrap.

If Tree::RB takes 33 seconds and the Ruby gem rbtree takes 2 seconds, then it's an exaggeration to say it's 'several orders of magnitude faster'; based on those numbers, it's one order of magnitude faster.

And if the Ruby solution is just C code that you can download, how about just using Inline::C to wrap that code and use it as is.

However, I have to ask my favourite question, which is "What problem are you trying to solve?" Why are you looking for something that sorts stuff? Is there any way you could put the data you have into a database, and have it deal with whatever operation you're trying to do?

The beauty of using a database is that lots and lots of hard work has already been done on handling millions of rows efficiently; handling filtering, sorting and selection of those rows is what a database does. You can also have multiple tasks that add data, grab chunks of it, and delete stuff that's been processed.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

  • Comment on Re: Anyone with XS experience willing to create a high performance data type for Perl?

Replies are listed 'Best First'.
Re^2: Anyone with XS experience willing to create a high performance data type for Perl?
by perlfan (Vicar) on Nov 10, 2021 at 10:41 UTC
    > then it's an exaggeration to say it's 'several orders of magnitude faster'; based on those numbers, it's one order of magnitude faster.

    I used to confuse orders of magnitude with times for a long time, until I realized the latter former referred to the exponent (usually applied to 10). Nonetheless, the type of hubris in this post begs for master of these kinds of topics. Perhaps OP is willing to contract someone to do this, which I think would be the appropriate tact. That said, the initial step of using Inline::C you suggested I think is the right one. For additional speed up, if the algorithm is parallelizable, is to introduce OpenMP, which is a lot more straightforward with Alien::OpenMP (and OpenMP::Environment for good measure).

    Updated thanks to note by jdporter.

      An order of magnitude refers to the difference in exponent (or that's what I learned). To take an extreme case, 10 and 99 (1E1 and 9.9E1) are the same order of magnitude, but 99 and 100 (9.9E1 and 1E2) are different orders of magnitude.

      And sure, a 10x difference is big, but all of these things are relative. Unless you're working at FAANG scale, this may not add up to much. So your job runs for 10 seconds instead of a second? Meh. OK. Ten minutes instead of a minute? OK. Ten hours instead of one hour? Hmm .. that does take a while. Maybe don't run it every hour.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

        10 and 99 (1E1 and 9.9E1) are the same order of magnitude, but 99 and 100 (9.9E1 and 1E2) are different orders of magnitude

        No. That's the problem with taking such a definition literally. The "order of magnitude" of a number is its base-10 logarithm, usually rounded to the nearest whole number.

        Try this: round( log10(X) - log10(Y) )

        In terms of "order of magnitude", 99 and 100 are the same number.

        I reckon we are the only monastery ever to have a dungeon staffed with 16,000 zombies.
Re^2: Anyone with XS experience willing to create a high performance data type for Perl?
by beautyfulman (Sexton) on Nov 10, 2021 at 02:03 UTC
    Ok for the "orders of magnitude" but that is a huge difference.
    In my case the data type is to consume Kafka messages and process it in real time. A Red Black Tree is the most often used data type for that, a file based DB is not suitable for that.

      OK -- we have a tiny bit more information, but I still don't know why a database isn't a suitable solution for whatever problem you're trying to solve. Like I said, database developers have spent decades making sure their platform is stable and efficient. Perhaps I should ask this a different way -- is there a reason why a database wouldn't work for you? It's still not clear what the problem is.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11138658]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (6)
As of 2022-01-20 11:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (56 votes). Check out past polls.

    Notices?