So I converted a small string/grammar parser, from pure perl, to XS.
I feel your pain.
I was surprised: the old pure Perl optree implementation runs at only about 30% of the speed of the XS C code (3/11 ≈ 30%).
I'm not. Perl does a lot of magic to support all kinds of weird edge cases, whereas XS/C usually does the bare minimum and is compiled/optimized to run directly on your CPU.
A string parser written in C with memcmp() vs PurePerl's eq, 3x slower. Not bad.
memcmp()? That might still be useful in the rare cases where you are dealing with pure ASCII (e.g. plain English text without any emojis or other non-ASCII characters). Comparing strings properly in the modern Unicode world takes a lot more processing. You start by decoding UTF-8 (if applicable) to proper code points. Then you normalize your strings to one of the four standardized normalization forms (NFC, NFD, NFKC or NFKD). Then you can compare your strings code point by code point.
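To make the memcmp() problem concrete, here is a minimal sketch in C. It assumes UTF-8 input and uses a hypothetical helper, bytes_equal(), that stands in for what a memcmp()-based parser effectively does. The two constants hold the same visible character, "é", once precomposed (NFC) and once as "e" plus a combining accent (NFD). It does not perform any normalization itself; a real implementation would need a Unicode library for that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "é" as the single code point U+00E9 (NFC form), UTF-8 encoded: 2 bytes. */
static const char E_ACUTE_NFC[] = "\xC3\xA9";

/* "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD form),
 * UTF-8 encoded: 3 bytes. */
static const char E_ACUTE_NFD[] = "e\xCC\x81";

/* Hypothetical helper: byte-wise equality of two NUL-terminated strings,
 * i.e. the comparison a memcmp()-based parser performs. */
static int bytes_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    return len_a == len_b && memcmp(a, b, len_a) == 0;
}
```

Calling bytes_equal(E_ACUTE_NFC, E_ACUTE_NFD) reports the strings as different, although both render as the same character. Only after normalizing both sides to the same form would a byte-wise or code-point-wise comparison agree.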
Depending on your input text and language and use case, you might also have to use a list of code points or sequences of code points that are equivalent in meaning for your context. For example, in measurement units you have the sign for "micro". So, for the unit "micro ampere", you can have the micro sign, followed by the letter "A", but there is also a combined sign for micro ampere. But since those may be hard to type (or even impossible) on many keyboard setups and/or websites or applications, it is common to just use the small letter "u" followed by a capital "A", e.g. "uA". If you have to deal with something like this, you will have to hand-code that, too, which will also make your code slower.
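The hand-coded part could look roughly like the following sketch, again in C and again assuming UTF-8 input. The table UNIT_ALIASES and the functions canonical_unit() and units_equivalent() are made-up names for illustration, and the choice of "uA" as the canonical spelling is arbitrary. As far as I know, compatibility normalization (NFKC) would fold the micro sign and the combined sign to a Greek mu plus "A", but it would never turn that into the ASCII "u", so a table like this is still needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alternative spelling of a unit and the spelling we map it to. */
struct unit_alias {
    const char *spelling;
    const char *canonical;
};

/* Hypothetical, deliberately tiny alias table for "micro ampere". */
static const struct unit_alias UNIT_ALIASES[] = {
    { "\xC2\xB5" "A", "uA" },  /* U+00B5 MICRO SIGN, then "A"            */
    { "\xCE\xBC" "A", "uA" },  /* U+03BC GREEK SMALL LETTER MU, then "A" */
    { "\xE3\x8E\x82", "uA" },  /* U+3382 SQUARE MU A (combined sign)     */
};

/* Return the canonical spelling of a unit, or the input itself
 * if the unit is not listed in the alias table. */
static const char *canonical_unit(const char *unit)
{
    assert(unit != NULL);
    for (size_t i = 0; i < sizeof UNIT_ALIASES / sizeof UNIT_ALIASES[0]; i++) {
        if (strcmp(unit, UNIT_ALIASES[i].spelling) == 0)
            return UNIT_ALIASES[i].canonical;
    }
    return unit;
}

/* Two unit strings are equivalent if their canonical spellings match. */
static int units_equivalent(const char *a, const char *b)
{
    return strcmp(canonical_unit(a), canonical_unit(b)) == 0;
}
```

With this, units_equivalent("uA", "\xC2\xB5" "A") is true, while units_equivalent("uA", "mA") is false. Every comparison now costs a table lookup per operand on top of the plain string compare, which is exactly the kind of overhead that eats into the raw memcmp() speed.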
Those special cases can be handled in C, and there are many applications that do. The real question here is whether the speed increase is worth the extra work hours. If you are running a big social media site where saving a fraction of a percent of CPU time lowers your operating costs, then the answer is yes. On the other hand, if this runs the monthly inventory report for a small business, it's probably more cost efficient to hack together some Perl regexes and invest in a faster server than to pay someone to hand-code that stuff in C.
I really don't want to put a damper on your spirits. The pursuit of more efficient code is a worthwhile goal! For all of us developers! But when it comes to coding your own solution for modern text processing (or date/time calculations, for that matter), you always risk opening Pandora's box.
In reply to Re: benchmarks, PurePerl vs Perl XS, Only!!! 3x slower, PerlXS C vs Real C, 4x slower
by cavac
in thread benchmarks, PurePerl vs Perl XS, Only!!! 3x slower, PerlXS C vs Real C, 4x slower
by bulk88