So I converted a small string/grammar parser, from pure perl, to XS.

I feel your pain.

I was surprised, the old pure perl optree implementation, is only 30% of the speed of XS C code (3/11=%30).

I'm not. Perl does a lot of magic to support all kinds of weird edge cases, whereas XS/C usually does the bare minimum and is compiled/optimized to run directly on your CPU.

A string parser written in C with memcmp() vs PurePerl's eq, 3x slower. Not bad.

memcmp()? That might still be useful in the rare cases where you are dealing with pure ASCII (e.g. plain English text without emojis or any other non-ASCII characters). Comparing strings properly in the modern Unicode world takes a lot more processing. You start by decoding UTF-8 (if applicable) to proper code points. Then you normalize both strings to the same one of the four Unicode normalization forms (NFC, NFD, NFKC or NFKD). Only then can you compare your strings code point by code point.
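To make the problem concrete, here is a minimal C sketch of a memcmp()-style comparison. The helper name bytes_equal and the two sample constants are made up for illustration, and the strings are assumed to be NUL-terminated UTF-8. Both constants display as the same "é", but one is the precomposed code point and the other is a plain "e" followed by a combining accent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise equality, i.e. what a memcmp()-based parser effectively does.
 * Hypothetical helper; assumes NUL-terminated input. */
static bool bytes_equal(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    return la == lb && memcmp(a, b, la) == 0;
}

/* "é" as the single precomposed code point U+00E9, encoded as UTF-8. */
static const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8. */
static const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";
```

bytes_equal(E_ACUTE_PRECOMPOSED, E_ACUTE_DECOMPOSED) returns false, because the byte sequences differ in length and content even though the two strings are canonically equivalent. A normalization step would have to run on both inputs before a byte comparison like this gives the right answer.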

Depending on your input text, language and use case, you might also need a list of code points, or sequences of code points, that are equivalent in meaning in your context. For example, measurement units have the sign for "micro". So, for the unit "microampere", you can have the micro sign followed by the letter "A", but there is also a single combined character for microampere. Since those may be hard (or even impossible) to type on many keyboard setups, websites or applications, it is common to just use the small letter "u" followed by a capital "A", i.e. "uA". If you have to deal with something like this, you will have to hand-code that, too, which will also make your code slower.
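A hand-coded mapping of that kind could look roughly like the following C sketch. The table, the struct and the function name canonical_unit are all hypothetical, the choice of "uA" as the canonical spelling is arbitrary, and the Greek letter mu entry is my own addition because it is often typed in place of the micro sign. Input is assumed to be NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: each known spelling maps to one canonical form. */
struct unit_alias {
    const char *spelling;   /* UTF-8 bytes as they may appear in the input */
    const char *canonical;  /* the spelling used internally */
};

static const struct unit_alias UNIT_ALIASES[] = {
    /* The literals are split so that "A" is not read as a hex digit. */
    { "\xC2\xB5" "A", "uA" },   /* U+00B5 MICRO SIGN, then "A" */
    { "\xCE\xBC" "A", "uA" },   /* U+03BC GREEK SMALL LETTER MU, then "A" */
    { "\xE3\x8E\x82", "uA" },   /* U+3382 SQUARE MU A, one combined character */
    { "uA",           "uA" },   /* plain ASCII fallback */
};

/* Returns the canonical spelling, or NULL if the unit is not in the table. */
static const char *canonical_unit(const char *s)
{
    size_t n = sizeof UNIT_ALIASES / sizeof UNIT_ALIASES[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(s, UNIT_ALIASES[i].spelling) == 0)
            return UNIT_ALIASES[i].canonical;
    }
    return NULL;
}
```

Every input token now costs a table lookup before the actual comparison, which is exactly the kind of extra work that eats into the speed advantage of a bare memcmp(). A real table would be much larger and would probably want a hash or a sorted array instead of a linear scan.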

Those special cases can be handled in C, and there are many applications that do. The real question is whether the speed increase is worth the extra work hours. If you are running a big social media site where saving a fraction of a percent of CPU time lowers your operating costs, then the answer is yes. On the other hand, if this runs the monthly inventory report for a small business, it's probably more cost-efficient to hack together some Perl regexes and invest in a faster server than to pay someone to hand-code that stuff in C.

I really don't want to put a damper on your spirits. The pursuit of more efficient code is a worthwhile goal! For all of us developers! But when it comes to coding your own solution for modern text processing (or date/time calculations, for that matter), you always risk opening Pandora's box.


In reply to Re: benchmarks, PurePerl vs Perl XS, Only!!! 3x slower, PerlXS C vs Real C, 4x slower by cavac
in thread benchmarks, PurePerl vs Perl XS, Only!!! 3x slower, PerlXS C vs Real C, 4x slower by bulk88
