punkish has asked for the wisdom of the Perl Monks concerning the following question:

The usual wisdom is -- if you want to do something fast, write it in C. Hence the plethora of XS modules, and the usual annotation: "The heart of blah blah is written in C for decent speed."

Of course, Perl itself is written in C, so I ask this -- other than the benefits that an interpreted language brings -- rapid prototyping and development -- once I have got my script working, why is it that I can't compile it and get the same level of speed that a native C program would get? Is there a technical reason, a philosophical reason, or is it because doing so would break a law of physics? And, no, running perl scripts inside mod_perl is not a solution here. I am wondering why it is that I can't compile perl scripts to honest-to-goodness-bare-metal-programs-such-as-those-coming-out-of-the-other-end-of-gcc. Wouldn't that be the best of both worlds? I would be able to program faster in Perl and be able to write faster programs in Perl.

Bonus question -- I often read arguments for and against strict typing vs. no typing as in Perl. Other than possibly catching errors and thereby making for more robust programs, strict typing doesn't by itself contribute to any speed gains, does it? It seems like just an excuse for macho programmers to snigger at sissy programmers such as me?

--

when small people start casting long shadows, it is time to go to bed

Replies are listed 'Best First'.
Re: compiling perl scripts aka why is perl not as fast as C
by ikegami (Patriarch) on Mar 21, 2010 at 05:42 UTC

    What makes development fast is the use of the flexible scalar type. What makes the code "slow" is the use of the flexible scalar type.

    strict typing doesn't by itself contribute to any speed gains, does it?

    Actually, it has everything to do with it. int y = x; is two machine opcodes, and the optimiser can reduce the number of opcodes the next statement takes if it uses x or y. On the other hand, my $y = $x; requires memory allocation, type checks, magic checks, and support to copy the string, the integer, the float and/or whatever else might be contained by $x.
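
    A quick way to see that bookkeeping for yourself is the core Devel::Peek module. A minimal sketch (the variable and values are made up):

    use Devel::Peek;

    my $x = 42;
    $x .= " apples";   # append a string; the scalar silently changes what it stores
    Dump($x);          # prints the SV's flags, string/integer slots and refcount to STDERR

    Every one of those fields is something perl may have to check or copy when you write my $y = $x;.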

      Fine then. We, the good programming citizens, are already used to declaring our variables with use strict, so tighten it even more by requiring us to declare their types as well, but also make it possible to compile the program so it doesn't have to be interpreted any more, and endow it with whatever magic is required to make it as fast as C. Why that is not possible is what I am trying to understand.
      --

      when small people start casting long shadows, it is time to go to bed

        so tighten it even more by requiring us to declare their types as well,

        I said above "What makes development fast is the use of the flexible scalar type". So you'd lose that, it seems to me. You'd lose a lot more, in fact. The stuff that makes Perl Perl. Even basic stuff like returning lists from functions would become (a lot?) more complicated.
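
        As a sketch of what that "basic stuff" looks like today (a hypothetical helper, just for illustration):

        sub min_max {
            my @sorted = sort { $a <=> $b } @_;
            return $sorted[0], $sorted[-1];   # returns a list, not a fixed-size struct
        }

        my ($min, $max) = min_max(3, 1, 4, 1, 5);   # $min is 1, $max is 5

        With mandatory type declarations you would have to pin down the number and types of the returned values up front.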

        so it doesn't have to be interpreted any more,

        Perl isn't interpreted. Perl programs are compiled into a list of native function calls. Perl just calls one function after another:

        while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) { PERL_ASYNC_CHECK(); }

        And soon, probably

        while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {}

        (PERL_ASYNC_CHECK checks if a signal was received. A proof of concept was given that shows that moving it can completely eliminate the cost of checking for signals.)
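
        You can look at that compiled list of ops yourself with the core B::Concise module, for example:

        perl -MO=Concise -e 'print "Hello, World!\n"'

        which dumps the ops that the loop above will walk through for the one-liner.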

        endow it with whatever magic is required to make it as fast as C

        There is already a programming language with all that magic: C. You have the choice; C is fast and lean, and it is actually a very small language. Then again, have you ever tried using regular expressions or string handling in it? You can do those things, but parts of your body will start to decompose as you do so.
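
        For a sense of the gap, here is a small Perl sketch of the kind of string handling meant here (the data is made up):

        my $line = "width=640, height=480, depth=24";
        my %attr = $line =~ /(\w+)=(\d+)/g;   # one regex builds a whole hash
        print "$attr{height}\n";              # 480

        Doing the same in C means hand-rolled parsing, explicit buffers and manual memory management.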

        Perl can trace its lineage back to AWK - Aho, Weinberger, and Kernighan. The same Brian Kernighan who worked on C (with Dennis Ritchie). He recognised that C wasn't good at everything, so we got AWK; he never intended it to replace C - it is just another tool in your toolkit. Selecting the right one is the trick.
        > Why that is not possible is what I am trying to understand.

        I think what you want could be done with something like a "cerl" corresponding to Cython, such that you can write and embed (statically typed!) C but using Perl syntax.

        Just for time-critical inline snippets.
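
        In Perl 5 today the closest thing is probably Inline::C, which lets you drop a statically typed C function into a Perl program for exactly those hot spots. A minimal sketch (the function is just an illustration):

        use Inline C => q{
            int add_ints(int x, int y) {
                return x + y;
            }
        };

        print add_ints(40, 2), "\n";   # 42, computed by compiled C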

        But while the Wikipedia page claims fantastic speed gains, I just read an interview with Guido van Rossum stating that for unpolished general code the speed gain diminishes to only about 50%...

        Wow, indeed worth the trouble ... ;-)

        And Python is known to have a much slimmer implementation than Perl, so try to figure out where the gain will be here... !

        A general solution is more likely to be found in JIT (just-in-time) compilation; just look at the recent speed gains achieved for JS.

        But please note: JS is even much slimmer than Python!

        Cheers Rolf

Re: compiling perl scripts aka why is perl not as fast as C
by moritz (Cardinal) on Mar 21, 2010 at 08:26 UTC
    I often read arguments for and against strict typing vs. no typing as in Perl.

    Let me clarify that Perl does have static typing - it is implied by sigils, and works at compile time. Any expression that begins with @ returns a list or an Array, for example.

    However instead of type check failures, you get coercions in many places, which is why you often don't recognize them as types.

    Other than possibly catching errors and thereby making for more robust programs, strict typing doesn't by itself contribute to any speed gains, does it?

    Sure it does. The more a compiler knows about a program, the more it can optimize. Types are just one possible kind of information.

    Also in Perl 6 types are very important for extensibility: you can write multi routines with the same name, where each of these single routines only accepts certain types. That means that you can "overload" those names for custom types. For example

    class TurkishStr is Str { ... }
    multi sub uc(TurkishStr $x) {
        # handle upper-casing i to I with dot above somehow
    }

    # now you can write
    say uc $string;
    # and it will work both for ordinary strings and for Turkish strings.
    It seems like just an excuse for macho programmers to snigger at sissy programmers such as me?

    Never attribute to malice what may in fact just be your own state of not being well informed.

    Perl 6 - links to (nearly) everything that is Perl 6.

      Let me clarify that Perl does have static typing - it is implied by sigils,

      So you define Perl5's types as being scalar, array, hash and a few others.

      However instead of type check failures, you get coercions in many places,

      No, no coercions occur between any of those types. There's only coercion if you consider Perl to have dynamic typing (signed integer, unsigned integer, UTF8=0 string, UTF8=1 string, float, regex, IO, etc) in addition to static typing.

      The more a compiler knows about a program, the more it can optimize.

      Indeed, and Perl5 doesn't have that much information available at compile time. Yeah, you can consider scalar to be a static type, but considering that all subs take a list of scalars as arguments, it's a rather useless type from the perspective of optimising sub calls.

      Talking of static vs dynamic typing sounds a lot like talking about strong vs weak typing.

        So you define Perl5's types as being scalar, array, hash and few others

        Right, these are the user exposed types; strings and integers (PV/IV) are hidden from the user as much as possible, so I don't see them as types in Perl 5. (This attitude is influenced by me not writing any XS code, I think).

        No, no coercions occur between any of those types.
        my $scalar = @array; # or my @array = $scalar;

        Is that no coercion?

        Talking of static vs dynamic typing sounds a lot like talking about strong vs weak typing.

        In my book there's a difference: for example the C programming language has static typing (determined at compile time), but weak typing (you can cast anything to anything else even if it doesn't really make sense).

        But I guess I use these words not in the same sense as everybody else does; nor is there any shared consensus on what these things actually mean.

        Perl 6 - links to (nearly) everything that is Perl 6.
Re: compiling perl scripts aka why is perl not as fast as C
by GrandFather (Saint) on Mar 21, 2010 at 08:52 UTC

    At the very least compiling Perl to machine code as is done for C would preclude using string eval. But even if you accepted that as a reasonable limitation, Perl's type system and DWIMery mean that there is a degree of run time overhead that simply can't be removed. Altering Perl to remove that overhead would alter the language - it wouldn't be Perl any more and most of what would go would be exactly those elements that make it fast to develop in.

    In fact for the core operations that Perl is tuned for (string and array manipulation) execution speed is generally quite acceptable. The trick is to code so that most of the work is done inside Perl constructs such as regular expressions, string functions and array manipulation functions.
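
    As a rough sketch of that idea (the text and the counting task are made up):

    my $text = "the quick brown fox jumps over the lazy dog" x 1_000;

    # Slow: an explicit Perl-level loop, one op-dispatch per character
    my $vowels = 0;
    for my $i (0 .. length($text) - 1) {
        $vowels++ if index("aeiou", substr($text, $i, 1)) >= 0;
    }

    # Fast: a single tr/// does the same counting inside perl's C code
    my $also_vowels = $text =~ tr/aeiou//;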

    Compilers for languages such as C++ use the information that strong typing provides to optimise code - often spending much more time in the optimisation phases than the basic compilation phases to achieve good results. Strong typing provides strong constraints on how code is written which allows optimisers to work well, but can really slow down code development. Writing the code becomes much more fussy!

    At the end of the day you choose the language that best fits your skill set and is most applicable to the task at hand.


    True laziness is hard work
Re: compiling perl scripts aka why is perl not as fast as C
by ikegami (Patriarch) on Mar 21, 2010 at 06:41 UTC
    Of course, the real answer is: Because those who might have an itch to make Perl5 do X (like you) aren't working to make it do X (or haven't gotten there yet).
Re: compiling perl scripts aka why is perl not as fast as C
by JavaFan (Canon) on Mar 21, 2010 at 12:00 UTC
    why is it that I can't compile it and get the same level of speed that a native C program would get?
    Because compiling still doesn't give you the speed C gives. Take for instance strings. In C, strings are just pointers to arrays of numbers. You cannot, in a single operation, append to a string. And it's even a lot harder to have all other references to such a string see the change.

    In Perl, this is a lot easier. I can easily append to a string. If there's a reference to the string I'm modifying, the reference will see the change. All this means Perl is a lot slower than C - it's all the goodies that Perl gives to the programmer that make it slower, and that has a much bigger impact than the difference between compiled and not compiled.
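
    A minimal sketch of both points (the strings are made up):

    my $str = "Hello";
    my $ref = \$str;        # another way to reach the same string

    $str .= ", world";      # grow the string in place, one statement
    print $$ref, "\n";      # the reference sees the change: "Hello, world"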

    If you need speed, by all means, use C. If you are willing to sacrifice speed for programmer niceties, you have the option to use Perl.

      I totally agree.

      I don't think that there is a reason to write in C nowadays except in embedded systems where real-time speed is crucial. C was used because its structures map closely to the hardware.

      Perl uses references, which disallow pointer arithmetic and raw pointer dereferencing. They are of course slower in operation than pointers, but at the same time you don't get involved with allocating and de-allocating memory, or with playing directly with the hardware and easily generating segfaults.
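
      A small sketch of that trade-off (the data is made up):

      my @data = (1, 2, 3);
      my $ref  = \@data;     # a reference: no malloc/free, no pointer arithmetic
      push @$ref, 4;         # dereference and append; perl grows the array for you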

      Other issues that arise from C's lack of restrictions and its flexibility are security oriented, such as buffer overflows and string-formatting vulnerabilities, and that is a reason why "plug-in" libraries like the Safe C Library were designed to patch/correct them.

      Actually, quoting the Safe C String Library's overt security goals: "The API should be capable of tracking whether strings are 'trusted', a la Perl's taint mode." Even they recognize Perl's value!

      This kind of quest for speed would lead us back to assembly, which is faster than C. The question is: would you like to deal with assembly? Just as C is higher level than assembly, Perl is higher level than C.

      It's more a question of what you want to TRADE for speed. The industry is constantly trading speed for flexibility and safety. That is why the trend is toward virtual machines like the Java Virtual Machine or the .NET CLR, which are layers on top of the OS and protect you from those kinds of issues.

      But if one wanted to find out what could be done to bring Perl closer to C he could take a look at Why Not Translate Perl to C? by Mark-Jason Dominus.

      A better comparison would be to C++, which is more similar to Perl in "high-level" functionality (like concatenating strings) -- so what (if any) are the factors that prevent Perl being compiled into a form that runs as fast as compiled C++?
Re: compiling perl scripts aka why is perl not as fast as C
by 7stud (Deacon) on Mar 21, 2010 at 08:53 UTC
    Instead of trying to turn perl into C, why don't you just learn C? More flexible programming comes at a cost--that is typically speed. The cost of less flexible languages is that they are harder to learn, and they can be very buggy when you don't do things right.

      Define "flexible" in this context. C is a much simpler language than Perl and much easier to learn. C is however much harder to use to do simple things that in Perl you take for granted, like using the contents of a variable as a character, string, integer, float, double, pointer, ... as suits the current purpose.

      Programs written in any language are buggy when you don't do things right. Strongly typed languages tend to catch more silly errors at compile time where languages like Perl tend to catch those errors at run time (if at all). On the other hand languages like C allow you to more easily generate much nastier classes of bugs than languages like Perl (bad pointers tromping over memory for example).

      It seems to me the reason there is a much stronger ethos of unit testing in the Perl community than in the C community is that the balance between run time and compile time errors is quite different between the languages. Perl requires run time testing to shake out the silly typo type errors that in C are found at compile time. My guess is that per line of code typical Perl tends to pack in more bugs than typical C, just because Perl can do a heck of a lot more work in one line than C can.


      True laziness is hard work
        C is a much simpler language than Perl and much easier to learn.

        That vastly depends on what you mean by learning a language, and how familiar the potential programmer is with low level concepts like computer memory.

        If she is an experienced assembler programmer, C is probably easier to learn than Perl. If she has no clue about memory, pointers, segmentation faults and the like, Perl is easier to learn.

        Also it depends on how you count: it might be easier to learn 90% of the features of the C programming language than learning 90% of the features of Perl. But the difference is that with 20% of Perl features you can already achieve a whole lot of stuff - but not with C.

        So if you count "learn enough of a language to get stuff done", I disagree that C is easier to learn than Perl.

        Perl 6 - links to (nearly) everything that is Perl 6.
        C is a much simpler language than Perl and much easier to learn.

        Surely you must be joking.

        I can teach a programming novice how to write a short Perl program which performs IO in an hour, and he or she has a better than decent chance of remembering how things work with a notecard or two of notes.

        In C, I can probably walk the same novice through compiling and running "Hello, world!" in that time, and he or she might remember how things work.

        It seems to me the reason there is a much stronger ethos of unit testing in the Perl community than in the C community is that the balance between run time and compile time errors is quite different between the languages.

        No, it's because it's immensely easier to write tests in Perl than in C.
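
        For instance, a complete test script in Perl can be as short as this sketch using the core Test::More module:

        use strict;
        use warnings;
        use Test::More tests => 2;

        is( lc("PERL"), "perl", "lc lower-cases a string" );
        ok( 2 + 2 == 4,         "arithmetic still works" );

        The equivalent in C needs a harness, a build step and a fair amount of boilerplate before the first assertion runs.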

        My guess is that per line of code typical Perl tends to pack in more bugs than typical C, just because Perl can do a heck of a lot more work in one line than C can.

        I've heard the opposite, because Perl requires far fewer lines of code than the corresponding C. The defect rate tends to be constant per SLOC.