Re: Why is this code so much slower than the same algorithm in C?

Let's look at what is really going on when these programs run.

First the C program. It translates into the following assembler:

Now the Perl code. It translates into the following ... um... I'm terrified to apply a term to this, because one or more of the local language lawyers is going to leap all over whatever term I use to say that it isn't really the right term because blah, blah, ... more irrelevancies, ... blah!

So, use whatever term suits you. It may be one of the following: <opcodes|bytecode|syntax tree|abstract syntax tree|other>.
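Whatever the name, the shape is the same. perl compiles the script into a tree of small nodes. At run time a loop calls the C function behind each node, and that function does its job and returns the node to run next (the `->x` at the end of each line in the Concise dump). The following is a hypothetical, heavily simplified C model of that shape, not perl's real data structures. `node`, `do_const`, `do_add`, `run_ops` and `add_via_ops` are invented for illustration. Only the overall pattern, where each op function hands back its successor, mirrors perl's runloop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op node: a much-simplified stand-in for one line of the
 * Concise dump. `next` plays the role of the "->x" at the end of a line. */
typedef struct node node;
typedef node *(*node_fn)(node *);

struct node {
    node_fn fn;    /* the C function implementing this op */
    node   *next;  /* the op to run afterwards */
    long    arg;   /* operand for ops that need one (e.g. a constant) */
};

#define STACK_MAX 16
static long value_stack[STACK_MAX];   /* operands passed between ops */
static int  sp;

/* Push a constant, then hand back the next op. */
static node *do_const(node *o) {
    assert(sp < STACK_MAX);
    value_stack[sp++] = o->arg;
    return o->next;
}

/* Pop two operands, push their sum, then hand back the next op. */
static node *do_add(node *o) {
    assert(sp >= 2);
    long b = value_stack[--sp];
    long a = value_stack[--sp];
    value_stack[sp++] = a + b;
    return o->next;
}

/* The runloop: keep calling the current op until one returns NULL. */
long run_ops(node *start) {
    sp = 0;
    node *o = start;
    while (o != NULL)
        o = o->fn(o);   /* each op returns its successor */
    return sp > 0 ? value_stack[sp - 1] : 0;
}

/* Build the chain  const x -> const y -> add  and run it. */
long add_via_ops(long x, long y) {
    node add = { do_add,   NULL, 0 };
    node cy  = { do_const, &add, y };
    node cx  = { do_const, &cy,  x };
    return run_ops(&cx);
}
```

Here `add_via_ops(2, 3)` returns 5, but only after three indirect calls, pushes and pops on a shared stack, and bounds checks. For `x + y`, a C compiler would typically emit a single `add` instruction.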

(Dodge issue!) This is the output from perl -MO=Concise YourScript.pl:

1k <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 junk2.pl:2) v ->3
6     <@> list vKPM/128 ->7
3        <0> pushmark vM/128 ->4
4        <0> padsv[$i:1,8] vM/LVINTRO ->5
5        <0> padsv[$j:1,8] vM/LVINTRO ->6
7     <;> nextstate(main 7 junk2.pl:13) v ->8
a     <2> sassign vKS/2 ->b
8        <$> const[IV 20] s ->9
9        <0> padsv[$i:1,8] sRM* ->a
-     <@> lineseq vK ->1k
b        <;> nextstate(main 7 junk2.pl:4) v ->c
1j       <2> leaveloop vK/2 ->1k
c           <{> enterloop(next->1f last->1j redo->d) v ->d
-           <@> lineseq vK ->1j
1e             <@> leave vKP ->1f
d                 <0> enter v ->e
e                 <;> nextstate(main 3 junk2.pl:7) v ->f
h                 <2> sassign vKS/2 ->i
f                    <$> const[IV 1] s ->g
g                    <0> padsv[$j:1,8] sRM* ->h
-                 <@> lineseq vK ->10
i                    <;> nextstate(main 3 junk2.pl:5) v ->j
z                    <2> leaveloop vK/2 ->10
j                       <{> enterloop(next->s last->z redo->k) v ->v
-                       <1> null vK/1 ->z
y                          <|> and(other->k) vK/1 ->z
x                             <2> lt sK/2 ->y
v                                <0> padsv[$j:1,8] s ->w
w                                <$> const[IV 20] s ->x
-                             <@> lineseq vK ->-
r                                <@> leave vKP ->s
k                                   <0> enter v ->l
l                                   <;> nextstate(main 2 junk2.pl:6) v ->m
-                                   <1> null vK/1 ->r
p                                      <|> and(other->q) vK/1 ->r
o                                         <2> modulo[t5] sKP/2 ->p
m                                            <0> padsv[$i:1,8] s ->n
n                                            <0> padsv[$j:1,8] s ->o
q                                         <0> last v* ->r
t                                <1> preinc[t4] vK/1 ->u
s                                   <0> padsv[$j:1,8] sRM ->t
u                                <0> unstack v ->v
10                <;> nextstate(main 6 junk2.pl:9) v ->11
-                 <1> null vKP/1 ->1e
14                   <|> and(other->15) vK/1 ->1e
13                      <2> eq sK/2 ->14
11                         <0> padsv[$j:1,8] s ->12
12                         <$> const[IV 20] s ->13
1d                      <@> leave vKP ->1e
15                         <0> enter v ->16
16                         <;> nextstate(main 4 junk2.pl:10) v ->17
1a                         <@> prtf vK ->1b
17                            <0> pushmark s ->18
18                            <$> const[PV "Number: %d\n"] s ->19
19                            <0> padsv[$i:1,8] l ->1a
1b                         <;> nextstate(main 4 junk2.pl:11) v ->1c
1c                         <0> last v* ->1d
1h             <2> add[t3] vKS/2 ->1i
1f                <0> padsv[$i:1,8] sRM ->1g
1g                <$> const[IV 20] s ->1h
1i             <0> unstack v ->d
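The tree can be read back into source using the `const[IV 20]`, `lt`, `modulo`, `preinc`, `eq` and `prtf` ops. Read that way, the script is a brute-force search for the smallest multiple of 20 that is divisible by every number from 1 to 19. The C source isn't reproduced here, so what follows is a reconstruction of the same algorithm from the op tree, not the original program. `smallest_multiple` is a made-up name.

```c
#include <stdio.h>

/* Reconstructed from the op tree (not the original C source):
 * step i by 20 and test it against every j in 1..19. */
int smallest_multiple(void) {
    int i, j;
    for (i = 20; ; i += 20) {              /* sassign const[IV 20]; add[t3] */
        for (j = 1; j < 20; ++j)           /* sassign const[IV 1]; lt; preinc */
            if (i % j)                     /* modulo: the op in question */
                break;                     /* last */
        if (j == 20) {                     /* eq $j, 20 */
            printf("Number: %d\n", i);     /* prtf "Number: %d\n" */
            return i;                      /* last */
        }
    }
}
```

It prints `Number: 232792560` and returns that value. The `if (i % j)` test is the statement that corresponds to the single `modulo` op.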

And within that, there is one line: o <2> modulo[t5] sKP/2 ->p

which is the equivalent of these four assembly instructions from the C version:

; Line 8
        mov     eax, DWORD PTR _i$[ebp]
        cdq
        idiv    DWORD PTR _j$[ebp]
        test    edx, edx
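Four instructions suffice because C's `%` on `int` is exactly what `idiv` computes. `cdq` sign-extends EAX into EDX:EAX, `idiv` leaves the quotient in EAX and the remainder in EDX, and `test edx, edx` sets the flags for the branch that follows. Perl's `%` is defined differently. Without `use integer`, the result takes the sign of the right operand, so `-7 % 3` is 2 in Perl but -1 in C. A zero divisor is also a fatal "Illegal modulus zero" error, whereas in C it is undefined behaviour. The sketch below shows just that one piece of extra work layered on top of C's `%`. `perl_style_mod` is a hypothetical name, and the sketch ignores value types, numeric conversion and overloading, which the real op also has to handle.

```c
#include <assert.h>

/* Hypothetical helper: Perl-style integer modulo built on C's %.
 * Returns 0 (and leaves *out untouched) for a zero divisor, where perl
 * itself would die with "Illegal modulus zero"; otherwise returns 1. */
int perl_style_mod(long m, long n, long *out) {
    if (n == 0)
        return 0;
    if (n == -1) {                /* sidestep LONG_MIN % -1 overflow in C */
        *out = 0;
        return 1;
    }
    long r = m % n;               /* C99: truncates, sign follows m */
    if (r != 0 && ((r < 0) != (n < 0)))
        r += n;                   /* move result to the sign of n, as Perl does */
    *out = r;
    return 1;
}
```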

Now let's look at the assembler that sits behind that perl modulo instruction:

So the answer to your question is this: when you understand why those four instructions in the C version require those 700+ lines of assembler in the Perl version, you'll understand why the performance difference exists.
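To make that concrete without pretending to reproduce perl's actual implementation, here is a hypothetical modulo op for a toy dynamically typed interpreter. Everything in it is invented for illustration: `value`, `to_int`, `op_modulo` and the three-way type tag. It only shows the kinds of work that surround the single division once operands may be integers, floats or strings:

- popping the operands
- checking each operand's type and converting it
- checking the divisor
- applying Perl's sign rule
- boxing the result

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tagged value: a much-reduced stand-in for a Perl scalar. */
typedef enum { T_INT, T_NUM, T_STR } tag_t;

typedef struct {
    tag_t tag;
    union { long i; double n; const char *s; } u;
} value;

/* Coerce a value to an integer; each branch is a run-time decision
 * that compiled C never has to make. */
static long to_int(const value *v) {
    switch (v->tag) {
    case T_INT: return v->u.i;
    case T_NUM: return (long)v->u.n;              /* assumes it fits in a long */
    case T_STR: return strtol(v->u.s, NULL, 10);  /* "42abc" -> 42 */
    }
    return 0;
}

/* Hypothetical modulo op: everything wrapped around the one division.
 * Returns 0 on a zero divisor (where perl would die), 1 otherwise. */
int op_modulo(value *stack, int *sp) {
    assert(*sp >= 2);
    value right = stack[--*sp];                   /* pop operands */
    value left  = stack[--*sp];
    long n = to_int(&right);                      /* type dispatch + conversion */
    long m = to_int(&left);
    if (n == 0)
        return 0;
    long r = (n == -1) ? 0 : m % n;               /* the part C does directly */
    if (r != 0 && ((r < 0) != (n < 0)))
        r += n;                                   /* Perl's sign rule */
    stack[*sp].tag = T_INT;                       /* box and push the result */
    stack[*sp].u.i = r;
    ++*sp;
    return 1;
}
```

Calling `op_modulo` on a stack holding the string "20" and the number 6.0 leaves a single integer 2 on the stack. Every step along the way costs instructions and branches that the C version settles at compile time. Repeat that for every op, on every pass through the loop, and you have the kind of overhead the comparison above is pointing at.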