tej has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have to calculate number of all letters in files and accordingly increase count

I am using while loop for this

while ($string=~m{(W)}g){ $count=$count+($tbsz*0.7844); }

Does this makes my program little slow?... In such cases which one would be more faster tr/// or while?

Replies are listed 'Best First'.
Re: Which is more faster? While or tr///
by GrandFather (Saint) on Feb 01, 2011 at 06:05 UTC

    I would expect tr/// to be more fasterest. However, if it really matters you should write some code to benchmark the alternatives using Benchmark.

    The real question though is: "Does it actually matter?". Unless the code fragment has been identified using a profiler as the time hog in code that is running too slowly to achieve some particular goal, write the fragment in the way that seems clearest and easiest to maintain.

    True laziness is hard work
      Thank you. You are right. I should first check which part of the code is taking more time to run.
Re: Which is more faster? While or tr///
by BrowserUk (Patriarch) on Feb 01, 2011 at 06:25 UTC

    You're counting the number of 'W's? Or is that meant to be non-word characters: m[(\W)]g?
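
    To make the difference concrete, here is a small sketch (the sample string is made up) that counts matches for both patterns. /W/ matches only the capital letter, while /\W/ matches any character that is not a letter, digit or underscore.

```perl
use strict;
use warnings;

my $string = "Wow, What a World!";    # made-up sample text

# Assigning the match list to () and then to a scalar yields the match count.
my $literal_w = () = $string =~ /W/g;     # the capital letter 'W' only
my $non_word  = () = $string =~ /\W/g;    # comma, spaces and '!'

print "literal W: $literal_w, non-word: $non_word\n";
```

    For this string the two counts are 3 and 5, so the choice of pattern changes the result considerably.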


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Which is more faster? While or tr///
by ikegami (Patriarch) on Feb 01, 2011 at 07:53 UTC

    Does this makes my program little slow?

    You tell us. If you can't tell, does it matter?

    Here are some clean implementations:

    $count += ($tbsz*0.7844) while $string =~ /W/g;
    $count += ($tbsz*0.7844) * ( () = $string =~ /W/g );
    $count += ($tbsz*0.7844) * ( $string =~ tr/W// );
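
    If you do want numbers, a minimal sketch along these lines would compare the three variants with the core Benchmark module. The random test string, the value of $tbsz and the sub names are assumptions made up for the example; real timings depend on your data and your perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 10,000 random characters drawn from a..z plus 'W'.
my @alphabet = ( 'a' .. 'z', 'W' );
my $string   = join '', map { $alphabet[ int rand @alphabet ] } 1 .. 10_000;
my $tbsz     = 10;    # assumed value; the thread never says what $tbsz holds

sub with_while {
    my $count = 0;
    $count += ( $tbsz * 0.7844 ) while $string =~ /W/g;
    return $count;
}

sub with_list {
    my $count = 0;
    $count += ( $tbsz * 0.7844 ) * ( () = $string =~ /W/g );
    return $count;
}

sub with_tr {
    my $count = 0;
    $count += ( $tbsz * 0.7844 ) * ( $string =~ tr/W// );
    return $count;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'while_loop'  => \&with_while,
    'list_assign' => \&with_list,
    'tr'          => \&with_tr,
} );
```

    All three subs should return the same total (up to floating-point rounding), so any difference in the table reflects speed only.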
Re: Which is more faster? While or tr///
by JavaFan (Canon) on Feb 01, 2011 at 07:39 UTC
    I have to calculate number of all letters in files and accordingly increase count
    while ($string=~m{(W)}g){
    Your code does not match your description. Your code counts the number of Ws.
      This was just example... Whole code looks like
      while ($string=~m{<(ths|193)>|\.}g) { $count=$count+($tbsz * 0.25); }
      while ($string=~m{<(ens|194)>}g) { $count=$count+($tbsz * 0.5); }
      while ($string=~m{<(ems|195)>|\s}g) { $count=$count+$tbsz; }
      $string=~s/<[A-z]+>//g;
      if($ftype==1){
          while ($string=~m{(W|\s|%)}g){ ### % is added temporarily for some testing purpose
              $count=$count+($tbsz*1);
          }
          while ($string=~m{(w|\)|\()}g){ $count=$count+($tbsz*0.84375); }
          while ($string=~m{(M|m)}g){ $count=$count+($tbsz*0.8125); }
          while ($string=~m{(N|Q)}g){ $count=$count+($tbsz*0.7188); }
          while ($string=~m{(O|Y)}g){ $count=$count+($tbsz*0.6875); }
          while ($string=~m{(A|D|G|H|K|U|V|X)}g){ $count=$count+($tbsz*0.6562); }
          while ($string=~m{(R)}g){ $count=$count+($tbsz*0.625); }
          while ($string=~m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g){ $count=$count+($tbsz*0.5625); }
          while ($string=~m{(6)}g){ $count=$count+($tbsz*0.55); }
          while ($string=~m{(0)}g){ $count=$count+($tbsz*0.5375); }
          while ($string=~m{(g|y)}g){ $count=$count+($tbsz*0.5313); }
          while ($string=~m{(4)}g){ $count=$count+($tbsz*0.5281); }
          while ($string=~m{(7|8)}g){ $count=$count+($tbsz*0.5156); }
          while ($string=~m{(o|2|3)}g){ $count=$count+($tbsz*0.5); }
          while ($string=~m{(5)}g){ $count=$count+($tbsz*0.4938); }
          while ($string=~m{(9)}g){ $count=$count+($tbsz*0.4813); }
          while ($string=~m{(E|L)}g){ $count=$count+($tbsz*0.46875); }
          while ($string=~m{(F|c|e|z)}g){ $count=$count+($tbsz*0.4375); }
          while ($string=~m{(J|S|f)}g){ $count=$count+($tbsz*0.4063); }
          while ($string=~m{(1)}g){ $count=$count+($tbsz*0.3625); }
          while ($string=~m{(r)}g){ $count=$count+($tbsz*0.35); }
          while ($string=~m{(s)}g){ $count=$count+($tbsz*0.3188); }
          while ($string=~m{(l|t)}g){ $count=$count+($tbsz*0.285); }
          while ($string=~m{(l)}g){ $count=$count+($tbsz*0.25); }
          while ($string=~m{(i|j)}g){ $count=$count+($tbsz*0.2345); }
      }else{
          while ($string=~m{(W)}g){ $count=$count+($tbsz*0.7844); }
          while ($string=~m{(w)}g){ $count=$count+($tbsz*0.6989); }
          while ($string=~m{(A)}g){ $count=$count+($tbsz*0.5656); }
          while ($string=~m{(X)}g){ $count=$count+($tbsz*0.55); }
          while ($string=~m{(Q|O)}g){ $count=$count+($tbsz*0.5469); }
          while ($string=~m{(R|K|Y)}g){ $count=$count+($tbsz*0.5375); }
          while ($string=~m{(C|V)}g){ $count=$count+($tbsz*0.5313); }
          while ($string=~m{(N)}g){ $count=$count+($tbsz*0.5283); }
          while ($string=~m{(D|G|T)}g){ $count=$count+($tbsz*0.525); }
          while ($string=~m{(S|H)}g){ $count=$count+($tbsz*0.5125); }
          while ($string=~m{(B)}g){ $count=$count+($tbsz*0.5); }
          while ($string=~m{(4|U|Z)}g){ $count=$count+($tbsz*0.4875); }
          while ($string=~m{(8|9|P|3|6|7)}g){ $count=$count+($tbsz*0.475); }
          while ($string=~m{(0|5|a|2)}g){ $count=$count+($tbsz*0.4688); }
          while ($string=~m{(x|y)}g){ $count=$count+($tbsz*0.4594); }
          while ($string=~m{(L|b|g|o|p|q|v)}g){ $count=$count+($tbsz*0.4469); }
          while ($string=~m{(E|F|c|d|e)}g){ $count=$count+($tbsz*0.4438); }
          while ($string=~m{(h)}g){ $count=$count+($tbsz*0.4313); }
          while ($string=~m{(n|u)}g){ $count=$count+($tbsz*0.4219); }
          while ($string=~m{(z|J|k)}g){ $count=$count+($tbsz*0.4031); }
          while ($string=~m{(s|r)}g){ $count=$count+($tbsz*0.3969); }
          while ($string=~m{(t)}g){ $count=$count+($tbsz*0.3219); }
          while ($string=~m{(f)}g){ $count=$count+($tbsz*0.3188); }
          while ($string=~m{(1)}g){ $count=$count+($tbsz*0.3031); }
          while ($string=~m{(j)}g){ $count=$count+($tbsz*0.2438); }
          while ($string=~m{(I|i|l)}g){ $count=$count+($tbsz*0.1438); }
      }

        I can see four problems with the posted code that would make it run slower:

        1. You are using alternation when a character class would be faster.
        2. You are using capturing parentheses when you don't use the results of those captures.
        3. You are using the "+" Additive operator instead of the more efficient "+=" assignment operator.
        4. You are looping over the same string 28 or 29 times, depending on the value of $ftype, when you probably should only have to loop over the string twice.

        For example:

        while ($string=~m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g){ $count=$count+($tbsz*0.5625); }

        Would be more efficient as:

        while ($string=~m{[BCPTZabdhknpquvx]}g){ $count+=($tbsz*0.5625); }

        That would cover points 1, 2 and 3. For point 4 you could use hash tables for the calculations, something like:

        # Keys are the literal text to match; quotemeta escapes them when the
        # patterns are built, so the matched text in $1 is always a valid key.
        my %start_table = (
            ' '     => 1,       # the five whitespace keys stand in for \s
            "\t"    => 1,
            "\n"    => 1,
            "\r"    => 1,
            "\f"    => 1,
            '<ems>' => 1,
            '<195>' => 1,
            '.'     => 0.25,
            '<ths>' => 0.25,
            '<193>' => 0.25,
            '<ens>' => 0.5,
            '<194>' => 0.5,
        );
        my $start_lookup = join '|', map quotemeta, keys %start_table;

        my %ftype_table = (
            W    => 1,
            ' '  => 1,          # whitespace again, as in the original \s
            "\t" => 1,
            "\n" => 1,
            "\r" => 1,
            "\f" => 1,
            '%'  => 1,          ### % is added temporarily for some testing purpose
            w    => 0.84375,
            ')'  => 0.84375,    # meta-characters are escaped by quotemeta below
            '('  => 0.84375,
            M    => 0.8125,
            m    => 0.8125,
            N    => 0.7188,
            Q    => 0.7188,
            # etc.
        );
        my $ftype_lookup = join '', map quotemeta, keys %ftype_table;

        my %non_ftype_table = (
            W => 0.7844,
            w => 0.6989,
            A => 0.5656,
            X => 0.55,
            Q => 0.5469,
            O => 0.5469,
            R => 0.5375,
            K => 0.5375,
            Y => 0.5375,
            # etc.
        );
        my $non_ftype_lookup = join '', map quotemeta, keys %non_ftype_table;

        while ( $string =~ /($start_lookup)/og ) {
            $count += $tbsz * $start_table{ $1 };
        }
        $string =~ s/<[A-Z\[\\\]\^_`a-z]+>//g;
        if ( $ftype == 1 ) {
            while ( $string =~ /([$ftype_lookup])/og ) {
                $count += $tbsz * $ftype_table{ $1 };
            }
        }
        else {
            while ( $string =~ /([$non_ftype_lookup])/og ) {
                $count += $tbsz * $non_ftype_table{ $1 };
            }
        }

        Couldn't you just put all the weights in a lookup table, and then iterate once over the characters of the string?  Something like this:

        my %weight = (
            A => 0.6562,
            B => 0.3571,
            #...
            z => 0.42,
        );
        for my $ch (split //, $string) {
            $count += $tbsz * $weight{$ch};
        }

        Or (if the string is huge)

        ...
        while ($string =~ /(.)/gs) {
            $count += $tbsz * $weight{$1};
        }

        (And if ord($ch) of the characters is within a narrow range (such as ASCII), you could also use an array, and store the weights under $array[ord($ch)] — which might be a tad faster than a hash.)
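
        A minimal sketch of that array variant, assuming single-byte (ASCII or Latin-1) input. The three weights, the sample string and the value of $tbsz are placeholders chosen only for illustration:

```perl
use strict;
use warnings;

# Placeholder weights; the real table would list every character.
my %weight = ( W => 0.7844, w => 0.6989, A => 0.5656 );

# Build a 256-slot array indexed by character code; unlisted characters weigh 0.
my @weight_by_ord = (0) x 256;
$weight_by_ord[ ord $_ ] = $weight{$_} for keys %weight;

my $string = 'WwAz';    # made-up sample
my $tbsz   = 10;        # assumed value
my $count  = 0;

# unpack 'C*' yields one code per byte, hence the single-byte assumption above.
$count += $tbsz * $weight_by_ord[$_] for unpack 'C*', $string;

printf "%.3f\n", $count;
```

        Whether this actually beats the hash lookup is something only a benchmark on real data can tell.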

        Whole code looks like
        while ($string=~m{<(ths|193)>|\.}g) { $count=$count+($tbsz * 0.25); }
        That makes a tr/// solution a non-candidate, doesn't it?

        I'd probably go for something like:

        my %factor;
        $factor{ths} = $factor{193} = 0.25;
        $factor{ens} = $factor{194} = 0.5;
        $factor{ems} = $factor{195} = 1;
        $factor{W} = $factor{' '} = $factor{"\n"} = $factor{"\t"} = ... = 1;
        $factor{w} = $factor{'('} = $factor{')'} = 0.84375;
        ....
        my $count;
        while (/(?|<([A-Za-z0-9]+)>|(.))/gs) {    # /s so that (.) also matches "\n"
            no warnings 'uninitialized';
            $count += $factor{$1};
        }
        $count *= $tbsz;
        You may want to consider replacing the (.) with ([$charclass]), where:
        my $charclass = join "", map quotemeta, grep {1 == length} keys %factor;
        Whether that makes a difference depends on your data set.
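
        For reference, a self-contained sketch of the single-pass idea that can be run as-is. The factor table is deliberately tiny, and the sub name weighted_width and the sample input are made up for the example; it needs perl 5.10 or later for the branch reset (?|...) and the // operator.

```perl
use 5.010;
use strict;
use warnings;

# Deliberately tiny factor table; the real one would hold every tag and
# character from the original loops.
my %factor = (
    ths => 0.25, 193 => 0.25,
    ens => 0.5,  194 => 0.5,
    ems => 1,    195 => 1,
    W   => 1,    ' ' => 1,
    w   => 0.84375,
);

# Hypothetical helper: sum the factors in one pass, then scale by $tbsz.
sub weighted_width {
    my ( $string, $tbsz ) = @_;
    my $sum = 0;
    # Branch reset: whichever alternative matches, its capture lands in $1.
    while ( $string =~ /(?|<([A-Za-z0-9]+)>|(.))/gs ) {
        $sum += $factor{$1} // 0;    # unknown tags and characters add nothing
    }
    return $sum * $tbsz;
}

print weighted_width( 'W<ens>w x', 10 ), "\n";
```

        Here 'W', '<ens>', 'w' and the space contribute 1, 0.5, 0.84375 and 1, the 'x' contributes nothing, and the total is scaled by 10 once at the end.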