I can see four problems with the posted code that would make it run slower:

  1. You are using alternation when a character class would be faster.
  2. You are using capturing parentheses when you don't use the results of those captures.
  3. You are using the "+" Additive operator instead of the more efficient "+=" assignment operator.
  4. You are looping over the same string 28 or 29 times, depending on the value of $ftype, when you probably should only have to loop over the string twice.

For example:

while ($string=~m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g){ $count=$count+($tbsz*0.5625); }

Would be more efficient as:

while ($string=~m{[BCPTZabdhknpquvx]}g){ $count+=($tbsz*0.5625); }
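
As a quick sanity check, here is a minimal sketch that runs both loops on the same input and confirms they produce the same total. The values of $tbsz and $string are made up for illustration, and the sub names are hypothetical. Since the thread also asks about tr///, a third variant is included: tr/// with an empty replacement list returns the number of matching characters without modifying the string, so for a set of single characters no loop is needed at all.

```perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the original program.
my $tbsz   = 10;
my $string = 'The quick brown fox jumps over the lazy dog. ' x 3;

# Original style: alternation, capture group, "$count = $count + ...".
sub count_alternation {
    my ($str) = @_;
    my $count = 0;
    while ( $str =~ m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g ) {
        $count = $count + ( $tbsz * 0.5625 );
    }
    return $count;
}

# Suggested style: character class, no capture, "+=".
sub count_class {
    my ($str) = @_;
    my $count = 0;
    while ( $str =~ m{[BCPTZabdhknpquvx]}g ) {
        $count += $tbsz * 0.5625;
    }
    return $count;
}

# tr/// variant: count the matching characters once, then multiply.
sub count_tr {
    my ($str) = @_;
    return $tbsz * 0.5625 * ( $str =~ tr/BCPTZabdhknpquvx// );
}

print join( ' ', count_alternation($string), count_class($string), count_tr($string) ), "\n";
```

To compare timings on your own data, the core Benchmark module's cmpthese function can be given these three subs; the actual numbers will depend on your strings and your perl version.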

The character-class version covers points 1, 2 and 3. For point 4 you could use hash tables for the calculations, something like:

# The keys are the literal strings to be matched, so that $1 can be used
# directly as the hash key.  Add "\t", "\n" etc. as separate keys if other
# whitespace should count as well.
my %start_table = (
    ' '     => 1,
    '<ems>' => 1,
    '<195>' => 1,
    '.'     => 0.25,
    '<ths>' => 0.25,
    '<193>' => 0.25,
    '<ens>' => 0.5,
    '<194>' => 0.5,
    );
# quotemeta escapes the regex meta-characters in the keys
my $start_lookup = join '|', map { quotemeta($_) } keys %start_table;

my %ftype_table = (
    W   => 1,
    ' ' => 1,
    '%' => 1,         ### % is added temporarily for some testing purpose
    w   => 0.84375,
    ')' => 0.84375,
    '(' => 0.84375,
    M   => 0.8125,
    m   => 0.8125,
    N   => 0.7188,
    Q   => 0.7188,
    # etc,
    );
my $ftype_lookup = join '', map { quotemeta($_) } keys %ftype_table;

my %non_ftype_table = (
    W => 0.7844,
    w => 0.6999,
    A => 0.5656,
    X => 0.55,
    Q => 0.5469,
    O => 0.5469,
    R => 0.5375,
    K => 0.5375,
    Y => 0.5375,
    # etc.
    );
my $non_ftype_lookup = join '', map { quotemeta($_) } keys %non_ftype_table;

# First pass: the multi-character tokens and the start characters
while ( $string =~ /($start_lookup)/og ) {
    $count += $tbsz * $start_table{ $1 };
    }

# Remove the <...> tokens before counting single characters
$string =~ s/<[A-Z\[\\\]\^_`a-z]+>//g;

# Second pass: one loop over the string, looking up each character's factor
if ( $ftype == 1 ) {
    while ( $string =~ /([$ftype_lookup])/og ) {
        $count += $tbsz * $ftype_table{ $1 };
        }
    }
else {
    while ( $string =~ /([$non_ftype_lookup])/og ) {
        $count += $tbsz * $non_ftype_table{ $1 };
        }
    }
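
To see the lookup-table idea in isolation, here is a small self-contained sketch. The %width table is a shortened, hypothetical subset of the values above, and text_width is a name invented for this example; it is not part of the original program. The character class is built once from the hash keys, and each matched character is then used directly as the key for its factor.

```perl
use strict;
use warnings;

# Hypothetical, shortened table: character => width factor.
my %width = (
    W   => 1,
    ' ' => 1,
    w   => 0.84375,
    '(' => 0.84375,
    ')' => 0.84375,
    M   => 0.8125,
    m   => 0.8125,
);

# Build the character class once; quotemeta escapes meta-characters.
my $class = join '', map { quotemeta($_) } keys %width;
my $re    = qr/([$class])/;

# One pass over the string: every matched character adds its own factor.
sub text_width {
    my ( $str, $tbsz ) = @_;
    my $count = 0;
    while ( $str =~ /$re/g ) {
        $count += $tbsz * $width{$1};
    }
    return $count;
}

print text_width( 'W (m)', 10 ), "\n";
```

Characters that are not in the table are simply skipped, so adding a new character only means adding one line to the hash rather than another loop over the string.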

In reply to Re^3: Which is more faster? While or tr/// by jwkrahn
in thread Which is more faster? While or tr/// by tej
