Which is more faster? While or tr///

Replies are listed 'Best First'.
Re: Which is more faster? While or tr/// by GrandFather (Saint) on Feb 01, 2011 at 06:05 UTC
I would expect tr/// to be more fasterest. However, if it really matters, you should write some code to benchmark the alternatives using Benchmark.

The real question though is: "Does it actually matter?" Unless a profiler has identified the code fragment as the time hog in code that is running too slowly to achieve some particular goal, write the fragment in the way that seems clearest and easiest to maintain.
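A benchmark along those lines might look like the following minimal sketch. It uses `cmpthese` from the core Benchmark module to compare a `while` loop over `/W/g` with `tr/W//`. The sample string, the value of `$tbsz`, and the sub names `count_while` and `count_tr` are made up for illustration; real timings depend on the actual data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the real input and $tbsz come from the poster's program.
my $string = 'Wx ' x 10_000;
my $tbsz   = 10;

# Count 'W' with a global match in a while loop.
sub count_while {
    my $count = 0;
    $count += $tbsz * 0.7844 while $string =~ /W/g;
    return $count;
}

# Count 'W' with tr///, which returns the number of characters it matched.
sub count_tr {
    return $tbsz * 0.7844 * ( $string =~ tr/W// );
}

# Sanity check: both approaches should agree before they are timed.
die "results differ" if abs( count_while() - count_tr() ) > 1e-3;

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    while_loop => \&count_while,
    tr         => \&count_tr,
} );
```

A negative first argument to `cmpthese` means "run for at least that many CPU seconds", which tends to give steadier numbers than a fixed iteration count.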
Re^2: Which is more faster? While or tr/// by tej (Scribe) on Feb 01, 2011 at 06:53 UTC
Thank you. You are right. I should first check which part of the code takes the most time to run.
Re: Which is more faster? While or tr/// by BrowserUk (Patriarch) on Feb 01, 2011 at 06:25 UTC
You're counting the number of 'W's? Or is that meant to be non-word characters: `m[(\W)]g`?
Re: Which is more faster? While or tr/// by ikegami (Patriarch) on Feb 01, 2011 at 07:53 UTC
"Does this make my program a little slow?"

You tell us. If you can't tell, does it matter?

Here are some clean implementations:

    $count += ($tbsz*0.7844) while $string =~ /W/g;

    $count += ($tbsz*0.7844) * ( () = $string =~ /W/g );

    $count += ($tbsz*0.7844) * ( $string =~ tr/W// );
Re: Which is more faster? While or tr/// by JavaFan (Canon) on Feb 01, 2011 at 07:39 UTC
"I have to calculate the number of all letters in files and increase the count accordingly"

    while ($string=~m{(W)}g){

Your code does not match your description. Your code counts the number of Ws.
Re^2: Which is more faster? While or tr/// by tej (Scribe) on Feb 01, 2011 at 09:37 UTC
This was just an example. The whole code looks like this:

    while ($string=~m{<(ths|193)>|\.}g) {
        $count=$count+($tbsz * 0.25);
    }
    while ($string=~m{<(ens|194)>}g) {
        $count=$count+($tbsz * 0.5);
    }
    while ($string=~m{<(ems|195)>|\s}g) {
        $count=$count+$tbsz;
    }

    $string=~s/<[A-z]+>//g;
    if($ftype==1){
        while ($string=~m{(W|\s|%)}g){  ### % is added temporarily for some testing purpose
            $count=$count+($tbsz*1);
        }
        while ($string=~m{(w|\)|\()}g){
            $count=$count+($tbsz*0.84375);
        }
        while ($string=~m{(M|m)}g){
            $count=$count+($tbsz*0.8125);
        }
        while ($string=~m{(N|Q)}g){
            $count=$count+($tbsz*0.7188);
        }
        while ($string=~m{(O|Y)}g){
            $count=$count+($tbsz*0.6875);
        }
        while ($string=~m{(A|D|G|H|K|U|V|X)}g){
            $count=$count+($tbsz*0.6562);
        }
        while ($string=~m{(R)}g){
            $count=$count+($tbsz*0.625);
        }
        while ($string=~m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g){
            $count=$count+($tbsz*0.5625);
        }
        while ($string=~m{(6)}g){
            $count=$count+($tbsz*0.55);
        }
        while ($string=~m{(0)}g){
            $count=$count+($tbsz*0.5375);
        }
        while ($string=~m{(g|y)}g){
            $count=$count+($tbsz*0.5313);
        }
        while ($string=~m{(4)}g){
            $count=$count+($tbsz*0.5281);
        }
        while ($string=~m{(7|8)}g){
            $count=$count+($tbsz*0.5156);
        }
        while ($string=~m{(o|2|3)}g){
            $count=$count+($tbsz*0.5);
        }
        while ($string=~m{(5)}g){
            $count=$count+($tbsz*0.4938);
        }
        while ($string=~m{(9)}g){
            $count=$count+($tbsz*0.4813);
        }
        while ($string=~m{(E|L)}g){
            $count=$count+($tbsz*0.46875);
        }
        while ($string=~m{(F|c|e|z)}g){
            $count=$count+($tbsz*0.4375);
        }
        while ($string=~m{(J|S|f)}g){
            $count=$count+($tbsz*0.4063);
        }
        while ($string=~m{(1)}g){
            $count=$count+($tbsz*0.3625);
        }
        while ($string=~m{(r)}g){
            $count=$count+($tbsz*0.35);
        }
        while ($string=~m{(s)}g){
            $count=$count+($tbsz*0.3188);
        }
        while ($string=~m{(l|t)}g){
            $count=$count+($tbsz*0.285);
        }
        while ($string=~m{(l)}g){
            $count=$count+($tbsz*0.25);
        }
        while ($string=~m{(i|j)}g){
            $count=$count+($tbsz*0.2345);
        }
    }else{
        while ($string=~m{(W)}g){
            $count=$count+($tbsz*0.7844);
        }
        while ($string=~m{(w)}g){
            $count=$count+($tbsz*0.6989);
        }
        while ($string=~m{(A)}g){
            $count=$count+($tbsz*0.5656);
        }
        while ($string=~m{(X)}g){
            $count=$count+($tbsz*0.55);
        }
        while ($string=~m{(Q|O)}g){
            $count=$count+($tbsz*0.5469);
        }
        while ($string=~m{(R|K|Y)}g){
            $count=$count+($tbsz*0.5375);
        }
        while ($string=~m{(C|V)}g){
            $count=$count+($tbsz*0.5313);
        }
        while ($string=~m{(N)}g){
            $count=$count+($tbsz*0.5283);
        }
        while ($string=~m{(D|G|T)}g){
            $count=$count+($tbsz*0.525);
        }
        while ($string=~m{(S|H)}g){
            $count=$count+($tbsz*0.5125);
        }
        while ($string=~m{(B)}g){
            $count=$count+($tbsz*0.5);
        }
        while ($string=~m{(4|U|Z)}g){
            $count=$count+($tbsz*0.4875);
        }
        while ($string=~m{(8|9|P|3|6|7)}g){
            $count=$count+($tbsz*0.475);
        }
        while ($string=~m{(0|5|a|2)}g){
            $count=$count+($tbsz*0.4688);
        }
        while ($string=~m{(x|y)}g){
            $count=$count+($tbsz*0.4594);
        }
        while ($string=~m{(L|b|g|o|p|q|v)}g){
            $count=$count+($tbsz*0.4469);
        }
        while ($string=~m{(E|F|c|d|e)}g){
            $count=$count+($tbsz*0.4438);
        }
        while ($string=~m{(h)}g){
            $count=$count+($tbsz*0.4313);
        }
        while ($string=~m{(n|u)}g){
            $count=$count+($tbsz*0.4219);
        }
        while ($string=~m{(z|J|k)}g){
            $count=$count+($tbsz*0.4031);
        }
        while ($string=~m{(s|r)}g){
            $count=$count+($tbsz*0.3969);
        }
        while ($string=~m{(t)}g){
            $count=$count+($tbsz*0.3219);
        }
        while ($string=~m{(f)}g){
            $count=$count+($tbsz*0.3188);
        }
        while ($string=~m{(1)}g){
            $count=$count+($tbsz*0.3031);
        }
        while ($string=~m{(j)}g){
            $count=$count+($tbsz*0.2438);
        }
        while ($string=~m{(I|i|l)}g){
            $count=$count+($tbsz*0.1438);
        }
    }
Re^3: Which is more faster? While or tr/// by jwkrahn (Abbot) on Feb 01, 2011 at 11:05 UTC
I can see four problems with the posted code that would make it run slower:

1. You are using alternation when a character class would be faster.
2. You are using capturing parentheses when you don't use the results of those captures.
3. You are using the "+" additive operator instead of the more efficient "+=" assignment operator.
4. You are looping over the same string 28 or 29 times, depending on the value of $ftype, when you probably should only have to loop over the string twice.

For example:

    while ($string=~m{(B|C|P|T|Z|a|b|d|h|k|n|p|q|u|v|x)}g){
        $count=$count+($tbsz*0.5625);
    }

would be more efficient as:

    while ($string=~m{[BCPTZabdhknpquvx]}g){
        $count+=($tbsz*0.5625);
    }

That would cover points 1, 2 and 3. For point 4 you could use hash tables for the calculations, something like:

    my %start_table = (
        '\s'    => 1,
        '<ems>' => 1,
        '<195>' => 1,
        '\.'    => 0.25,
        '<ths>' => 0.25,
        '<193>' => 0.25,
        '<ens>' => 0.5,
        '<194>' => 0.5,
        );
    my $start_lookup = join '|', keys %start_table;

    my %ftype_table = (
        W    => 1,
        '\s' => 1,
        '%'  => 1,    ### % is added temporarily for some testing purpose
        w    => 0.84375,
        '\)' => 0.84375,    ### need to escape meta-characters!!!
        '\(' => 0.84375,
        M    => 0.8125,
        m    => 0.8125,
        N    => 0.7188,
        Q    => 0.7188,
        # etc.
        );
    my $ftype_lookup = join '', keys %ftype_table;

    my %non_ftype_table = (
        W    => 0.7844,
        w    => 0.6989,
        A    => 0.5656,
        X    => 0.55,
        Q    => 0.5469,
        O    => 0.5469,
        R    => 0.5375,
        K    => 0.5375,
        Y    => 0.5375,
        # etc.
        );
    my $non_ftype_lookup = join '', keys %non_ftype_table;

    while ( $string =~ /($start_lookup)/og ) {
        $count += $tbsz * $start_table{ $1 };
        }
    $string =~ s/<[A-Z\[\\\]\^_`a-z]+>//g;

    if ( $ftype == 1 ) {
        while ( $string =~ /([$ftype_lookup])/og ) {
            $count += $tbsz * $ftype_table{ $1 };
            }
        }
    else {
        while ( $string =~ /([$non_ftype_lookup])/og ) {
            $count += $tbsz * $non_ftype_table{ $1 };
            }
        }
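One thing to watch for when building a pattern from hash keys: `$1` holds the text that actually matched, so a key written as a regex fragment, such as `'\s'` or `'\.'`, will not be found when a matched space or dot is looked up. A possible variant, sketched below, keeps the literal characters as keys and lets `quotemeta` do the escaping when the character class is built. The table is shortened, only a plain space stands in for `\s` (tab and newline would need their own keys), and the sub name `weigh` is invented for the example.

```perl
use strict;
use warnings;

# Shortened table for illustration; the keys are the literal characters.
my %ftype_table = (
    'W' => 1,
    ' ' => 1,
    '%' => 1,
    'w' => 0.84375,
    '(' => 0.84375,
    ')' => 0.84375,
    'M' => 0.8125,
    'm' => 0.8125,
);

# quotemeta escapes anything that could be special inside [...].
my $class = join '', map { quotemeta($_) } keys %ftype_table;
my $re    = qr/([$class])/;

# Walk the string once; $1 is the literal character, so the lookup succeeds.
sub weigh {
    my ( $string, $tbsz ) = @_;
    my $count = 0;
    while ( $string =~ /$re/g ) {
        $count += $tbsz * $ftype_table{$1};
    }
    return $count;
}

print weigh( 'W (m)', 2 ), "\n";    # prints 9
```

Compiling the pattern once with `qr//` serves the same purpose as the `/o` flag in the code above.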
Re^3: Which is more faster? While or tr/// by Anonyrnous Monk (Hermit) on Feb 01, 2011 at 10:19 UTC
Couldn't you just put all the weights in a lookup table, and then iterate once over the characters of the string? Something like this:

    my %weight = (
        A => 0.6562,
        B => 0.3571,
        #...
        z => 0.42,
    );

    for my $ch (split //, $string) {
        $count += $tbsz * $weight{$ch};
    }

Or (if the string is huge):

    ...
    while ($string =~ /(.)/gs) {
        $count += $tbsz * $weight{$1};
    }

(And if `ord($ch)` of the characters is within a narrow range (such as ASCII), you could also use an array and store the weights under `$array[ord($ch)]`, which might be a tad faster than a hash.)
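The array variant mentioned in the parenthesis could look something like this sketch. It assumes single-byte (ASCII) input; the weights are a small subset taken from the second branch of the posted code, and the names `@weight` and `weigh_string` are made up for the example.

```perl
use strict;
use warnings;

# Weights indexed by character code; characters without an entry weigh 0.
my @weight = (0) x 128;
$weight[ ord('W') ] = 0.7844;
$weight[ ord('w') ] = 0.6989;
$weight[ ord('A') ] = 0.5656;
$weight[ ord('X') ] = 0.55;

sub weigh_string {
    my ( $string, $tbsz ) = @_;
    my $sum = 0;
    # unpack 'C*' yields one integer per byte of the string;
    # codes beyond the end of the array fall back to 0.
    $sum += $weight[$_] // 0 for unpack 'C*', $string;
    return $tbsz * $sum;
}

printf "%.4f\n", weigh_string( 'WAX', 10 );
```

Multiplying by `$tbsz` once at the end, rather than per character, saves a multiplication in the loop. Whether the array actually beats the hash is something only a benchmark on real data can tell.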
Re^3: Which is more faster? While or tr/// by JavaFan (Canon) on Feb 01, 2011 at 10:37 UTC
"Whole code looks like"

    while ($string=~m{<(ths|193)>|\.}g) {
        $count=$count+($tbsz * 0.25);
    }

That makes a tr/// solution a non-candidate, doesn't it? I'd probably go for something like:

    my %factor;
    $factor{ths} = $factor{193} = 0.25;
    $factor{ens} = $factor{194} = 0.5;
    $factor{ems} = $factor{195} = 1;
    $factor{W} = $factor{' '} = $factor{"\n"} = $factor{"\t"} = ... = 1;
    $factor{w} = $factor{'('} = $factor{')'} = 0.84375;
    ....
    my $count;
    while ($string =~ /(?|<([A-Za-z0-9]+)>|(.))/gs) {
        no warnings 'uninitialized';
        $count += $factor{$1};
    }
    $count *= $tbsz;

You may want to consider replacing the `(.)` with `([$charclass])`, where:

    my $charclass = join "", map {quotemeta} grep {1 == length} keys %factor;

Whether that makes a difference depends on your data set.
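A self-contained version of that idea, with a deliberately tiny factor table, might look like this. The weights are taken from the posted code, but the table is far from complete, and the sub name `weighted_count` is invented for the sketch. The branch-reset group `(?|...)` needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...), the // operator and say

my %factor;
$factor{ths} = $factor{193} = $factor{'.'} = 0.25;
$factor{ens} = $factor{194} = 0.5;
$factor{ems} = $factor{195} = $factor{' '} = 1;
$factor{W}   = 1;
$factor{w}   = $factor{'('} = $factor{')'} = 0.84375;

# Only single-character keys belong in the character class.
my $charclass = join '', map { quotemeta($_) }
                grep { length($_) == 1 } keys %factor;

sub weighted_count {
    my ( $string, $tbsz ) = @_;
    my $count = 0;
    # Either branch leaves its capture in $1: a tag name or a single character.
    while ( $string =~ /(?|<([A-Za-z0-9]+)>|([$charclass]))/g ) {
        $count += $factor{$1} // 0;    # unknown tags contribute nothing
    }
    return $count * $tbsz;
}

say weighted_count( 'W<ens>w.', 4 );    # prints 10.375
```

Because the tag branch is tried first, letters inside a tag such as `<ens>` are consumed as part of the tag and are not counted a second time as single characters.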








