Re^2: Substitute 'bad words' with 'good words' according to lists

pg,

I don't think the original code is necessarily inefficient. I suspect the performance depends on the number of words your split returns. Here is a benchmark against the original (I added the keys call, which was missing). I have made $txt 100 times as long as the original text.

Again, the story could be different when you have a great many replacements and fewer words.
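For the many-replacements case, one option is to fold every key into a single alternation so the text is scanned once no matter how many words are in the list. This is only a sketch and was not part of the benchmark below; the censor sub, the $re variable, and the \b word-boundary anchors are my own assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the %words hash used in the benchmark.
my %words = (
    ugly        => 'ug**',
    anotherugly => 'anot*******',
);

# Build one alternation, longest keys first, so a longer word is tried
# before a shorter word it contains. quotemeta escapes regex metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %words;
my $re = qr/\b($alternation)\b/;

# Hypothetical helper: works on a copy and returns the censored text.
sub censor {
    my ($text) = @_;
    $text =~ s/$re/$words{$1}/g;    # one scan, one hash lookup per match
    return $text;
}

print censor("ugly anotherugly fine"), "\n";    # prints: ug** anot******* fine
```

Whether this beats the per-key loop would need its own benchmark; I have not measured it here.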

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw (:all);

my $txt = "ugly anotherugly " x 100;
# print $txt,$/;

sub pg {
    my %words = (
            ugly => 'ug**',
            anotherugly => 'anot*******',
            );
    my @words = split / /, $txt; # largely simplified, you have to count ,.:; etc
    for my $i (0 .. $#words) {
        $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}))
    }
#   print join(' ', @words),$/;
}

sub orig {
    my %words = (
            ugly => 'ug**',
            anotherugly => 'anot*******',
            );
    $txt =~ s/$_/$words{$_}/g foreach keys(%words);
#   print $txt,$/;
}

my $test = {'pg' => \&pg, 'Original' =>\&orig,};

my $result = timethese(-10,$test );
cmpthese($result);

Output

Benchmark: running Original, pg for at least 10 CPU seconds...
  Original: 11 wallclock secs (10.86 usr +  0.00 sys = 10.86 CPU) @ 43770.26/s (n=475345)
        pg: 11 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 4328.46/s (n=46228)
            Rate       pg Original
pg        4328/s       --     -90%
Original 43770/s     911%       --

NOTE: I removed the join from your code just to show the looping differences.
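One caveat worth checking when reading these numbers: orig() runs s///g directly on the shared $txt, so after its first call there are no bad words left for later iterations to replace, and pg() splits whatever $txt currently holds. A minimal sketch of a variant that hands each sub its own copy is below. The names $source, by_split, and by_subst are hypothetical, and the \b anchors and \Q...\E quoting are my additions, not part of either original.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $source = "ugly anotherugly " x 100;
my %words  = (ugly => 'ug**', anotherugly => 'anot*******');

# Each sub receives the text as an argument and returns a new string,
# so $source is identical at the start of every iteration.
sub by_split {
    my ($txt) = @_;
    return join ' ',
        map { exists $words{$_} ? $words{$_} : $_ } split / /, $txt;
}

sub by_subst {
    my ($txt) = @_;
    # \b keeps "ugly" from matching inside "anotherugly".
    $txt =~ s/\b\Q$_\E\b/$words{$_}/g for keys %words;
    return $txt;
}

# Run each variant for at least 1 CPU second and print the comparison.
cmpthese(-1, {
    split_copy => sub { by_split($source) },
    subst_copy => sub { by_subst($source) },
});
```

I have not posted timings for this version; the point is only that both subs do real work on every pass.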


Replies are listed 'Best First'.

Re^3: Substitute 'bad words' with 'good words' according to lists
by pg (Canon) on Sep 26, 2005 at 05:14 UTC

You are right, and thanks for pointing that out. My original analysis assumed that s/// and split iterate through the sentence with the same performance. That assumption was wrong: split() is much slower.

use strict;
use warnings;
use Benchmark qw (:all);

my $txt = "a" x 100;

sub seperate {
    split //, $txt;
}

sub replace {
    $txt =~ s/a/b/g;
}

my $result = timethese(100000, {'seperate' => \&seperate, 'replace' => \&replace});

This gives:

Benchmark: timing 10000 iterations of replace, seperate...
   replace:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
            (warning: too few iterations for a reliable count)
  seperate:  2 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 8305.65/s (n=10000)

C:\Perl\bin>perl -w math1.pl
Benchmark: timing 100000 iterations of replace, seperate...
   replace:  1 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU) @ 3125000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
  seperate: 16 wallclock secs (12.50 usr +  0.00 sys = 12.50 CPU) @ 8000.00/s (n=100000)
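The "too few iterations" warning means the replace figure above is not reliable. A sketch of a tighter comparison follows, assuming hypothetical subs separate_copy and replace_copy: the split result is stored in an array, the substitution runs on a fresh copy so each call still has 100 "a" characters to replace, and a negative count tells Benchmark to run each sub for at least that many CPU seconds.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $source = "a" x 100;

# List-context split: the pieces are stored in an array.
sub separate_copy {
    my @chars = split //, $source;
    return scalar @chars;
}

# Substitute on a copy so $source is never changed between calls.
sub replace_copy {
    (my $txt = $source) =~ s/a/b/g;
    return $txt;
}

# A negative count means "at least this many CPU seconds per sub".
my $result = timethese(-1, {
    separate => \&separate_copy,
    replace  => \&replace_copy,
});
cmpthese($result);
```

I have not included numbers for this version; they will depend on the Perl build and the machine.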
