in reply to Substitute 'bad words' with 'good words' according to lists

This is not the most efficient approach, because you scan the entire sentence once for each keyword. With n keywords and a sentence of m words, that is on the order of n*m checks.

A better way is to split the sentence into words, walk through the list, and check whether each word exists in the hash (a hash lookup costs almost nothing). This requires only three passes over the sentence: 1) one to split, 2) one to replace, and 3) if you want to count it, one to join it back.

use strict;
use warnings;

my %words = (
    ugly        => 'ug**',
    anotherugly => 'anot*******',
);

my $txt = "ugly anotherugly";
my @words = split / /, $txt; # largely simplified, you have to count ,.:; etc

for my $i (0 .. $#words) {
    $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}))
}

print join(' ', @words);
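
The split on a single space above leaves punctuation attached to the words. One possible way around that, sketched here under the assumption that a "word" is simply a run of \w characters, is to let split capture the separators so they can be joined back unchanged. The helper name censor is made up for this illustration.

```perl
use strict;
use warnings;

my %words = (
    ugly        => 'ug**',
    anotherugly => 'anot*******',
);

# Hypothetical helper: replace listed words, leave everything else untouched.
sub censor {
    my ($txt) = @_;
    # The capturing group makes split return the separators as well,
    # so punctuation and spacing survive the round trip.
    my @parts = split /(\W+)/, $txt;
    for my $part (@parts) {
        $part = $words{$part} if exists $words{$part};
    }
    return join '', @parts;
}

print censor("ugly, anotherugly; not uglyish."), "\n";
# prints: ug**, anot*******; not uglyish.
```

The lookup is still an exact, case-sensitive hash match, so "Ugly" or "uglyish" pass through unchanged; whether that is wanted depends on the word list.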

Re^2: Substitute 'bad words' with 'good words' according to lists
by sk (Curate) on Sep 26, 2005 at 03:00 UTC
    pg,

    I think the original code is not necessarily inefficient. I feel the performance depends on the number of words your split returns. Here is a benchmark against the original (I added the keys call, which was missing). I have modified the text to be 100 times the original.

    Again, the story could be different when you have a large number of replacements and only a few words.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw (:all);

    my $txt = "ugly anotherugly " x 100;
    # print $txt,$/;

    sub pg {
        my %words = (
            ugly        => 'ug**',
            anotherugly => 'anot*******',
        );
        my @words = split / /, $txt; # largely simplified, you have to count ,.:; etc
        for my $i (0 .. $#words) {
            $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}))
        }
        # print join(' ', @words),$/;
    }

    sub orig {
        my %words = (
            ugly        => 'ug**',
            anotherugly => 'anot*******',
        );
        $txt =~ s/$_/$words{$_}/g foreach keys(%words);
        # print $txt,$/;
    }

    my $test = {'pg' => \&pg, 'Original' =>\&orig,};
    my $result = timethese(-10,$test );
    cmpthese($result);

    Output

    Benchmark: running Original, pg for at least 10 CPU seconds...
      Original: 11 wallclock secs (10.86 usr +  0.00 sys = 10.86 CPU) @ 43770.26/s (n=475345)
            pg: 11 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 4328.46/s (n=46228)
                Rate       pg Original
    pg        4328/s       --     -90%
    Original 43770/s     911%       --

    NOTE: I removed the join from your code just to show the looping differences.
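
    To probe the other regime mentioned above (many replacements, few words), a variation along these lines could be tried. This is only a sketch: the 1000 keys word0 .. word999, the two-word sentence, and the sub names by_hash and by_regex are all invented for illustration, and no timings are claimed. Unlike orig above, by_regex works on a copy of the text and anchors each key with \b, so every iteration sees the same input and both subs produce the same string.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Assumption: 1000 made-up keys and a very short sentence.
    my %words;
    $words{"word$_"} = '*' x length("word$_") for 0 .. 999;
    my $txt = "word1 word998";

    # Hash lookup per word: cost grows with the number of words.
    sub by_hash {
        my @w = split / /, $txt;
        for my $w (@w) {
            $w = $words{$w} if exists $words{$w};
        }
        return join ' ', @w;
    }

    # One s///g per key: cost grows with the number of keys.
    sub by_regex {
        my $copy = $txt;    # copy, so the shared text is never modified
        $copy =~ s/\b\Q$_\E\b/$words{$_}/g for keys %words;
        return $copy;
    }

    # Run each sub for at least 1 CPU second and print a comparison table.
    cmpthese(-1, { by_hash => \&by_hash, by_regex => \&by_regex });
    ```

    Changing the key count or the length of $txt should show how the balance shifts between the two approaches on a given machine.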

      You are right, and thanks for pointing that out. My original analysis assumed that s/// and split iterate through the sentence with the same performance. That assumption was wrong: split() is much slower:

      use strict;
      use warnings;
      use Benchmark qw (:all);

      my $txt = "a" x 100;

      sub seperate {
          split //, $txt;
      }

      sub replace {
          $txt =~ s/a/b/g;
      }

      my $result = timethese(100000, {'seperate' => \&seperate, 'replace' => \&replace});

      This gives:

      Benchmark: timing 10000 iterations of replace, seperate...
         replace:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
                  (warning: too few iterations for a reliable count)
        seperate:  2 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 8305.65/s (n=10000)

      C:\Perl\bin>perl -w math1.pl
      Benchmark: timing 100000 iterations of replace, seperate...
         replace:  1 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU) @ 3125000.00/s (n=100000)
                  (warning: too few iterations for a reliable count)
        seperate: 16 wallclock secs (12.50 usr +  0.00 sys = 12.50 CPU) @ 8000.00/s (n=100000)
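
      If s/// is the cheaper primitive, one way to keep it while avoiding a separate pass per keyword is to fold all keys into a single alternation and substitute once. This is a sketch of that idea, not something benchmarked in this thread; the sample sentence and the variable names $alt, $re and $out are made up for illustration.

      ```perl
      use strict;
      use warnings;

      my %words = (
          ugly        => 'ug**',
          anotherugly => 'anot*******',
      );

      # Longest keys first, so a longer key is tried before a shorter one
      # that happens to be its prefix; quotemeta guards regex metacharacters.
      my $alt = join '|',
                map  { quotemeta($_) }
                sort { length($b) <=> length($a) } keys %words;
      my $re  = qr/\b($alt)\b/;

      my $txt = "ugly anotherugly uglier";
      (my $out = $txt) =~ s/$re/$words{$1}/g;   # one pass over the text
      print "$out\n";
      # prints: ug** anot******* uglier
      ```

      The \b anchors restrict matches to whole words, which differs from the unanchored s/$_/.../ in the earlier benchmark; drop them if substring matches are intended.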