Re: Substitute 'bad words' with 'good words' according to lists

It is not the most efficient way, as you are going through the entire sentence for each key word. Say you have n key words and the sentence contains m words: you are checking n*m times.

A better way is to split the sentence into words, go through that list, and check whether each word exists in the hash (a hash lookup costs next to nothing). This only requires going through the sentence three times: 1) once to split it, 2) once to replace, and 3), if you wish to count this one, once to join it back.

use strict;
use warnings;

my %words = ( 
    ugly => 'ug**',
    anotherugly => 'anot*******',
);
my $txt = "ugly anotherugly";
my @words = split / /, $txt; # largely simplified; you also have to account for ,.:; etc
for my $i (0 .. $#words) {
    $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}))
}
print join(' ', @words);
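As the comment in the code says, splitting on single spaces is a simplification: punctuation stays glued to the words, so "ugly," would not be found in the hash. One way to handle that, sketched here as an assumption rather than as pg's code, is to split on runs of non-word characters inside capturing parentheses. That makes `split` return the separators as well as the words, so `join ''` rebuilds the original spacing and punctuation. The sample sentence is made up for illustration, and the lookup is still case-sensitive.

```perl
use strict;
use warnings;

# Same shape of replacement table as in the code above.
my %words = (
    ugly        => 'ug**',
    anotherugly => 'anot*******',
);

# Hypothetical sample text containing punctuation.
my $txt = "ugly, anotherugly; not ugly.";

# The capturing group makes split return the separators (", ", "; ",
# " ", ".") as well as the words, so nothing is lost.
my @parts = split /(\W+)/, $txt;

# Replace only the parts that are keys; separators are left alone.
for my $part (@parts) {
    $part = $words{$part} if exists $words{$part};
}

my $result = join '', @parts;
print "$result\n";    # prints "ug**, anot*******; not ug**."
```

The loop is the same hash lookup as above; only the split pattern and the join string change.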


Re^2: Substitute 'bad words' with 'good words' according to lists by sk (Curate) on Sep 26, 2005 at 03:00 UTC
pg, I think the original code is not necessarily inefficient. I feel the performance depends on the number of words your split returns. Here is a benchmark of the original (I added the `keys`, which was missing). I have modified $txt to be 100 times the original one. Again, the story could be different when you have many more replacements and fewer words.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);

my $txt = "ugly anotherugly " x 100;
# print $txt,$/;

sub pg {
    my %words = (
        ugly        => 'ug',
        anotherugly => 'anot***',
    );
    my @words = split / /, $txt; # largely simplified, you have to count ,.:; etc
    for my $i (0 .. $#words) {
        $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}))
    }
    # print join(' ', @words),$/;
}

sub orig {
    my %words = (
        ugly        => 'ug',
        anotherugly => 'anot*****',
    );
    $txt =~ s/$_/$words{$_}/g foreach keys(%words);
    # print $txt,$/;
}

my $test = {'pg' => \&pg, 'Original' => \&orig,};
my $result = timethese(-10, $test);
cmpthese($result);

Output:

Benchmark: running Original, pg for at least 10 CPU seconds...
  Original: 11 wallclock secs (10.86 usr + 0.00 sys = 10.86 CPU) @ 43770.26/s (n=475345)
        pg: 11 wallclock secs (10.68 usr + 0.00 sys = 10.68 CPU) @ 4328.46/s (n=46228)
            Rate       pg Original
pg        4328/s       --     -90%
Original 43770/s     911%       --

NOTE: I removed the `join` from your code just to show the looping differences.
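One caveat when reading these numbers: `orig` runs `s///g` on the file-scoped `$txt` itself, so only its first call finds anything to replace; every later call scans text in which the keys no longer occur. Also, because 'ugly' is a substring of 'anotherugly', the text `orig` produces depends on which key `keys` happens to return first. Below is a minimal sketch of a fairer comparison, not code from this thread: the sub names `pg_copy` and `orig_copy` are hypothetical, each sub works from the unchanged input and returns its result, and the longest key is substituted first. No timings are claimed here; they depend on the machine and the perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $txt = "ugly anotherugly " x 100;

my %words = (
    ugly        => 'ug',
    anotherugly => 'anot*****',
);

# Split-and-look-up, as in pg's reply, returning the rebuilt string.
sub pg_copy {
    my @words = split / /, $txt;
    for my $w (@words) {
        $w = $words{$w} if exists $words{$w};   # $w aliases the array element
    }
    return join ' ', @words;
}

# One s///g per key, as in the original, but on a private copy so that
# every call has real work to do. Longer keys go first so that 'ugly'
# cannot rewrite the tail of 'anotherugly' before it is matched.
sub orig_copy {
    my $copy = $txt;
    for my $k (sort { length($b) <=> length($a) } keys %words) {
        $copy =~ s/\Q$k\E/$words{$k}/g;
    }
    return $copy;
}

# A negative count means "run each sub for at least 2 CPU seconds".
cmpthese(-2, { pg_copy => \&pg_copy, orig_copy => \&orig_copy });
```

Because neither sub modifies `$txt`, both do the same amount of work on every call, and their return values can be checked against each other.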
Re^3: Substitute 'bad words' with 'good words' according to lists by pg (Canon) on Sep 26, 2005 at 05:14 UTC
You are right, and thanks for pointing that out. My original analysis assumed that s/// and split iterate through the sentence with the same performance; that assumption was wrong, and split() is much slower:

use strict;
use warnings;
use Benchmark qw(:all);

my $txt = "a" x 100;

sub seperate {
    split //, $txt;
}

sub replace {
    $txt =~ s/a/b/g;
}

my $result = timethese(100000, {'seperate' => \&seperate, 'replace' => \&replace});

This gives:

Benchmark: timing 10000 iterations of replace, seperate...
   replace:  0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
            (warning: too few iterations for a reliable count)
  seperate:  2 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 8305.65/s (n=10000)

C:\Perl\bin>perl -w math1.pl
Benchmark: timing 100000 iterations of replace, seperate...
   replace:  1 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ 3125000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
  seperate: 16 wallclock secs (12.50 usr + 0.00 sys = 12.50 CPU) @ 8000.00/s (n=100000)
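The same caveat applies to this benchmark: `replace` rewrites the shared `$txt` in place, so after the first call there is no `a` left and every later call is a scan that matches nothing. Note also that `split //` produces one field per character (100 here), not one per word. Below is a sketch of a variant in which every call does comparable work from the same input. It is an assumption-laden illustration, not pg's code: `replace` substitutes on a private copy, `separate` (spelled differently from the original on purpose) assigns its result to an array, and the timings it prints are not claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $txt = "a" x 100;

# Split into single characters, as in the original. Assigning to an
# array forces list context, so split builds all 100 fields.
sub separate {
    my @chars = split //, $txt;
    return scalar @chars;
}

# Substitute on a private copy, so every call replaces all 100 'a's
# and $txt itself is never modified.
sub replace {
    (my $copy = $txt) =~ s/a/b/g;
    return $copy;
}

# A negative count means "run each sub for at least 2 CPU seconds".
cmpthese(-2, { separate => \&separate, replace => \&replace });
```

With `$txt` left intact, the `replace` column reflects 100 real substitutions per call rather than a scan that finds nothing.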