in reply to Re: Remove double bracket and singe quotes
in thread Remove double bracket and singe quotes
Update: Added more tests. Thank you, wee.
Indeed, tr is fast. I compared the 3 regex statements to tr against a 724 MB string. Testing was done on a 2.6 GHz Core i7 machine with Perl v5.16.2.
use strict;
use warnings;
use Time::HiRes 'time';

my $doc = "'C-3PO' or 'See-Threepio' is a humanoid robot character from the [[Star Wars]] universe who appears in the original ''Star Wars'' films, the prequel trilogy and the sequel trilogy.\n";
$doc .= $doc for 1 .. 22;   ## expand string to 724 MB

print "length : ", length($doc), "\n";   # 759169024

my $start = time;

# $doc =~ s/\[\[//g;              ##  8.626 secs.
# $doc =~ s/\]\]//g;
# $doc =~ s/\'//g;

# $doc =~ s/\[//g;                ## 10.493 secs.
# $doc =~ s/\]//g;
# $doc =~ s/\'//g;

# $doc =~ s/\[+//g;               ##  7.050 secs.
# $doc =~ s/\]+//g;
# $doc =~ s/\'+//g;

# $doc =~ s/(?:\[|\]|\')//g;      ## 19.559 secs.
# $doc =~ s/(?:\[|\]|\')+//g;     ## 56.150 secs. <- did not expect this

# $doc =~ s/[\[\]\']//g;          ##  9.072 secs.
# $doc =~ s/[\[\]\']+//g;         ##  6.915 secs.

$doc =~ tr/[]'//d;                ##  1.908 secs.

printf "duration : %7.03f secs.\n", time - $start;
print "length : ", length($doc), "\n";   # 708837376
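The timings above are single runs against a very large string. For a quicker, repeatable comparison, something like the following sketch with the core Benchmark module might be useful. The smaller input (the same sentence doubled 10 times rather than 22) and the variant names are my own assumptions for illustration, and the relative numbers will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Same sample sentence as in the test above, doubled 10 times
# (1024 copies) instead of 22. The smaller size is an assumption
# made to keep the run short.
my $line = "'C-3PO' or 'See-Threepio' is a humanoid robot character from the [[Star Wars]] universe who appears in the original ''Star Wars'' films, the prequel trilogy and the sequel trilogy.\n";
my $doc = $line;
$doc .= $doc for 1 .. 10;

# Each sub works on its own copy, so every iteration starts from
# the same unmodified input. The copy cost is the same for all variants.
my %variants = (
    alternation => sub { my $s = $doc; $s =~ s/(?:\[|\]|\')//g; $s },
    char_class  => sub { my $s = $doc; $s =~ s/[\[\]\']//g;     $s },
    class_plus  => sub { my $s = $doc; $s =~ s/[\[\]\']+//g;    $s },
    tr_delete   => sub { my $s = $doc; $s =~ tr/[]'//d;         $s },
);

# A negative count asks Benchmark to run each variant for at least
# that many CPU seconds, then print a rate comparison table.
cmpthese(-1, \%variants);
```

Because every variant copies the string first, the table shows the cost of copy plus removal, so the gap between tr and the regexes may look smaller than in the in-place test.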
It's unfortunate that Perl doesn't know to optimize the following automatically :(
$doc =~ s/(?:\[|\]|\')//g   -->  $doc =~ s/[\[\]\']//g
$doc =~ s/(?:\[|\]|\')+//g  -->  $doc =~ s/[\[\]\']+//g
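The rewrite is safe because every alternative in the group matches exactly one literal character, so the alternation and the character class accept the same set of characters. A small sanity check, as a sketch only (the shortened sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical short sample containing all three characters to remove.
my $sample = "'C-3PO' is from the [[Star Wars]] universe, see ''Star Wars''.";

# Copy the sample, then modify the copy in place with each approach.
(my $alt   = $sample) =~ s/(?:\[|\]|\')//g;   # single-character alternation
(my $class = $sample) =~ s/[\[\]\']//g;       # character class
(my $plus  = $sample) =~ s/[\[\]\']+//g;      # character class, runs at once
(my $tr    = $sample) =~ tr/[]'//d;           # transliteration with delete

print "alternation : $alt\n";
print "char class  : $class\n";
print "class with +: $plus\n";
print "tr          : $tr\n";

# All four approaches should leave the same text behind.
print "all equal\n"
    if $alt eq $class && $class eq $plus && $plus eq $tr;
```

All four lines should show the sample with the brackets and single quotes stripped, followed by "all equal".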