in reply to Re^2: Improve processing time for string substitutions
in thread Improve processing time for string substitutions

Well then, something like the code below might fit.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

# Build 10,000 dummy entities: '&ent00000;' .. '&ent09999;' => 0 .. 9999
my %entities = ();
my $counter  = 10000;
while ( $counter-- > 0 ) {
    $entities{ '&ent' . sprintf( '%05d', $counter ) . ';' } = $counter;
}

my $text_to_be_changed = <<EOT;
This is some text containing five (&ent00005;) entities
that will be changed: &ent00029;, &ent00129;, &ent00229;,
&ent00329; and &ent00429;
The pseudo_entities below should rest unchanged:
\&dent00029;, \&dont_change;, \& qwerty ;, 12345;.
EOT

timethese( 1000, {
    'with_splitting' => \&with_splitting,
    'regexish'       => \&regexish,
});
exit;

sub with_splitting {
    my @modified_parts     = ();
    my @split_on_semicolon = split( /;/, $text_to_be_changed );
    foreach my $ending_in_semicolon (@split_on_semicolon) {
        if (    $ending_in_semicolon =~ m/(&\w+)$/
            and exists( $entities{"$1;"} ) )
        {
            $ending_in_semicolon =~ s/(&\w+)$/$entities{ "$1;" }/;
        }
        else {
            $ending_in_semicolon .= ';';
        }
        push( @modified_parts, $ending_in_semicolon );
    }
    my $result = join( '', @modified_parts );
    #print "RESULT: \n", $result;
}

sub regexish {
    my $huge_entity = join '|', keys %entities;
    $text_to_be_changed =~ s/($huge_entity)/$entities{$1}/g;
    #print "RES2: \n", $text_to_be_changed;
}
The results are interesting:
Benchmark: timing 1000 iterations of regexish, with_splitting...
      regexish: 59 wallclock secs (58.16 usr +  0.02 sys = 58.18 CPU) @ 17.19/s (n=1000)
with_splitting:  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU) @ 50000.00/s (n=1000)
                 (warning: too few iterations for a reliable count)
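A large part of that gap is probably not the substitution itself but the fact that regexish joins and recompiles a 10,000-branch alternation on every call (and rewrites the global $text_to_be_changed in place, so only the first iteration actually substitutes anything). Purely as an illustration, not something benchmarked in this post, here is a minimal sketch of the same idea with the alternation precompiled once via qr//; the names $entity_re and regexish_precompiled are mine, and the data setup is reduced for brevity:

#!/usr/bin/perl
use strict;
use warnings;

# Same entity table as above, built with map for brevity.
my %entities = map { sprintf( '&ent%05d;', $_ ) => $_ } 0 .. 9999;

my $text_to_be_changed = 'Entities: &ent00029;, &ent00129; and &ent00429;';

# Build and compile the alternation a single time; quotemeta guards
# against any regex metacharacters in future entity names.
my $entity_re = do {
    my $alt = join '|', map { quotemeta } keys %entities;
    qr/($alt)/;
};

sub regexish_precompiled {
    # Work on a copy so repeated calls see the original text.
    ( my $copy = $text_to_be_changed ) =~ s/$entity_re/$entities{$1}/g;
    return $copy;
}

print regexish_precompiled(), "\n";   # Entities: 29, 129 and 429

Whether that closes the gap with the split-on-semicolon approach would have to be measured; the split version still avoids scanning with a huge alternation at all.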