Substr versus Regexp

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear friends,

After a couple of months delving into Perl scripting, especially string manipulation, I have found myself resorting to the "substr" function a lot.

I have a feeling that most uses of the "substr" function can be replaced with a regexp. Take the simple code below, for example. I wonder how the masters would redo this function with a pure regexp approach.

sub hamming_distance_string{
        #String length is assumed to be equal
        my ($a,$b) = @_;
        my $len = length ($a);
        my $num_match=0;

        for (my $i=0; $i<$len; $i++) {
             ++$num_match if substr($a, $i, 1) eq substr($b, $i, 1);
        }

        return $num_match;
}

Is it true that the 'substr' function can always be rewritten with a regexp? If so, when is it better to use which? Is there any rule of thumb?

Hope to hear from you again. Thanks so much for your time.
And wishing you all a very Happy and Prosperous New Year 2005!

Regards,
Edward


Replies are listed 'Best First'.
Re: Substr versus Regexp by trammell (Priest) on Dec 31, 2004 at 15:56 UTC
You might get better performance with stringwise XOR (see `perldoc perlop`, search for `Bitwise String Operators`):

sub characters_in_common {
        my ($p, $q) = @_;
        my $r = $p ^ $q;
        my $count = $r =~ y/\0//;
        return $count;
}

BTW, I don't think your function calculates the Hamming distance...
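For reference, the Hamming distance is the number of positions at which two strings differ, whereas the function in the question counts the positions that match. Below is a minimal sketch of the XOR idea that counts differences instead. It assumes equal-length byte strings, and the name `hamming_distance_xor` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the positions where two equal-length strings differ.
# Assumes byte strings; wide (Unicode) characters are not handled here.
# Both arguments must be strings (not numbers) so that ^ works stringwise.
sub hamming_distance_xor {
    my ($p, $q) = @_;
    die "strings must have equal length\n" unless length($p) == length($q);
    # XOR yields "\0" wherever the bytes agree.
    my $diff = $p ^ $q;
    # With /c, tr counts every character that is NOT "\0",
    # i.e. every position where the strings differ.
    return ($diff =~ tr/\0//c);
}

print hamming_distance_xor("karolin", "kathrin"), "\n";   # prints 3
```

Subtracting this result from the string length gives the number of matching positions, which is what the original function returns.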
Re: Substr versus Regexp by sgifford (Prior) on Dec 31, 2004 at 22:06 UTC
You can always replace `substr` with a regexp, but the regexp will generally be slower (though often not by much). That's because the additional flexibility of regular expressions requires that they be parsed and executed by a fairly elaborate regular expression engine, while `substr` basically indexes into an array and moves some characters around. Unless a regexp is much clearer or you can replace multiple calls to `substr` with a single regexp, you're usually better off using `substr` where possible.
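To make the comparison concrete, here is a sketch of one `substr` call next to a regexp that extracts the same character. The helper names are invented for illustration, and the pattern shown is only one of several that would work.

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the single character at offset $i,
# or undef if the offset lies past the end of the string.
sub char_via_substr {
    my ($s, $i) = @_;
    return $i < length($s) ? substr($s, $i, 1) : undef;
}

sub char_via_regexp {
    my ($s, $i) = @_;
    # \A anchors at the start, .{$i} skips $i characters,
    # and /s lets "." match newlines as well.
    return $s =~ /\A.{$i}(.)/s ? $1 : undef;
}

print char_via_substr("hello", 1), "\n";   # prints e
print char_via_regexp("hello", 1), "\n";   # prints e
```

The regexp version has to skip over the first `$i` characters by matching them, while `substr` goes straight to the offset, which is in line with the point above about `substr` doing less work.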
Re: Substr versus Regexp by Courage (Parson) on Dec 31, 2004 at 19:30 UTC
Perl regular expressions are highly optimized, so people often write something like `/^(...)/`. However, your particular example should be written another way. It looks like you've recently come from the C world... Welcome to Perl! You'll be writing shorter code soon... In addition, using the `index` function should boost speed and shorten your code at the same time; see `perldoc -f index`. Best regards, Courage, the Cowardly Dog
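As a rough illustration of the two idioms mentioned in this reply, the sketch below grabs the first three characters with `/^(...)/` and with `substr`, then locates a literal substring with `index`. The sample string is made up, and the reply does not spell out how `index` would apply to the original function, so this only shows what the built-in does.

```perl
use strict;
use warnings;

my $str = "perlmonks";   # sample data, assumed for illustration

# First three characters: regexp capture versus substr.
my ($prefix_re)   = $str =~ /^(...)/;
my $prefix_substr = substr($str, 0, 3);

# index returns the zero-based position of a literal substring,
# or -1 if the substring does not occur.
my $pos     = index($str, "monk");
my $missing = index($str, "xyz");

print "$prefix_re $prefix_substr $pos $missing\n";   # prints: per per 4 -1
```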
Re: Substr versus Regexp by nornagon (Acolyte) on Jan 01, 2005 at 05:17 UTC
I'd do it like this:

sub string_distance {
        my ($a, $b) = @_;
        my @a = split //, $a;
        my @b = split //, $b;
        my $count = 0;
        for (my $i = 0; $i < length($a); $i++) {
                $count++ if ($a[$i] eq $b[$i]);
        }
        return $count;
}