monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear friends,

After a couple of months delving into Perl scripting, especially string manipulation, I find myself resorting to the "substr" function a lot.

I have a feeling that most uses of "substr" could be replaced with a regexp. Take the simple code below, for example. I wonder how the masters would redo this function with a pure regexp approach.
sub hamming_distance_string {
    # String length is assumed to be equal
    my ($a, $b) = @_;
    my $len = length($a);
    my $num_match = 0;
    for (my $i = 0; $i < $len; $i++) {
        ++$num_match if substr($a, $i, 1) eq substr($b, $i, 1);
    }
    return $num_match;
}
Is it true that the 'substr' function can always be rewritten with a regexp? If so, when is it better to use which? Is there any rule of thumb?

Hope to hear from you again. Thanks so much for your time.
And wishing you all a very Happy and Prosperous New Year 2005!

Regards,
Edward

Replies are listed 'Best First'.
Re: Substr versus Regexp
by trammell (Priest) on Dec 31, 2004 at 15:56 UTC
    You might get better performance with stringwise XOR (see perldoc perlop, search for Bitwise String Operators):
    sub characters_in_common {
        my ($p, $q) = @_;
        my $r = $p ^ $q;            # "\0" wherever the two strings agree
        my $count = $r =~ y/\0//;   # count those "\0" characters
        return $count;
    }
    BTW, I don't think your function calculates the Hamming distance...
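
To make that last remark concrete: the Hamming distance counts the positions where two equal-length strings differ, whereas the functions above count the positions where they agree. A minimal sketch of the distance itself, using the same stringwise XOR idea, might look like the following. The name hamming_distance and the equal-length check are illustrative assumptions, not part of the original posts, and the sketch assumes plain byte strings that Perl treats as strings rather than numbers.

```perl
use strict;
use warnings;

# Count the positions where two equal-length strings differ.
# Assumes byte strings; the function name is illustrative only.
sub hamming_distance {
    my ($p, $q) = @_;
    die "strings must have equal length\n" unless length($p) == length($q);
    my $diff = $p ^ $q;          # "\0" wherever the characters agree
    return $diff =~ tr/\0//c;    # count the non-"\0" bytes, i.e. the mismatches
}

print hamming_distance("karolin", "kathrin"), "\n";
```

For equal-length strings, the number of matching positions is simply the length minus this distance.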
Re: Substr versus Regexp
by sgifford (Prior) on Dec 31, 2004 at 22:06 UTC
    You can always replace substr with a regexp, but the regexp will generally be slower (though often not by much). That's because the additional flexibility of regular expressions requires that they be parsed and executed by a fairly elaborate regular expression engine, while substr basically indexes into an array and moves some characters around. Unless a regexp is much clearer or you can replace multiple calls to substr with a single regexp, you're usually better off using substr where possible.
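
As a rough illustration of that equivalence, the three-argument form substr($str, $offset, $len) can be imitated by skipping $offset characters and capturing the next $len. The helper name substr_via_regexp is hypothetical, and the sketch only covers non-negative offsets and lengths that lie entirely within the string; negative offsets, lvalue substr, and the four-argument form would need more work.

```perl
use strict;
use warnings;

# Hypothetical helper: imitate substr($str, $offset, $len) with a regexp.
# Handles only non-negative $offset and $len that fit inside $str.
sub substr_via_regexp {
    my ($str, $offset, $len) = @_;
    # /s lets "." match newlines too, so every character counts as one position
    my ($piece) = $str =~ /\A.{$offset}(.{$len})/s;
    return $piece;    # undef when the string is too short to match
}

my $text = "hello world";
print substr_via_regexp($text, 6, 5), "\n";
print substr($text, 6, 5), "\n";
```

Unlike substr, which returns a shorter string when the requested range runs past the end, this version returns undef in that case. The regexp also has to be compiled and matched from the start of the string on every call, which fits the point above about substr usually being the cheaper choice.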
Re: Substr versus Regexp
by Courage (Parson) on Dec 31, 2004 at 19:30 UTC
    Perl regular expressions are highly optimized, so people often write something like /^(...)/.

    However, your particular example should be written another way. It looks like you've recently come from the C world... Welcome to Perl! You'll be writing shorter code soon...

    In addition, using the index function should boost speed and shorten your code at the same time; see perldoc -f index.

    Best regards,
    Courage, the Cowardly Dog

Re: Substr versus Regexp
by nornagon (Acolyte) on Jan 01, 2005 at 05:17 UTC
    I'd do it like this:
    sub string_distance {
        my ($a, $b) = @_;
        my @a = split //, $a;
        my @b = split //, $b;
        my $count = 0;
        for (my $i = 0; $i < length($a); $i++) {
            $count++ if ($a[$i] eq $b[$i]);
        }
        return $count;
    }