Re^2: Regex related question

I think there needs to be a condition so that the last substr is only run when needed. I came up with similar code. If speed matters, I would benchmark this substr approach against the regex. I've found that s/// can sometimes be slow, but the regex engine keeps evolving, so benchmarking on the Perl version you are actually using is the only way to really know.

#!/usr/bin/perl -w
use strict;

my @strings = qw ( ACTGCTAGGGGGGG TCAGCTAGCNA
                   ACTGSCGACAAAA  GTCTGAGTTATTT);

foreach my $str (@strings)
{
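    my $last_char = substr ($str,-1,1);
    # Walk left from the end until a different character is found;
    # $cur_index ends up as the negative offset of that character.
    my $cur_index = -1;
    while (substr ($str, --$cur_index,1) eq $last_char){}

    print "old: $str \n";
    # A run of 3 or more repeats leaves $cur_index below -3:
    # delete everything after the first two characters of the run.
    substr ($str,$cur_index+1,-$cur_index-3,"") if ($cur_index < -3);
    print "new: $str\n";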
}

__END__
old: ACTGCTAGGGGGGG 
new: ACTGCTAGG
old: TCAGCTAGCNA 
new: TCAGCTAGCNA
old: ACTGSCGACAAAA 
new: ACTGSCGACAA
old: GTCTGAGTTATTT 
new: GTCTGAGTTATT
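
A minimal benchmark sketch, assuming the task is to shorten a trailing run of one repeated character to two copies (what the substr loop above does). The sub names trim_substr and trim_regex are hypothetical, and the regex s/((.)\2)\2+\z/$1/ is an assumed equivalent, not necessarily the one from the parent node. It uses cmpthese from the core Benchmark module, and checks that both subs agree on the sample strings before timing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: shorten a trailing run of one repeated character to two copies.

# substr version (same logic as the loop above, wrapped in a sub).
sub trim_substr {
    my ($str) = @_;
    my $last_char = substr($str, -1, 1);
    my $cur_index = -1;
    # Walk left until a different character is found.
    while (substr($str, --$cur_index, 1) eq $last_char) {}
    # Only trim when the run is 3 or more characters long.
    substr($str, $cur_index + 1, -$cur_index - 3, "") if $cur_index < -3;
    return $str;
}

# regex version (assumed equivalent): keep the first two characters of a
# trailing run of three or more identical characters, drop the rest.
sub trim_regex {
    my ($str) = @_;
    $str =~ s/((.)\2)\2+\z/$1/;
    return $str;
}

my @strings = qw(ACTGCTAGGGGGGG TCAGCTAGCNA ACTGSCGACAAAA GTCTGAGTTATTT);

# Sanity check: both versions must agree before their speed is compared.
for my $s (@strings) {
    my ($x, $y) = (trim_substr($s), trim_regex($s));
    die "mismatch on $s: $x vs $y\n" unless $x eq $y;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { trim_substr($_) for @strings },
    regex  => sub { trim_regex($_)  for @strings },
});
```

cmpthese prints a rate table; the ranking can change between Perl versions, which is why running it on the perl you actually deploy on matters.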

Re^3: Regex related question by davido (Cardinal) on Aug 08, 2011 at 08:28 UTC
Normally I would say that a minor speed difference shouldn't matter. But all I know about genome mapping is that it's computationally intensive, so benchmarking both approaches is probably a good idea.

Dave