Re: Equation - code review
by GrandFather (Saint) on Aug 16, 2006 at 03:10 UTC
|
Does it work against your test suite?
Update
Why &alpha_index({...?
Why my $self=... when alpha_index doesn't seem to be a method?
Why my ($left,$middle,$right when they are used locally to the foreach (@carbon_index is correctly declared)
Why the assignment in my %carbon_aa=(); when it is empty to start with anyway?
Why my ($left_aa,$right_aa,$distance_index,$power)=0; when only $left_aa gets a value?
Why are $left_aa,$right_aa,$distance_index,$power not declared local to their foreach loops?
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
|
|
Hi,
Thanks for your valuable comments.
For your first two question,
>>
I just copied the method from my module and created a dummy arguments around it.
I can't paste the entire module, since dependency is more and the security. I apologize for that.
Here I have Updated my code based on your comments, please review it
#!c:\perl\bin\perl
use strict;
use warnings;
use Math::BigInt;
use Data::Dumper;
my %index_val = ( A=> 2,
C => 3,
D => 4,
N => 2,
M => 2,
T => 1,
G => 4);
my $sequence="ACTGACN";
my $distance_index= int(rand(4));
my $window=int(rand(3));
my $arr_ref = alpha_index({ sequence => $sequence,
distance => $distance_index,
aa_index => \%index_val,
window_size => $window });
print STDERR Dumper($arr_ref);
sub alpha_index {
my $self=shift;
my $sequence=$self->{sequence};
my $distance=$self->{distance};
my $aa_index=$self->{aa_index};
my @carbon_index;
my $window_size= $self->{window_size};
if($self->{window_size} ==1) { $window_size=1;}
elsif($self->{window_size} > 1 )
{ $window_size = int(($self->{window_size}-1)/2); }
else { $window_size=0; }
foreach (0.. (length($sequence)-1)) {
my %carbon_aa;
my ($left,$middle,$right) = $sequence =~ m/(\w{$_})?(\w{1}
+)?(\w*)/;
$carbon_aa{$middle}=$aa_index->{$middle} if($middle);
$left=reverse($left);
my $initial_index=2;
my $length_index= ($window_size) ? $window_size:length($left);
if($self->{window_size} ==1 ) {
push @carbon_index, \%carbon_aa if(%carbon_aa);
next;
}
foreach my $increment (0..$length_index-1) {
my $left_aa=substr($left,$increment,1);
next if not exists $aa_index->{$left_aa};
my $power=Math::BigInt->new($initial_index);
my $distance_index=($power->bpow($distance))->numify()
+;
$carbon_aa{$middle}+=
( $aa_index->{$left_aa} * (1/$distance_index));
$initial_index++;
}
$initial_index=2;
$length_index= ($window_size)?$window_size:length($right);
foreach my $increment (0..$length_index-1) {
my $right_aa=substr($right,$increment,1);
next if not exists $aa_index->{$right_aa};
my $power=Math::BigInt->new($initial_index);
my $distance_index=($power->bpow($distance))->numify()
+;
$carbon_aa{$middle}+=
( $aa_index->{$right_aa} * ( 1/$distance_index));
$initial_index++;
}
push @carbon_index, \%carbon_aa if(%carbon_aa);
}
return \@carbon_index;
}
Thanks in advance.
- kulls
| [reply] [d/l] |
|
|
($left,$middle,$right) = $sequence =~ m/(\w{$_})?(\w{1})?(\w*)/;
So I wrote a benchmark to try a few strategies:
Results for running 500 iterations with length 7:
Rate re_orig re_set bsubstr
re_orig 681/s -- -12% -92%
re_set 774/s 14% -- -91%
bsubstr 8475/s 1144% 995% --
Results for running 500 iterations with length 100:
Rate re_orig re_set bsubstr
re_orig 48.1/s -- -12% -93%
re_set 54.9/s 14% -- -92%
bsubstr 718/s 1392% 1209% --
Results for running 100 iterations with length 1000:
Rate re_set bsubstr
re_set 4.00/s -- -93%
bsubstr 59.5/s 1387% --
| [reply] [d/l] [select] |
|
|
#!c:\perl\bin\perl
use strict;
use warnings;
my %index_val = ( A=> 2,
C => 3,
D => 4,
N => 2,
M => 2,
T => 1,
G => 4);
my $sequence="ACTGACN";
my $distance_index = 2;
my $window = 2;
my $arr_ref = &alpha_index({ sequence => $sequence,
distance => $distance_index,
aa_index => \%index_val,
window_size => $window });
for my $hashRef (@$arr_ref) {
print join "\n", map {"$_ => $hashRef->{$_}"} keys %$hashRef;
print "\n";
}
sub alpha_index {
my $self=shift;
my $sequence=$self->{sequence};
my $distance=$self->{distance};
my $aa_index=$self->{aa_index};
my @carbon_index;
my $window_size = int $self->{window_size};
$window_size = int ($window_size - 1) / 2 if $window_size > 1;
foreach my $len (0 .. (length($sequence)-1)) {
my %carbon_aa;
my ($left, $middle, $right) = splitStr ($sequence, $len);
$carbon_aa{$middle} = $aa_index->{$middle} if $middle;
$left = reverse $left;
my $initial_index = 2;
my $length_index= $window_size ? $window_size : length $left;
if($self->{window_size} == 1 ) {
push @carbon_index, \%carbon_aa if %carbon_aa;
next;
}
foreach my $increment (0 .. $length_index-1) {
my $left_aa=substr($left,$increment,1);
next if not exists $aa_index->{$left_aa};
my $distance_index= $initial_index ** $distance;
$carbon_aa{$middle} += $aa_index->{$left_aa} * (1/$distanc
+e_index);
$initial_index++;
}
$initial_index=2;
$length_index= $window_size ? $window_size : length $right;
foreach my $increment (0 .. $length_index-1) {
my $right_aa=substr ($right, $increment, 1);
next if not exists $aa_index->{$right_aa};
my $distance_index= $initial_index ** $distance;
$carbon_aa{$middle} += $aa_index->{$right_aa} * (1 / $dist
+ance_index);
$initial_index++;
}
push @carbon_index, \%carbon_aa if %carbon_aa;
}
return \@carbon_index;
}
sub splitStr {
return (
substr($_[0], 0, $_[1]),
substr($_[0], $_[1], 1),
substr($_[0], $_[1] + 1)
);
}
DWIM is Perl's answer to Gödel
| [reply] [d/l] |
Re: Equation - code review
by BrowserUk (Patriarch) on Aug 16, 2006 at 08:06 UTC
|
The simplest way to speed this up would be to abandon the use of Math::BigInt. You use this in two places to raise numbers to a power:
$power=Math::BigInt->new($initial_index);
$distance_index=($power->bpow($distance))->numify();
but you then immediately retrieve the result as a standard scalar without performing any check for overflow.
Assuming the value '4' is typical scale of magnitude for $distance aka $self{ distance } aka $distance_index,
then using Perl's built-in exponentiation operator, $power ** $distance
cmpthese -3, {
BigInt => q[
my $x = Math::BigInt->new( $_ )->bpow( 4 )->numify() for 2 ..
+1e6;
],
standard => q[
my $x = $_ ** 4 for 2 .. 1e6;
],
};;
(warning: too few iterations for a reliable count)
s/iter BigInt standard
BigInt 172 -- -100%
standard 0.625 27472% --
will be at least 250 times quicker.
For sequences of length 10,000, standard math will safely accomodate distances of up to 78.
- 1e4 => 78
- 1e5 => 61
- 1e6 => 51
- 1e7 => 44
If you really need to use Math::BigInt for this calculation, then you should be doing the 1 / $distance_index part of your calculation before you numify the result of the previous step.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
Hi,
Thanks for your reply. literally i don't know how to handle the built-ins exponent function.I can able to find only Math::BigInt package. However I can't able to overright the object , instead of creating new objects every time in that module.This makes me very bad situation if my length of string id very high. . I have tested my code with in-built function and it is working fine.
-kulls
| [reply] [d/l] |
Re: Equation - code review
by imp (Priest) on Aug 16, 2006 at 03:21 UTC
|
You need to be a little more specific.
Do you want to know whether it is an efficient algorithm? Whether it works correctly? In both of these cases we need a definition of what the correct behaviour is.
I can say the following things about the code you provided:
- It's perl
- It could use inline comments or refactoring to make the behaviour more intuitive
- It could use a block of comments before alpha_index
- More whitespace would make it easier to read
| [reply] |
|
|
Hi,
Thanks for your reply. The output of the program is exactly correct which I have checked against the manual work. The performance issue can arise when my given input string is more than 1000 characters. Can you please suggest me, how to improve the performance inside the loops ?
- kulls
| [reply] |
|
|
You have been very polite and my inclination is to help you. Please make it easier to do so.
You have provided us with your current solution to a problem, but you haven't specified what that problem is. It is difficult to identify what areas might be unneccesary or wasteful in this situation.
Some inline documentation would be useful, as would some additional comments in this thread. Here are a few things that would be nice to know:
- Why is $distance_index and $window random
- More sample data. The $sequence provided is 7 characters, but in this comment you state that it can be over 1000 characters in length. The size of the input data will help determine which algorithm is appropriate.
- How often will this function be called
| [reply] |
|
|
|
|
#!c:\perl\bin\perl
use strict;
use warnings;
use Math::BigInt;
use Data::Dumper;
use Test::More qw(no_plan);
my %index_val = ( A => 2,
C => 3,
D => 4,
N => 2,
M => 2,
T => 1,
G => 4);
my $sequence = "ACTGACN";
my $distance_index = 3;
my $window = 2;
my $arr_ref = alpha_index({ sequence => $sequence,
distance => $distance_index,
aa_index => \%index_val,
window_size => $window });
ok($arr_ref);
diag("Testing alpha_index with window: $window, distance: $distance_in
+dex, sequence: $sequence");
is(ref($arr_ref),'ARRAY','alpha index returned an array');
is(@$arr_ref,7,'7 entries are in the array');
is($arr_ref->[0]{A},2.51025682971601);
Take a look at Test::More for other ideas on what you can test. | [reply] [d/l] |
Re: Equation - code review
by starbolin (Hermit) on Aug 16, 2006 at 05:12 UTC
|
Um... every time I run the program I get a different output. Is there some reason for the random number generator??
As far as coding style, there are no validity checks. If I feed garbage into the subroutine, it quietly gives me garbage out.
my %index_val = ( F => -1.8, O => 3, O => 4,
B => 'quxx', A => 2, R => 1,
P=>'foo');
Gives:
$VAR1 = [
{
'A' => '2.25'
},
{
'C' => '0.5'
},
{
'T' => '0.5'
},
{
'G' => '0.5'
},
{
'A' => '2.25'
},
{
'C' => '0.324074074074074'
},
{
'N' => '0.324074074074074'
}
];
Definitely "Not ready for Prime-Time."
Update: I don't mind if a program fails on garbage input, but it should fail noisily.
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
| [reply] [d/l] [select] |
Re: Equation - code review
by ikegami (Patriarch) on Aug 16, 2006 at 05:35 UTC
|
You have a space before the "#!", but nothing can preceed the "#!" for it to keep its meaning. | [reply] |
Re: Equation - code review
by cdarke (Prior) on Aug 16, 2006 at 06:55 UTC
|
A small thing, but you do not seem to use the loop variable ($_) value in your foreach loops. Therefore it might be better to write the sequence starting from 1 and remove the subtraction, for example: foreach (0.. (length($sequence)-1)) becomes: foreach (1.. (length $sequence))
| [reply] [d/l] [select] |
|
|
my ($left,$middle,$right) = $sequence =~ m/(\w{$_})?(\w{1})?(\w*)/;
^
The result for 0..6 would be:
A CTGACN
A C TGACN
AC T GACN
ACT G ACN
ACTG A CN
ACTGA C N
ACTGAC N
| [reply] [d/l] [select] |