Re: Help with function, count matches and store in hash & retrieve sorted
by GrandFather (Saint) on Jul 02, 2017 at 23:18 UTC
It's kinda hard to check your code without some data to drive match_query. However, if $count doesn't get any entries, the assignment my $max_count = shift @array; will assign undef, and sorting @match_count on that undef max then gives the warning described.
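A minimal sketch of that chain of events, with made-up entries standing in for real @match_count data: shifting from an empty array returns undef, and comparing that undef with <=> inside sort produces the warning under use warnings.

```perl
use strict;
use warnings;

# A count hash that never received any entries (e.g. the loops never ran).
my %count;
my @array     = sort { $b <=> $a } values %count;
my $max_count = shift @array;    # shift on an empty array returns undef

# Collect warnings instead of printing them straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Made-up entries shaped like the ones pushed onto @match_count.
my @match_count = (
    { bar => 'TATCCTCT', max => 3 },
    { bar => 'CTAAGCCT', max => $max_count },    # max is undef here
);
@match_count = sort { $b->{max} <=> $a->{max} } @match_count;

print 'max_count is ', (defined $max_count ? $max_count : 'undef'), "\n";
print "caught: $_" for @warnings;
```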
Note that your code is very C-ish, which makes it harder than it needs to be to see what is going on. For a start, replace your C-style for loops with Perl for loops over a range: 0 .. length($key)-2 matches your existing loop bounds, since each substr takes two characters. You should also use %count and %element, so it's obvious what type is being used instead of depending on some Perl magic to create an appropriate reference at run time. That makes the code easier to read and provides better error checking.
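A sketch of what the counting part might look like in that style. The helper name max_offset_count is hypothetical, and it assumes the goal is the largest number of matching two-character substrings at any single relative offset, as the original loops compute. It uses a plain %count hash, range-based loops, and List::Util's max (a core module), defaulting to 0 when no offsets were counted.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Count two-character matches between $key and $queryseq, bucketed by the
# relative offset $k - $i, and return the largest bucket (0 if there is none).
sub max_offset_count {
    my ($key, $queryseq) = @_;
    my %count;    # a plain hash instead of an autovivified hash reference

    # Same bounds as the original C-style loops: stop one short of the end,
    # because each substr grabs two characters.
    for my $k (0 .. length($key) - 2) {
        my $k_nuc = substr($key, $k, 2);
        for my $i (0 .. length($queryseq) - 2) {
            $count{ $k - $i }++ if $k_nuc eq substr($queryseq, $i, 2);
        }
    }

    # max() of an empty list is undef, so supply a default of 0.
    return max(values %count) // 0;
}

my $full  = max_offset_count('ACGT', 'ACGT');    # AC, CG and GT all match at offset 0
my $short = max_offset_count('A',    'ACGT');    # a one-character key never enters the loop
print "$full $short\n";    # 3 0
```

Only offsets with at least one match end up in %count; for the value returned this makes no difference compared with the original, which also stored zero counts.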
Don't use prototype subs (sub match_query() {). They don't do what you want and in the case of match_query your prototype is clearly wrong. The sub says don't give me any arguments in the prototype, then shifts off two arguments!
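A small sketch showing that Perl enforces the empty prototype when the sub is declared before the call. The string eval is there only so the compile-time error can be caught and printed; the sub body is a stand-in, not the original code.

```perl
use strict;
use warnings;

# Declare a sub with an empty prototype, then call it with two arguments.
# Compiling the call fails, so the eval returns false and $@ holds the error.
my $ok = eval q{
    sub match_query() { my $indexfile = shift; my $queryseq = shift; return 0 }
    match_query('index', 'ACGT');
    1;
};

print $ok ? "compiled fine\n" : "compile error: $@";
```

If the call is compiled before the sub definition, as happens when subs sit at the bottom of a script, Perl cannot check the prototype at that point and warns "main::match_query() called too early to check prototype" instead.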
If the hint in the first paragraph isn't enough you'll need to mock up something to call match_query with appropriate arguments that we can run ourselves.
Premature optimization is the root of all job security
Re: Help with function, count matches and store in hash & retrieve sorted
by Discipulus (Canon) on Jul 02, 2017 at 23:08 UTC
Hello Pathogenomix, and welcome to the monastery and to the wonderful world of Perl!
For the moment I have only spotted an empty prototype, which can cause strange behaviours: sub match_query(){.. should be sub match_query{..
From the Modern Perl book:
> A subroutine declared with an empty prototype (as opposed to an absent prototype) which evaluates to a single expression becomes a [compile-time, editor's note] constant in the Perl 5 optree rather than a subroutine call
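A small illustration of what that means in practice, using made-up sub names: a sub with an empty prototype and a single constant expression behaves like a constant, while an ordinary sub is parsed as a list operator, so the same "+ 1" after it means two different things.

```perl
use strict;
use warnings;

sub answer_const() { 42 }    # empty prototype, single constant expression
sub answer_plain   { 42 }    # no prototype: an ordinary list-operator sub

my $with    = answer_const + 1;    # parsed as (42) + 1
my $without = answer_plain + 1;    # parsed as answer_plain(+1), which returns 42

print "$with $without\n";    # 43 42
```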
Anyway, the Basic debugging checklist suggests printing out your data, and I use this as my primary, if not only, debugging technique. The principle is: know your data.
The easiest thing to check is whether something comes up empty in the hash you originally received:
sub match_query {
    my $indexfile = shift;
    foreach my $key (keys %$indexfile) { print "DEBUG: [$key]=>[$$indexfile{$key}]\n" }
    # ... rest of match_query
}
Also add a debug print statement for @match_count.
Instead of bare print statements, you can take advantage of the core module Data::Dumper or, better IMHO, Data::Dump, which you can find on CPAN.
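A minimal sketch of dumping @match_count with the core Data::Dumper. The two entries are made up, and setting $Data::Dumper::Sortkeys keeps the key order stable so dumps from different runs are easy to compare.

```perl
use strict;
use warnings;
use Data::Dumper;

my @match_count = (
    { bar => 'TATCCTCT', max => undef },
    { bar => 'CTAAGCCT', max => 3 },
);

# Sorted keys and a compact indent make dumps easier to read and diff.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;
my $dump = Dumper(\@match_count);
print $dump;
```

With Data::Dump from CPAN, the rough equivalent is use Data::Dump qw(pp); print pp(\@match_count), "\n";.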
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Help with function, count matches and store in hash & retrieve sorted
by huck (Prior) on Jul 02, 2017 at 23:17 UTC
use strict; use warnings;
use Data::Dumper;
my $indexfile = {
'GCTCAGGA' => '4',
'AGCGTAGC' => '5',
'TTCTGCCT' => '3',
'CATGCCTA' => '6',
'TCGCCTTA' => '1',
'GTAGAGAG' => '7',
'GCGTAAGA' => '13',
'TAGATCGC' => '9',
'CTAGTACG' => '2',
'CTAAGCCT' => '16',
'CTCTCTAT' => '10',
'ACTGCATA' => '14',
'AGAGTAGA' => '12',
'TATCCTCT' => '11',
'AAGGAGTA' => '15',
'CCTCTCTG' => '8'
};
my $queryseq='';
for my $key (keys %$indexfile) {$queryseq.=$key;}
print "First\n";
match_query($indexfile,$queryseq);
print "second\n";
$indexfile->{a}=1;
match_query($indexfile,$queryseq);
sub match_query{
    my $indexfile = shift;
    my $queryseq = shift;
    print Dumper($indexfile);
    my @match_count;
    for my $key (keys %$indexfile){
        my $count;
        for(my $k=0; $k<length($key)-1; $k++){
            my $k_nuc = substr $key,$k,2;
            for(my $i=0; $i<length($queryseq)-1; $i++){
                my $i_nuc = substr $queryseq,$i,2;
                my $rel = $k-$i;
                if($k_nuc eq $i_nuc){
                    $count->{$rel} ||= 0;
                    $count->{$rel} += 1;
                }else{
                    $count->{$rel} ||= 0;
                    $count->{$rel} += 0;
                }
            }
        }
        my @array = sort{$b <=> $a} values %$count;
        my $max_count = shift @array;
        my $element;
        $element->{bar} = $key;
        $element->{max} = $max_count;
        push @match_count, $element;
    }
    my $result;
    @match_count = sort{$b->{max} <=> $a->{max}} @match_count; #CODE FAILS HERE DOESNT LIKE ->{max}
    if($match_count[0]->{max} == $match_count[1]->{max}){
        $result = 0;
    }elsif($match_count[0]->{max}>5){
        my $called_bar = $match_count[0]->{bar};
        $result = $indexfile->{$called_bar};
    }else{
        $result = 0;
    }
    return $result
}
Result
First
$VAR1 = {
'CTAGTACG' => '2',
'ACTGCATA' => '14',
'CTAAGCCT' => '16',
'CCTCTCTG' => '8',
'AGCGTAGC' => '5',
'GCGTAAGA' => '13',
'TCGCCTTA' => '1',
'AGAGTAGA' => '12',
'GCTCAGGA' => '4',
'TTCTGCCT' => '3',
'GTAGAGAG' => '7',
'CTCTCTAT' => '10',
'AAGGAGTA' => '15',
'CATGCCTA' => '6',
'TAGATCGC' => '9',
'TATCCTCT' => '11'
};
second
$VAR1 = {
'TCGCCTTA' => '1',
'AGAGTAGA' => '12',
'CTAAGCCT' => '16',
'CCTCTCTG' => '8',
'AGCGTAGC' => '5',
'GCGTAAGA' => '13',
'ACTGCATA' => '14',
'CTAGTACG' => '2',
'TATCCTCT' => '11',
'a' => 1,
'CATGCCTA' => '6',
'TAGATCGC' => '9',
'CTCTCTAT' => '10',
'AAGGAGTA' => '15',
'GTAGAGAG' => '7',
'TTCTGCCT' => '3',
'GCTCAGGA' => '4'
};
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Use of uninitialized value in numeric comparison (<=>) at 1194053.pl line 66.
Note the inclusion of use Data::Dumper; and print Dumper($indexfile);. Notice how, in the second pass, $indexfile has a "short key" ('a'). I suspect that if you make the same modifications to your program, you will also notice a short key. A key shorter than two characters bypasses for(my $k=0; $k<length($key)-1; $k++){ entirely, so $count remains undef.
Edit to add: an empty $queryseq could also produce the same kind of result.
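An input check along those lines might look like the sketch below. short_inputs is a hypothetical helper, not part of the original code; it reports any key or query string too short to yield a single two-character substring, which are the two cases that leave $count undef.

```perl
use strict;
use warnings;

# Hypothetical check: list inputs that would make match_query's loops skip.
sub short_inputs {
    my ($indexfile, $queryseq) = @_;
    my @problems;
    push @problems, "queryseq is undefined" unless defined $queryseq;
    push @problems, "queryseq '$queryseq' is shorter than 2 characters"
        if defined $queryseq && length($queryseq) < 2;
    for my $key (sort keys %$indexfile) {
        push @problems, "key '$key' is shorter than 2 characters"
            if length($key) < 2;
    }
    return @problems;
}

# Made-up input reproducing both cases: a short key and an empty query.
my $indexfile = { 'GCTCAGGA' => '4', 'AGCGTAGC' => '5', 'a' => 1 };
print "$_\n" for short_inputs($indexfile, '');
```

Calling something like this just before match_query, and printing what it returns, would show which of the two cases applies.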
Re: Help with function, count matches and store in hash & retrieve sorted
by Marshall (Canon) on Jul 02, 2017 at 23:19 UTC
Regarding your error message at "@match_count = sort{$b ->{max} <=> $a->{max}} @match_count; #CODE":
"Use of uninitialized value in numeric comparison (<=>)".
So it looks like @match_count is an array of hash references, and one of those hash references has an undefined value for the key "max". Why not add some debugging code to print $ref->{max} for each ref in @match_count before the sort, to see what is going on? Or simply use Data::Dumper; print Dumper \@match_count;
One Perl feature that looks like it would be useful to you is the //= operator. It sets a variable to a default value if the variable is undefined: $var //= 1; sets $var to 1 if it is undefined, and if it is 0 or any other value, it is left unchanged. I am still looking at your code, but I think you were trying to do something like that earlier, only with the ||= operator? Not sure.
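A quick sketch of the difference between //= and ||= (//= needs Perl 5.10 or later). The variables are made up for illustration.

```perl
use strict;
use warnings;

my ($u, $z, $n) = (undef, 0, 7);

$u //= 1;    # undef -> 1
$z //= 1;    # 0 is defined, so it stays 0
$n //= 1;    # 7 stays 7

my $z2 = 0;
$z2 ||= 1;   # ||= tests truth, not definedness: 0 is false, so it becomes 1

print "$u $z $n $z2\n";    # 1 0 7 1
```

So ||= 0 on an existing count is harmless but pointless: a true count is left alone, and a false one (undef or 0) is set to 0.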
Update: I was looking further into how $ref->{max} could wind up being undef. Something may be going wrong in the count hash. I do admit to being a bit puzzled by the code, absent any textual explanation of what it is meant to do. However, consider the following:
if($k_nuc eq $i_nuc){
    $count->{$rel} ||= 0;
    $count->{$rel} += 1;
}else{
    $count->{$rel} ||= 0;
    $count->{$rel} += 0;
}
=== possible change could be: =====
$count->{$rel} //= 0; ### probably not needed, see below
if($k_nuc eq $i_nuc){
    $count->{$rel}++;
}
This makes sure that $count->{$rel} is created and set to zero if it isn't already defined. The //= operator does not affect the value if it is already defined. Then increment it if the condition is true. I don't see a need for ||= 0, and certainly not for += 0.
It is a bit odd to have to do this at all: $count->{$rel}++; works even if the entry doesn't exist yet. This is actually a very common Perl idiom. I don't see anywhere where you make use of the zero values, so why create them in the first place? If you need them, then fine. Otherwise the code can be simplified to create hash entries only when the value is 1 or greater. The //= operator was added in Perl 5.10, which was a long time ago.
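A minimal sketch of the increment-only idiom, using a plain %count hash and made-up ($k, $i) match positions rather than real sequence data. Under strict and warnings, ++ on a missing hash entry treats it as 0 and creates it without any warning.

```perl
use strict;
use warnings;

my %count;

# Made-up ($k, $i) positions where the two-character substrings matched.
my @pairs = ([0, 0], [1, 1], [1, 0]);

# No ||= 0 and no += 0 needed: ++ creates the entry on first use.
$count{ $_->[0] - $_->[1] }++ for @pairs;

# Only offsets that actually matched appear as keys.
print join(', ', map { "$_ => $count{$_}" } sort { $a <=> $b } keys %count), "\n";
# 0 => 2, 1 => 1
```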
I updated my post Re: Help with function, count matches and store in hash & retrieve sorted; is that helpful to you?
Well, after looking yet again, I am still not sure. Can you give us the exact inputs that you are using, so that we can run the code and exactly replicate your error with your data?
I am a bit puzzled about how you can get an undef value for $max_count, which is assigned to the key "max". Of course, you can always force a default:
my $element;
$element->{bar} = $key;
$element->{max} = $max_count;
$element->{max} //= 0; #### adding this prevents undef
push @match_count, $element;
I guess I have the Sunday brain cramp.
Re: Help with function, count matches and store in hash & retrieve sorted
by Marshall (Canon) on Jul 03, 2017 at 02:58 UTC
You appear to be having trouble boiling the problem down to code that reliably reproduces the problem. I actually don't see how you wind up with a value of undef for a key of %count. There is probably something that you are not showing us.
This is an advanced concept, but it is possible to have a subroutine executed whenever a Perl warning happens. You can add the code below to your program. At the first warning message, it should dump the input parameters of match_query($indexfile,$queryseq) and then exit.
The results of this are what we want: when the error happens, what are the values of $indexfile and $queryseq? From that, the error can be replicated.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$SIG{'__WARN__'} = \&stop_n_report; #add this
my $indexfile = {
'GCTCAGGA' => '4',
'AGCGTAGC' => '5',
};
my $queryseq='something';
my $undef_val; #simulation of your code
print "$undef_val"; #simulation of your code to produce an error
sub stop_n_report ## add this
{
print "some kind of warning detected: @_\n";
print "Dumping input data\n";
print "Dumping indexfile:\n";
print Dumper $indexfile;
print "query=$queryseq\n";
exit;
}
__END__
some kind of warning detected: Use of uninitialized value $undef_val in string at C:\Projects_Perl\testing\sigwarn.pl line 16.
Dumping input data
Dumping indexfile:
$VAR1 = {
'GCTCAGGA' => '4',
'AGCGTAGC' => '5'
};
query=something
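A variation on the handler above, assuming you would rather keep the program running: $SIG{__WARN__} can be localized so it applies only around one call. call_and_report and $fake_match are hypothetical names; $fake_match merely stands in for match_query and warns when its query string is too short.

```perl
use strict;
use warnings;
use Data::Dumper;

# Run a code ref with a warning handler scoped to this call only,
# and dump the arguments if anything warned.
sub call_and_report {
    my ($code, @args) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = $code->(@args);
    if (@warnings) {
        local $Data::Dumper::Sortkeys = 1;
        print "warning(s) detected: @warnings";
        print "arguments were:\n", Dumper(\@args);
    }
    return ($result, scalar @warnings);
}

# Stand-in for match_query: $max stays undef when the query is too short.
my $fake_match = sub {
    my ($index, $q) = @_;
    my $max;
    $max = 1 if length($q) > 1;
    return $max + 0;    # warns if $max is undef
};

my ($r1, $w1) = call_and_report($fake_match, { GCTCAGGA => 4 }, 'GCTCAGGA');
my ($r2, $w2) = call_and_report($fake_match, { GCTCAGGA => 4 }, '');
```

Because the handler is localized, warnings raised elsewhere in the program go through normal handling.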
Re: Help with function, count matches and store in hash & retrieve sorted
by Pathogenomix (Novice) on Jul 03, 2017 at 01:30 UTC
sub get_index(){
    my $file = shift;
    my $indexfile;
    open FILE,$file;
    while(<FILE>){
        chomp;
        my @array = split /\,/,$_;
        if($#array){
            $indexfile->{$array[1]} = $array[0];
        }
    }
    close FILE;
    return $indexfile;
}
I call it on a .txt file that looks like this:
1,TCGCCTTA
2,CTAGTACG
3,TTCTGCCT
4,GCTCAGGA
5,AGCGTAGC
6,CATGCCTA
7,GTAGAGAG
8,CCTCTCTG
9,TAGATCGC
10,CTCTCTAT
11,TATCCTCT
12,AGAGTAGA
13,GCGTAAGA
14,ACTGCATA
15,AAGGAGTA
16,CTAAGCCT
along with input files that are added as I iterate over a while loop. These files look like:
p1='CTAAGCCT'
I suppose it's possible that p1 is occasionally an empty string; I am not sure how I would print my entire list of query strings.
Please let me know if you need more information.
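One way to print every query string is sketched below. It assumes the queries can be collected into an array first; @queries and its contents are hypothetical. Wrapping each value in |...| makes empty strings and stray whitespace visible, and anything shorter than two characters is flagged, because match_query cannot count anything for such a query.

```perl
use strict;
use warnings;

# Hypothetical query strings, as they might arrive in the while loop.
my @queries = ('CTAAGCCT', '', 'A', 'GCTCAGGA');

my @suspicious;
for my $n (0 .. $#queries) {
    my $q = $queries[$n];
    # The |...| delimiters make empty strings and whitespace obvious.
    printf "query %d = |%s| (length %d)\n", $n, $q, length $q;
    push @suspicious, $n if length($q) < 2;
}
print "too short for match_query: @suspicious\n" if @suspicious;
```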
OK, what would be most helpful is just the data structures that are passed in the call to match_query($indexfile,$queryseq); and that reproduce the problem.
I guess it is this?
my $indexfile = {
'GCTCAGGA' => '4',
'AGCGTAGC' => '5',
'TTCTGCCT' => '3',
'CATGCCTA' => '6',
'TCGCCTTA' => '1',
'GTAGAGAG' => '7',
'GCGTAAGA' => '13',
'TAGATCGC' => '9',
'CTAGTACG' => '2',
'CTAAGCCT' => '16',
'CTCTCTAT' => '10',
'ACTGCATA' => '14',
'AGAGTAGA' => '12',
'TATCCTCT' => '11',
'AAGGAGTA' => '15',
'CCTCTCTG' => '8'
};
What is the $queryseq that goes with that table?
What the heck does: "along with input files that are added as I iterate over a while loop" mean?
You are asking a question about an error that you are getting in a subroutine. In the best case, you provide a complete set of runnable code. All we have to do is download and hit the "run" button to replicate exactly your problem.
What we have now is kind of like a UFO report. If the problem can be reproduced (seen) by all, then solutions will be forthcoming. Your job is to boil this down to a single set of inputs that demonstrates the problem. If the $indexfile structure above doesn't need 16 entries to demonstrate the problem, then use fewer entries. Sometimes submitting an excellent bug report requires a lot of work to get the situation down to a minimal, easily replicable case. I have certainly spent entire work weeks doing that for complex issues.
...
my $max_count = shift @array;
unless (defined ($max_count)) {
print "queryseq = |||$queryseq|||\n";
print Dumper($indexfile);
exit;
}
my $element;
$element->{bar} = $key;
$element->{max} = $max_count;
push @match_count, $element;
....
if($#array){
$indexfile->{$array[1]} = $array[0];
}
There may be a problem here. The statement
$indexfile->{$array[1]} = $array[0];
will be executed if the @array array is empty ($#array == -1; -1 is true), or if the @array array has two or more elements ($#array > 0). Is this what you want? (See discussion of the $# sigil (e.g., $#array) in perldata.)
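A sketch of get_index reworked along those lines. It drops the prototype, uses a lexical filehandle and a checked three-argument open, and replaces if($#array) with an explicit two-field test. Note that with the original test, a blank line gives $#array == -1, which is true, so an empty-string key with an undef value would be stored: exactly the kind of short key described earlier in the thread. The in-memory file below is made-up data for illustration.

```perl
use strict;
use warnings;

# Read "number,barcode" lines into a hash of barcode => number.
sub get_index {
    my ($file) = @_;
    my %index;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
        next unless @fields == 2;             # skip blank or malformed lines
        $index{ $fields[1] } = $fields[0];    # barcode => number
    }
    close $fh;
    return \%index;
}

# An in-memory file (including a blank line); a real path works the same way.
my $data      = "1,TCGCCTTA\n2,CTAGTACG\n\n16,CTAAGCCT\n";
my $indexfile = get_index(\$data);
print "$_ => $indexfile->{$_}\n" for sort keys %$indexfile;
```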
Give a man a fish: <%-{-{-{-<
Re: Help with function, count matches and store in hash & retrieve sorted
by Pathogenomix (Novice) on Jul 03, 2017 at 00:50 UTC
Hello all, and thanks for the help so far.
I have made sure that a query sequence is being provided.
By running the debug statement suggested above I have found that I have undef max values. How can I get around this?
Thank you very much everyone!
$VAR1 = [
{
'bar' => 'TATCCTCT',
'max' => undef
},
{
'bar' => 'CTCTCTAT',
'max' => undef
},
{
'bar' => 'AGCGTAGC',
'max' => undef
},
{
'bar' => 'AGAGTAGA',
'max' => undef
},
{
'max' => undef,
'bar' => 'CTAAGCCT'
},
{
'bar' => 'CATGCCTA',
'max' => undef
},
{
'bar' => 'TTCTGCCT',
'max' => undef
},
{
'max' => undef,
'bar' => 'GCTCAGGA'
},
{
'max' => undef,
'bar' => 'TCGCCTTA'
},
{
'max' => undef,
'bar' => 'CCTCTCTG'
},
{
'bar' => 'TAGATCGC',
'max' => undef
},
{
'max' => undef,
'bar' => 'ACTGCATA'
},
{
'bar' => 'GCGTAAGA',
'max' => undef
},
{
'bar' => 'CTAGTACG',
'max' => undef
},
{
'bar' => 'AAGGAGTA',
'max' => undef
},
{
'max' => undef,
'bar' => 'GTAGAGAG'
}
];
DEBUG: [TATCCTCT]=>[11]
DEBUG: [CTCTCTAT]=>[10]
DEBUG: [AGCGTAGC]=>[5]
DEBUG: [AGAGTAGA]=>[12]
DEBUG: [CTAAGCCT]=>[16]
DEBUG: [CATGCCTA]=>[6]
DEBUG: [TTCTGCCT]=>[3]
DEBUG: [GCTCAGGA]=>[4]
DEBUG: [TCGCCTTA]=>[1]
DEBUG: [CCTCTCTG]=>[8]
DEBUG: [TAGATCGC]=>[9]
DEBUG: [ACTGCATA]=>[14]
DEBUG: [GCGTAAGA]=>[13]
DEBUG: [CTAGTACG]=>[2]
DEBUG: [AAGGAGTA]=>[15]
DEBUG: [GTAGAGAG]=>[7]
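One way to keep undef out of the final comparison is sketched below as a hypothetical pick_barcode, which mirrors the last step of match_query. It defaults each missing max to 0 and checks that a second candidate exists before comparing the top two. This only hides the symptom, though. Every key in the DEBUG output is 8 characters long, so the outer loop does run. An undef max for every key therefore means the inner loop never ran, i.e. the $queryseq reaching match_query at that point is empty, one character long, or undefined.

```perl
use strict;
use warnings;

# Hypothetical defensive version of the final step of match_query.
sub pick_barcode {
    my ($indexfile, @match_count) = @_;
    $_->{max} //= 0 for @match_count;    # treat a missing max as 0
    @match_count = sort { $b->{max} <=> $a->{max} } @match_count;

    return 0 unless @match_count;
    return 0 if @match_count > 1 && $match_count[0]{max} == $match_count[1]{max};
    return 0 unless $match_count[0]{max} > 5;    # threshold from the original code
    return $indexfile->{ $match_count[0]{bar} };
}

# Made-up index and candidates.
my $idx = { AAAAAAAA => 3, CCCCCCCC => 7 };
print pick_barcode($idx, { bar => 'AAAAAAAA', max => 6 },
                         { bar => 'CCCCCCCC', max => undef }), "\n";
```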