in reply to Concatenate to keys in a hash whilst doing substitution on values
First, please try to make sure when you post code that it doesn't throw compiler warnings - in this case you've used strict, but have not included my in front of your counters and have not declared %bac. Second, it's nearly always easier to analyze code when given appropriate initialization - in this case, giving an initial value to %bac so we could meaningfully run your code would be helpful. See How do I post a question effectively?.
It's certainly true that references can be confusing to a neophyte and that some of the documentation can be a bit arcane to that same lot. In general, difficult topics will have a tutorial document that is geared more toward introduction - for references, that would be perlreftut. More in-depth documentation on this topic is in perlref, perllol and perldsc.
The short of it is that a hash reference is just a scalar that points to a hash. You can use the dereference (arrow) operator (->, perlop) to get at what the reference is pointing to. If $hash_ref is a hash reference, then the element corresponding to key is accessed as $hash_ref->{key}, just as you would write $hash{key} for a plain hash.
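As a minimal sketch of the point above (the hash contents and variable names here are made up for illustration, not taken from your code), taking a reference and reading or writing through it looks something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my %hash = (first => 'CAGGTGGCAT', second => 'CATTGAAGCT');

# The backslash takes a reference: $hash_ref is a scalar pointing at %hash.
my $hash_ref = \%hash;

# These two expressions read the same element.
print "direct:    $hash{first}\n";
print "reference: $hash_ref->{first}\n";

# Writing through the reference changes the original hash.
$hash_ref->{third} = 'CTAAGTTCAG';
print "keys now:  ", join(', ', sort keys %hash), "\n";

# %$hash_ref dereferences the whole hash at once.
my $count = keys %$hash_ref;
print "count:     $count\n";
```

Since the reference and the hash are the same data, nothing is copied; that is what makes it possible to store hashes inside other hashes.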
There are two bugs I see in your code, if I understand it properly. First, when you use keys to get a list of keys in a hash, the values in that list are not tied to the actual keys. This means if you change them, it will not impact the original keys. Second, when you perform your substitutions you don't bind them (=~, perlop) to a variable, so that means you are applying them to the magic variable $_. $_ is not initialized, thus your warning.
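Both points can be seen in a small sketch. The data, the _set1 suffix and the substitution pattern below are assumptions chosen for illustration, since I can't run your original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my %bact = (seq1 => 'CAGGTGGCAT');

# Bug 1: keys returns copies, so editing the list leaves the hash alone.
my @ids = keys %bact;
$_ .= '_set1' for @ids;
print "list: @ids\n";
print "hash: ", join(', ', keys %bact), "\n";

# To rename a key, delete the old entry and store it under the new name.
for my $id (keys %bact) {
    $bact{"${id}_set1"} = delete $bact{$id};
}
print "renamed: ", join(', ', keys %bact), "\n";

# Bug 2: a bare s/// acts on $_; bind with =~ to name the target.
my $dna = 'CAGGTGGCAT';
(my $trimmed = $dna) =~ s/^CAGG//;    # copy first, then substitute on the copy
print "trimmed: $trimmed\n";
```

The delete/assign idiom is one common way to rename a key; the copy-then-substitute form keeps the original string intact, which may or may not be what you want.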
Rather than using the data structure you have, I would use a hash of hashes, one of the advanced data structures discussed in perldsc. This just consists of storing a series of hash references in a hash, thus creating a tree. My code might look more like:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # Sample input: sequence id => DNA string
    my %bact = (
        first  => 'CAGGTGGCAT',
        second => 'CATTGAAGCT',
        third  => 'CTAAGTTCAG',
        fourth => 'CTAAGAACGT',
        fifth  => 'CTGGAGGACT',
    );

    my @counts = (0) x 5;    # one counter per set
    my %bact_data;           # hash of hashes: id => { DNA, set, count }

    foreach my $id (keys %bact) {
        $bact_data{$id}{DNA} = $bact{$id};
    }

    # Assign each sequence to a set based on its leading bases
    foreach my $id (keys %bact) {
        if (($bact{$id} =~ m/^CAGGTGGCAT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 1;
            $counts[0]++;
            $bact_data{$id}{count} = $counts[0];
        }
        elsif (($bact{$id} =~ m/^CATTGAAGCT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 2;
            $counts[1]++;
            $bact_data{$id}{count} = $counts[1];
        }
        elsif (($bact{$id} =~ m/^CTAAGTTCAG/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 3;
            $counts[2]++;
            $bact_data{$id}{count} = $counts[2];
        }
        elsif (($bact{$id} =~ m/^CTAAGAACGT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 4;
            $counts[3]++;
            $bact_data{$id}{count} = $counts[3];
        }
        else {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 5;
            $counts[4]++;
            $bact_data{$id}{count} = $counts[4];
        }
    }

    print "---------------------------------------------------------------\n";
    print "-------- ABUNDANCE OF SEQUENCES WITHIN EACH SUBGROUP ----------\n";
    print "number of sequences in set 1 CAGGTGGCAT sub group = $counts[0]\n";
    print "number of sequences in set 2 CATTGAAGCT sub group = $counts[1]\n";
    print "number of sequences in set 3 CTAAGTTCAG sub group = $counts[2]\n";
    print "number of sequences in set 4 CTAAGAACGT sub group = $counts[3]\n";
    print "number of sequences in set 5 CTGGAGGACT sub group = $counts[4]\n";
    print "---------------------------------------------------------------\n";

    print Dumper(\%bact_data);
If I were starting from scratch, I would certainly build this differently; however, I used the above since I think it will be clearer. If this does not do what you intend, post sample input and expected output so we can better understand your spec.