Boz has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a new initiate seeking absolution about concatenating to keys within a hash. I have read that this is not allowed, so I would ask the revered monks what approach I should take. I have been trying to understand hash referencing, but this seems division one whilst I'm still playing Sunday league.

The hash %bact contains sequences as values and unique identifiers as keys. I am trying to add subsets to the unique identifiers in the keys, and counts to the subsets.

e.g. keys :- UniqueID_set1_0001, UniqueID_set1_0002, UniqueID_set1_0003, UniqueID_set2_0001.............

Perhaps someone could suggest a simpler way to do it, or point me in the right direction about "hard referencing". The documentation goes off on some very confusing tangents about this for a newbie.

No substitutions take place either, and I get "Use of uninitialized value in substitution (s///)" messages. I have tried searching the web; this seems to be a common message caused by many things, and after spending hours trying to understand what's wrong I became a convert and decided to turn to the monastery for enlightenment. I would be truly thankful for any help. Amen

#! /usr/bin/perl -w
use strict;

$set1count=000;
$set2count=000;
$set3count=000;
$set4count=000;
$set5count=000;

foreach my $id (keys %bact) {
    if (($bact{$id} =~ m/^CAGGTGGCAT/)) {
        s/^CAGGTGGCAT//;
        $id .='_set1_';
        $set1count++;
        $id .=$set1count;
    }
    elsif (($bact{$id} =~ m/^CATTGAAGCT/)) {
        s/^CATTGAAGCT//;
        $id .='_set2_';
        $set2count++;
        $id .=$set2count;
    }
    elsif (($bact{$id} =~ m/^CTAAGTTCAG/)) {
        s/^CTAAGTTCAG//;
        $id .='_set3_';
        $set3count++;
        $id .=$set3count;
    }
    elsif (($bact{$id} =~ m/^CTAAGAACGT/)) {
        s/^CTAAGAACGT//;
        $id .='_set4_';
        $set4count++;
        $id .=$set4count;
    }
    else {
        s/^CTGGAGGACT//;
        $id .='_set5_';
        $set5count++;
        $id .=$set5count;
    }
}

print "---------------------------------------------------------------\n";
print "-------- ABUNDANCE OF SEQUENCES WTHIN EACH SUBGROUP ----------\n";
print "number of sequences in set 1 CAGGTGGCAT sub group = $set1count \n";
print "number of sequences in set 2 CATTGAAGCT sub group = $set2count \n";
print "number of sequences in set 3 CTAAGTTCAG sub group = $set3count \n";
print "number of sequences in set 4 CTAAGAACGT sub group = $set4count \n";
print "number of sequences in set 5 CTGGAGGACT sub group = $set5count \n";
...

Re: Concatenate to keys in a hash whilst doing substitution on values
by kennethk (Abbot) on Feb 11, 2010 at 20:41 UTC

    First, please try to make sure when you post code that it doesn't throw compiler warnings - in this case you've used strict, but have not included my in front of your counters and have not declared %bact. Second, it's nearly always easier to analyze code when given appropriate initialization - in this case, giving an initial value to %bact so we could meaningfully run your code would be helpful. See How do I post a question effectively?.

    It's certainly true that references can be confusing to a neophyte and that some of the documentation can be a bit arcane to that same lot. In general, difficult topics will have a tutorial document that is geared more toward introduction - for references, that would be perlreftut. More in depth documentation on this topic is in perlref, perllol and perldsc.

    The short of it is that a hash reference is just a scalar that points to a hash. You can use the dereference operator (->, perlop) to get from the reference to what it is pointing to. If $hash_ref is a hash reference, then the element corresponding to key is accessed as $hash_ref->{key}, just like $hash{key}.
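    As a minimal sketch of the point above (the hash contents and key names here are made up for illustration, not taken from the original data), taking a reference and reading or writing through it looks something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the identifiers are invented for this example.
my %hash = (id_a => 'CAGGTGGCAT', id_b => 'CATTGAAGCT');

# The backslash operator takes a reference, which is an ordinary scalar.
my $hash_ref = \%hash;

# Both expressions read the same element of the same underlying hash.
print $hash{id_a}, "\n";
print $hash_ref->{id_a}, "\n";

# Writing through the reference changes the original hash.
$hash_ref->{id_c} = 'CTAAGTTCAG';
print join(',', sort keys %hash), "\n";

# %$hash_ref dereferences the whole hash, e.g. to count its keys.
my $n_keys = keys %$hash_ref;
print "$n_keys\n";
```

    The reference and the named hash are two routes to the same data, which is why the new key shows up in %hash.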

    There are two bugs I see in your code, if I understand it properly. First, when you use keys to get a list of keys in a hash, the values in that list are not tied to the actual keys. This means if you change them, it will not impact the original keys. Second, when you perform your substitutions you don't bind them (=~, perlop) to a variable, so that means you are applying them to the magic variable $_. $_ is not initialized, thus your warning.
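    Both points can be illustrated with a small sketch. It assumes invented sample data, handles only one prefix, and builds a second hash (%renamed, a hypothetical name) rather than editing keys in place; the zero-padded sprintf format is also an assumption, based on the key pattern given in the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, invented for this example.
my %bact = (
    seqA => 'CAGGTGGCATACGT',
    seqB => 'CAGGTGGCATTTGA',
    seqC => 'GGGG',
);

my $set1count = 0;
my %renamed;    # new hash holding the renamed keys

foreach my $id (sort keys %bact) {
    # Work on a copy of the value and bind s/// to it explicitly,
    # so the substitution no longer acts on $_.
    my $seq = $bact{$id};

    if ($seq =~ s/^CAGGTGGCAT//) {
        $set1count++;
        # A key cannot be edited in place: store the value under a new key.
        my $new_id = sprintf '%s_set1_%04d', $id, $set1count;
        $renamed{$new_id} = $seq;
    }
    else {
        # No prefix matched: keep the entry unchanged.
        $renamed{$id} = $seq;
    }
}

print "$_ => $renamed{$_}\n" for sort keys %renamed;
```

    Since s/// returns true only when it replaces something, it can serve as both the test and the edit, which avoids matching the same prefix twice.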

    Rather than using the data structure you have, I would use a hash of hashes, one of the advanced data structures discussed in perldsc. This just consists of storing a series of hash references in a hash, thus creating a tree. My code might look more like:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %bact = (first  => 'CAGGTGGCAT',
                second => 'CATTGAAGCT',
                third  => 'CTAAGTTCAG',
                fourth => 'CTAAGAACGT',
                fifth  => 'CTGGAGGACT',
                );

    my @counts = (0) x 5;
    my %bact_data;

    foreach my $id (keys %bact) {
        $bact_data{$id}{DNA} = $bact{$id};
    }

    foreach my $id (keys %bact) {
        if (($bact{$id} =~ m/^CAGGTGGCAT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 1;
            $counts[0]++;
            $bact_data{$id}{count} = $counts[0];
        }
        elsif (($bact{$id} =~ m/^CATTGAAGCT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 2;
            $counts[1]++;
            $bact_data{$id}{count} = $counts[1];
        }
        elsif (($bact{$id} =~ m/^CTAAGTTCAG/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 3;
            $counts[2]++;
            $bact_data{$id}{count} = $counts[2];
        }
        elsif (($bact{$id} =~ m/^CTAAGAACGT/)) {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 4;
            $counts[3]++;
            $bact_data{$id}{count} = $counts[3];
        }
        else {
            $bact_data{$id}{DNA} = $bact{$id};
            $bact_data{$id}{set} = 5;
            $counts[4]++;
            $bact_data{$id}{count} = $counts[4];
        }
    }

    print "---------------------------------------------------------------\n";
    print "-------- ABUNDANCE OF SEQUENCES WTHIN EACH SUBGROUP ----------\n";
    print "number of sequences in set 1 CAGGTGGCAT sub group = $counts[0]\n";
    print "number of sequences in set 2 CATTGAAGCT sub group = $counts[1]\n";
    print "number of sequences in set 3 CTAAGTTCAG sub group = $counts[2]\n";
    print "number of sequences in set 4 CTAAGAACGT sub group = $counts[3]\n";
    print "number of sequences in set 5 CTGGAGGACT sub group = $counts[4]\n";
    print "---------------------------------------------------------------\n";

    print Dumper(\%bact_data);

    If I were going from scratch, I would certainly build this differently; however I used the above since I think it will be more clear. If this does not do what you intend, post sample input and expected output so we can better understand your spec.
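    For what it's worth, one possible from-scratch shape (an illustrative guess, not necessarily the design meant here) keeps the prefixes in an array and loops over them instead of repeating the branch five times. The sample data is invented, and unlike the if/elsif/else chain, this sketch leaves an id out of %bact_data when no prefix matches rather than assigning it to set 5:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The five prefixes from the question, in set order.
my @prefixes = qw(CAGGTGGCAT CATTGAAGCT CTAAGTTCAG CTAAGAACGT CTGGAGGACT);

# Hypothetical sample data, invented for this example.
my %bact = (
    first  => 'CAGGTGGCATAC',
    second => 'CATTGAAGCTGG',
    third  => 'CAGGTGGCATTT',
);

my @counts = (0) x @prefixes;    # one counter per prefix
my %bact_data;                   # hash of hashes, keyed by id

ID: foreach my $id (sort keys %bact) {
    for my $i (0 .. $#prefixes) {
        # index() returns 0 when the sequence starts with the prefix.
        next unless index($bact{$id}, $prefixes[$i]) == 0;

        $counts[$i]++;
        $bact_data{$id} = {
            DNA   => $bact{$id},
            set   => $i + 1,
            count => $counts[$i],
        };
        next ID;    # the first matching prefix wins
    }
}

for my $i (0 .. $#prefixes) {
    printf "number of sequences in set %d %s sub group = %d\n",
        $i + 1, $prefixes[$i], $counts[$i];
}
```

    Adding a sixth prefix then means adding one word to @prefixes rather than another elsif branch.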

Re: Concatenate to keys in a hash whilst doing substitution on values
by leocharre (Priest) on Feb 11, 2010 at 21:08 UTC

    As for the "Use of uninitialized value in substitution ..." warning: it appears because the value you are trying to match against is not defined.

    my $value_0;
    my $value_1 = 'i have a defined value';
    my $value_2 = 0;

    $value_0=~/content/; # this will complain
    $value_1=~/content/; # won't complain
    $value_2=~/content/; # won't complain

    So, one thing you can do to avoid the warning is:

    if ( defined $value_0 and $value_0=~/content/ ){ ...

    You can run the command 'perldoc -f defined' for more info.