Hash making

sesemin has asked for the wisdom of the Perl Monks concerning the following question:

Re: Hash making by ikegami (Patriarch) on Sep 21, 2008 at 04:56 UTC
You have quite a few show-stopping issues.

Random application of `my`, part 1. `my $snp_chip_covered++;` makes no sense. `my` creates a variable. Why would you increment a variable you just created? You might as well just use `my $snp_chip_covered = 1;` since it's equivalent. Declare the variable outside the loop so that you have a continuously increasing variable instead of a new variable each loop pass that's always equal to 1.

Random application of `my`, part 2. `my $snp_covered{$mismatch}` makes no sense. `$snp_covered{$mismatch}` is not a variable. Remove the `my`. You simply want an assignment. You still need to declare the hash (`%snp_covered`), though, but you want to do it outside the loop. Again, you don't want a new hash for every pass of the loop.

Assign once. The hash assignment is outside of the loop, so how do you expect it to be executed multiple times?

Constant key. You're always using `$mismatch` as the key, which is always 3. I think you meant to use `$snp_chip_covered` as the key.

Incorrect value. You want to use 110000, 50000, ..., 1000 as the values, but you're currently using 1, 2, ..., 20. I don't know where the numbers you want are coming from. Maybe `$current_line[`something`]`.

Fixed: `my $snp_chip_covered; my %snp_covered; while (<INPUT2>) { chomp; my @current_line = split /\t/; next if $current_line[5] != 1 || $current_line[14] < 3; $snp_covered{++$snp_chip_covered} = '?????'; }`

That was just the necessities. There's one major improvement you can make, though. Why are you using a hash if the keys are numerically ascending? That's an array!

Fixed: `my @snp_covered; while (<INPUT2>) { chomp; my @current_line = split /\t/; next if $current_line[5] != 1 || $current_line[14] < 3; push @snp_covered, '?????'; }`
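The scoping point about `my` can be seen in a minimal sketch. The sub names `count_inside` and `count_outside` and the sample data are hypothetical, made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: the counter is declared INSIDE the loop body,
# so every pass gets a brand-new variable.
sub count_inside {
    my @lines = @_;
    my $last;
    for my $line (@lines) {
        my $n;       # fresh, undefined variable on each pass
        $n++;        # undef becomes 1, every time
        $last = $n;
    }
    return $last;
}

# Hypothetical helper: the counter is declared OUTSIDE the loop,
# so it keeps its value from one pass to the next.
sub count_outside {
    my @lines = @_;
    my $n = 0;
    for my $line (@lines) {
        $n++;
    }
    return $n;
}

my @sample = ('a', 'b', 'c');
print "inside:  ", count_inside(@sample),  "\n";   # prints "inside:  1"
print "outside: ", count_outside(@sample), "\n";   # prints "outside: 3"
```

The same reasoning applies to the hash: declaring `%snp_covered` inside the loop would give an empty hash on every pass.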
Re^2: Hash making by AnomalousMonk (Archbishop) on Sep 21, 2008 at 08:47 UTC
++ikegami for sheer, absolute patience.
Re^2: Hash making by sesemin (Beadle) on Sep 21, 2008 at 05:06 UTC
Thanks Ikegami, for the fast response. Does the following construct make sense to have within a while loop? `for (my $mismatch=0; $mismatch<=20; $mismatch++){ next unless $current_line[5] == 1 && $current_line[14] >= $mismatch; $count++; } my $snp_covered{$mismatch}= $count;`
Re^3: Hash making by ikegami (Patriarch) on Sep 21, 2008 at 05:07 UTC
It doesn't even compile for reasons I've already explained.
Re: Hash making by friedo (Prior) on Sep 21, 2008 at 04:40 UTC
`my $snp_covered{$mismatch}= $snp_chip_covered;` ...doesn't make sense. You'll need to declare your hash as a lexical (`my`) variable outside the loop, so it won't get clobbered each time through. Then you can add keys to it in the normal way.

However, I don't quite understand what you're trying to do with `%snp_covered`. You're using the same key each time (`$mismatch`, which is 3) and overwriting the previous value each time there's a match. If you just want to keep track of the total number of matches, you can use a plain scalar and just increment it each time:

`my $count; while(<INPUT2>){ chomp; my @current_line = split /\t/; my $mismatch =3; next unless $current_line[5] == 1 && $current_line[14] >= $mismatch; $count++; }`
Re^2: Hash making by sesemin (Beadle) on Sep 21, 2008 at 04:52 UTC
Thanks Friedo, I think I was not clear enough. I just put `$mismatch = 3` to test whether I can create a loop. So at each `$mismatch`, I will have a series of lines extracted, let's say 1000000 for `$mismatch = 1`. Then when `$mismatch` changes to 2, another set of lines will be extracted, let's say 700000. The goal is to have a hash whose keys are the different `$mismatch` values and whose values are the number of lines extracted. Any further help is appreciated.
Re^3: Hash making by ikegami (Patriarch) on Sep 21, 2008 at 05:28 UTC
"So at each $mismatch"
I don't understand "being at a variable".

"I will have a series of line extracted let's say 1000000 for $mismatch =1."
How do you determine how many lines to extract?

"Then when mismatch changes to 2, another set of lines will be extracted. let's say 700000."
What causes `$mismatch` to change?

"I will have a series of line extracted"
I don't understand "extracting a line". Do you mean "reading a line"? What do you do with the lines you've extracted?
Re^4: Hash making by sesemin (Beadle) on Sep 21, 2008 at 05:44 UTC
Re^5: Hash making by ikegami (Patriarch) on Sep 21, 2008 at 19:20 UTC
Re: Hash making by apl (Monsignor) on Sep 21, 2008 at 11:31 UTC
Take pity on ikegami. Don't write Perl code to perform this task. Write out how you would solve the problem (step by step) in English. Don't talk about hashes or loops. Just describe how you would solve the problem if you had to do it manually. That should give you an insight into how to code a solution.
Re^2: Hash making by sesemin (Beadle) on Sep 21, 2008 at 21:14 UTC
Thanks APL, this problem has gone way off track. Let's start over as you suggested, with a simple question. Say you have a tab-delimited file with, e.g., 4 columns. How would you read it over and over to extract data under different conditions? Let's focus on col4, whose values range from 0 to 20. I want to extract the lines where col4 == 4 and save the number of lines that met the condition somewhere. Then automatically increase the criterion to 5 and see how many lines are extracted this time, and add the criterion (this time col4 == 5) and the number of lines read (just the count, not the actual lines) to the same place used for the previous iteration. You will end up with a structure like this:

`key(criteria) value (number of lines extracted) 0=>2000 1=>1800 2=>1600 and so on.`

Your thoughts are very much appreciated.
Re^3: Hash making by ikegami (Patriarch) on Sep 21, 2008 at 21:54 UTC
That's so much clearer! The solution is:

`my %counts; while (<$fh>) { chomp; my @fields = split /\t/; $counts{ $fields[3] }++; }`

You can print the results as follows:

`for ( sort { $a <=> $b } keys %counts ) { print("$_: $counts{$_}\n"); }`

By the way, I said it was clearer, but it's still not that clear. You still used the word "extract", for starters. It appears to mean "count" in this case. You could have said "Count how many times each different value occurs in the 4th column", but you decided to talk about how to do it (code) instead of what you want (data).
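For anyone who wants to try the counting idea without a data file, here is a self-contained sketch. The sub name `count_column`, the sample rows, and the use of an in-memory filehandle are assumptions made for illustration; only the `split`/increment logic comes from the reply above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the counting loop: tallies how often each
# distinct value occurs in the given (0-based) column of a tab-delimited
# stream, in a single pass.
sub count_column {
    my ($fh, $col) = @_;
    my %counts;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless defined $fields[$col];   # skip short or blank lines
        $counts{ $fields[$col] }++;
    }
    return \%counts;
}

# Made-up sample data: 4 columns, the 4th holds the value to tally.
my $data = join '', map { "$_\n" }
    "id1\tA\tG\t4",
    "id2\tC\tT\t5",
    "id3\tA\tT\t4",
    "id4\tG\tC\t0",
    "id5\tC\tA\t5",
    "id6\tT\tG\t4";

# Open the string as a read-only in-memory file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $counts = count_column($fh, 3);
close $fh;

for my $value (sort { $a <=> $b } keys %$counts) {
    print "$value: $counts->{$value}\n";
}
```

With this sample the loop prints `0: 1`, `4: 3` and `5: 2`. The file is read only once; there is no need to re-read it for each criterion.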
Re^3: Hash making by apl (Monsignor) on Sep 21, 2008 at 21:46 UTC
You don't need a hash for that; a simple array will do. Assume you want to check if column 4 is equal to 4, 5, .. N.

For each line in the file:
    Split the line into its component fields.
    For each $index in the range 4 through N inclusive:
        If column 4 equals $index, increment $count[$index].

You can print `@count` out at the end.

If you want to store each line that meets a certain criterion, make a two-dimensional array (the first dimension would be `$index`, the second the `$count[$index]` value before you increment it).
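The array approach might look like the sketch below. The sub name `count_range`, its parameters, and the sample rows are hypothetical. It also swaps the inner loop over `$index` for direct indexing by the column value, which gives the same counts when the values are non-negative integers.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how many lines have each integer value
# between $min and $max (inclusive) in the given 0-based column.
# Returns an array ref indexed by that value.
sub count_range {
    my ($lines, $col, $min, $max) = @_;
    my @count = (0) x ($max + 1);            # slots 0..$max, all zero
    for my $line (@$lines) {
        chomp(my $copy = $line);             # leave the caller's data intact
        my @fields = split /\t/, $copy;
        my $value  = $fields[$col];
        next unless defined $value && $value =~ /\A\d+\z/;
        next if $value < $min || $value > $max;
        $count[$value]++;                    # the value itself is the index
    }
    return \@count;
}

# Made-up sample rows: the 4th column is the one being tested.
my @rows = (
    "id1\tA\tG\t4",
    "id2\tC\tT\t5",
    "id3\tA\tT\t4",
    "id4\tG\tC\t9",
);

my $count = count_range(\@rows, 3, 4, 6);
print "$_: $count->[$_]\n" for 4 .. 6;
```

With this sample the output is `4: 2`, `5: 1` and `6: 0`. To keep the matching lines as well as the counts, one option is `push @{ $lines_for[$value] }, $copy;` on a second array, which gives the two-dimensional structure described above.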