sesemin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Dear Monks,

Do you know why this does not work? For the lines that meet my conditions, I want to build a hash mapping $mismatch to the number of such lines; in this case, a hash with key 3 and some count as its value. Because I am using strict, it asks me to declare $mismatch as my $mismatch.

Once it works, I need to loop $mismatch from 1 to 20 and fill in my hash, with each $mismatch as a key and, as its value, the number of lines that meet the criterion set by that $mismatch.

a hash like:

1 => 110000, 2 => 50000, ..., 20 => 1000
while(<INPUT2>){
    chomp;
    my @current_line = split /\t/;
    my $mismatch =3;
    next unless $current_line[5] == 1 && $current_line[14] >= $mismatch;
    my $snp_chip_covered++;
}
my $snp_covered{$mismatch}= $snp_chip_covered;

Re: Hash making
by ikegami (Patriarch) on Sep 21, 2008 at 04:56 UTC

    You have quite a few show-stopping issues.

    • Random application of my, part 1.

      my $snp_chip_covered++;

      makes no sense. my creates a variable. Why would you increment a variable you just created? You might as well just use

      my $snp_chip_covered = 1;

      since it's equivalent. Declare the variable outside the loop so that you have a continuously increasing variable instead of a new variable each loop pass that's always equal to 1.

    • Random application of my, part 2.

      my $snp_covered{$mismatch}

      makes no sense. $snp_covered{$mismatch} is not a variable. Remove the my. You simply want an assignment. You still need to declare the hash (%snp_covered), though, but you want to do it outside the loop. Again, you don't want a new hash for every pass of the loop.

    • Assign once

      The hash assignment is outside of the loop, so how do you expect it to be executed multiple times?

    • Constant key.

      You're always using $mismatch as the key, which is always 3. I think you meant to use $snp_chip_covered as the key.

    • Incorrect value.

      You want to use 110000, 50000, ..., 1000 as the values, but you're currently using 1, 2, ..., 20. I don't know where the numbers you want are coming from. Maybe $current_line[something].

    Fixed:

    my $snp_chip_covered;
    my %snp_covered;
    while (<INPUT2>) {
        chomp;
        my @current_line = split /\t/;
        next if $current_line[5] != 1 || $current_line[14] < 3;
        $snp_covered{++$snp_chip_covered} = '?????';
    }

    That was just the necessities. There's one major improvement you can make, though.

    • Why are you using a hash if the keys are numerically ascending? That's an array!

    Fixed:

    my @snp_covered;
    while (<INPUT2>) {
        chomp;
        my @current_line = split /\t/;
        next if $current_line[5] != 1 || $current_line[14] < 3;
        push @snp_covered, '?????';
    }
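The two "Random application of my" points come down to lexical scope: a my inside the loop body creates a brand-new variable on every pass. A small self-contained demonstration (the loop bounds are arbitrary):

```perl
use strict;
use warnings;

# Declared once, before the loop: the same variable survives every pass.
my $outside = 0;
for my $pass (1 .. 3) {
    $outside++;
}

# Declared with my inside the loop body: a new variable is created on
# every pass, starts out undef, and is 1 after the increment each time.
# (This is what the original `my $snp_chip_covered++;` amounts to.)
my @inside_values;
for my $pass (1 .. 3) {
    my $inside;
    $inside++;
    push @inside_values, $inside;
}

print "outside: $outside\n";          # outside: 3
print "inside: @inside_values\n";     # inside: 1 1 1
```

The same reasoning applies to the hash: declare %snp_covered before the loop and only assign to its elements inside it.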
      ++ikegami for sheer, absolute patience.
      Thanks, ikegami, for the fast response.

      Does the following construct make sense to have within a while loop?

      for (my $mismatch=0; $mismatch<=20; $mismatch++){
          next unless $current_line[5] == 1 && $current_line[14] >= $mismatch;
          $count++;
      }
      my $snp_covered{$mismatch}= $count;
        It doesn't even compile for reasons I've already explained.
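For reference, a sketch of how the for-inside-while construct asked about above could be written so that it compiles and keeps a separate count per threshold. It assumes, as in the original post, tab-separated lines where $current_line[5] must be 1 and $current_line[14] is compared against each threshold from 0 to 20; the in-memory sample data ($data) is invented for illustration.

```perl
use strict;
use warnings;

# Invented sample data in the original layout: column 6 (index 5) is a
# flag and column 15 (index 14) is a score; the other columns are filler.
my $data = join "",
    join("\t", (0) x 5, 1, (0) x 8, 2) . "\n",    # flag 1, score 2
    join("\t", (0) x 5, 1, (0) x 8, 5) . "\n",    # flag 1, score 5
    join("\t", (0) x 5, 0, (0) x 8, 9) . "\n";    # flag 0: never counted
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Declared once, before the loop, so counts accumulate across lines.
my %snp_covered;    # threshold => number of lines that meet it

while (my $line = <$fh>) {
    chomp $line;
    my @current_line = split /\t/, $line;
    next unless $current_line[5] == 1;    # the flag test does not depend on the threshold
    for my $mismatch (0 .. 20) {
        # A line with score S is counted under every threshold from 0 to S.
        $snp_covered{$mismatch}++ if $current_line[14] >= $mismatch;
    }
}
close $fh;

# Thresholds no line reached were never incremented, so default to 0.
printf "%d => %d\n", $_, $snp_covered{$_} // 0 for 0 .. 20;
```

Declaring %snp_covered before the while loop and incrementing $snp_covered{$mismatch} inside the inner loop replaces the non-compiling my $snp_covered{$mismatch}, and the file is read only once.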
Re: Hash making
by friedo (Prior) on Sep 21, 2008 at 04:40 UTC
    my $snp_covered{$mismatch}= $snp_chip_covered; ...doesn't make sense.

    You'll need to declare your hash as a lexical (my) variable outside the loop, so it won't get clobbered each time through. Then you can add keys to it in the normal way.

    However, I don't quite understand what you're trying to do with %snp_covered. You're using the same key each time ($mismatch, which is 3) and overwriting the previous value each time there's a match. If you just want to keep track of the total number of matches, you can use a plain scalar and just increment it each time:

    my $count;
    while(<INPUT2>){
        chomp;
        my @current_line = split /\t/;
        my $mismatch =3;
        next unless $current_line[5] == 1 && $current_line[14] >= $mismatch;
        $count++;
    }
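If one pass per threshold is what you have in mind, friedo's counter can be wrapped in an outer loop that rewinds the file with seek before each pass and stores each total in a hash declared outside the loops. A minimal sketch, with invented sample data read from an in-memory filehandle and the column indices taken from the original post:

```perl
use strict;
use warnings;

# Invented sample data: column 6 (index 5) is a flag, column 15
# (index 14) is a score, as in the original post.
my $data = join "",
    join("\t", (0) x 5, 1, (0) x 8, 1) . "\n",
    join("\t", (0) x 5, 1, (0) x 8, 3) . "\n",
    join("\t", (0) x 5, 1, (0) x 8, 4) . "\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %snp_covered;    # threshold => number of matching lines

for my $mismatch (1 .. 20) {
    seek $fh, 0, 0 or die "Cannot rewind: $!";    # start over at the first line
    my $count = 0;                                 # fresh counter for this threshold
    while (my $line = <$fh>) {
        chomp $line;
        my @current_line = split /\t/, $line;
        next unless $current_line[5] == 1 && $current_line[14] >= $mismatch;
        $count++;
    }
    $snp_covered{$mismatch} = $count;              # store this threshold's total
}
close $fh;
```

Reading the file once per threshold is easy to follow but costs 20 passes over the data; for a large file, counting every threshold during a single pass is usually cheaper.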
      Thanks Friedo,

      I think I was not clear enough. I just put $mismatch = 3 to test if I can create a loop. So at each $mismatch, I will have a series of line extracted let's say 1000000 for $mismatch =1.

      Then when mismatch changes to 2, another set of lines will be extracted. let's say 700000.

      The goal is to have a hash whose keys are the different $mismatch values and whose values are the number of lines extracted for each.

      Any further help is appreciated.

        So at each $mismatch

        I don't understand "being at a variable".

        I will have a series of line extracted let's say 1000000 for $mismatch =1.

        How do you determine how many lines to extract?

        Then when mismatch changes to 2, another set of lines will be extracted. let's say 700000.

        What causes $mismatch to change?

        I will have a series of line extracted

        I don't understand "extracting a line". Do you mean "reading a line"? What do you do with the lines you've extracted?

Re: Hash making
by apl (Monsignor) on Sep 21, 2008 at 11:31 UTC
    Take pity on ikegami. Don't write Perl code to perform this task. Write out how you would solve the problem (step by step) in English. Don't talk about hashes or loops. Just describe how you would solve the problem if you had to do it manually.

    That should give you an insight into how to code a solution.

      Thanks APL,

      This problem has gone way off track. Let's start over, as you suggested.

      Simple question: if you have a tab-delimited file with, e.g., 4 columns, how would you read it over and over to extract data with different conditions? Let's focus on col4, whose values range from 0 to 20. I want to read the file, extract the lines where col4 == 4, and save the number of lines read (that met the condition) somewhere. Then automatically increase the criterion to 5, see how many lines are extracted this time, and add the criterion (this time col4 == 5) and the number of lines read (just the count, not the actual lines) to the same place where you stored the previous iteration.

      You will end up with a structure like this.

      key (criterion) => value (number of lines extracted): 0 => 2000, 1 => 1800, 2 => 1600, and so on.

      Your thoughts are very appreciated.

        That's so much clearer! The solution is:
        my %counts;
        while (<$fh>) {
            chomp;
            my @fields = split /\t/;
            $counts{ $fields[3] }++;
        }

        You can print the results as follows:

        for ( sort { $a <=> $b } keys %counts ) {
            print("$_: $counts{$_}\n");
        }

        By the way, I said it was clearer, but it's still not that clear. You still used the word "extract", for starters. It appears to mean "count" in this case. You could have said "Count how many times each different value occurs in the 4th column", but you decided to talk about how to do it (code) instead of what you want (data).
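If what is actually wanted is the number of lines whose value is at least N (the >= test in the original code and the decreasing numbers in the example suggest that, but this is an assumption), it can be derived from %counts with a running sum from the largest value down. A sketch, assuming integer values from 0 to 20 and an invented %counts:

```perl
use strict;
use warnings;

# Invented histogram: column-4 value => number of lines with that value,
# i.e. the kind of result the %counts loop produces.
my %counts = (0 => 200, 1 => 200, 2 => 1000, 4 => 600);

# Walk from the largest value down, accumulating, so that
# $at_least{$n} = number of lines whose column-4 value is >= $n.
# Values outside 0 .. 20 are ignored (assumed range).
my %at_least;
my $running = 0;
for my $n (reverse 0 .. 20) {
    $running += $counts{$n} // 0;
    $at_least{$n} = $running;
}

print "$_ => $at_least{$_}\n" for 0 .. 4;
# 0 => 2000
# 1 => 1800
# 2 => 1600
# 3 => 600
# 4 => 600
```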

        You don't need a hash for that; a simple array will do.

        Assume you want to check if column 4 is equal to 4, 5, ..., N.

        • For each line in a file
          • Split up the line into its component fields
          • For each $index in the range 4 through N inclusive
            • If column 4 equals $index, increment $count[$index]

        You can print @count at the end. If you want to store each line that meets a certain criterion, make a two-dimensional array (the first dimension would be $index, the second the value of $count[$index] before you increment it).
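The steps above, written out as a Perl sketch. It assumes the check is "column 4 equals $index" as stated, uses a hypothetical upper bound $N = 6 and invented sample lines, and also fills the two-dimensional array described in the last paragraph (@matched, a hypothetical name):

```perl
use strict;
use warnings;

# Invented tab-separated sample with 4 columns.
my $data = "a\tb\tc\t4\n"
         . "a\tb\tc\t5\n"
         . "a\tb\tc\t4\n"
         . "a\tb\tc\t9\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $N = 6;        # hypothetical upper bound of the range to check
my @count;        # $count[$index] = lines whose 4th column equals $index
my @matched;      # hypothetical: $matched[$index][$k] = k-th line matching $index

while (my $line = <$fh>) {                  # for each line in the file
    chomp $line;
    my @fields = split /\t/, $line;         # split into component fields
    for my $index (4 .. $N) {               # 4 through N inclusive
        next unless $fields[3] == $index;
        my $k = $count[$index] // 0;        # value before incrementing
        $matched[$index][$k] = $line;
        $count[$index] = $k + 1;
    }
}
close $fh;

printf "%d: %d\n", $_, $count[$_] // 0 for 4 .. $N;
# 4: 2
# 5: 1
# 6: 0
```

The value 9 falls outside 4 .. $N, so that line is not counted anywhere.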