angerusso has asked for the wisdom of the Perl Monks concerning the following question:
I have an updated data file that contains only "W" and "M" entries. As in the example below, for each Gname (such as A) I want to count the number of rows in which "M" appears at least once under each unique column name. I want to sum over rows, not columns: as long as "M" appears at least once among a row's columns for a given column name, that row counts as 1 for that name.
    Gname  G1  G1  G1  G1  G2  G2  G3
    A      W   W   M   W   W   W   M
    A      W   W   W   W   W   W   W
    A      W   W   W   W   W   W   W
    B      W   W   W   W   W   M   M
    B      M   W   W   W   W   M   M
    C      M   M   M   W   W   W   W
    C      M   W   W   M   M   W   W

The output should be:

    Gname  G1  G2  G3
    A      1   0   1
    B      1   2   2
    C      2   1   0
I have written the following code to write the header row, but I am confused about how to start counting over blocks of rows and columns the way I want. Can anyone help?
    #!/usr/bin/perl -w

    if (@ARGV != 1){
        print "USAGE: ./parse-counts.pl file\n";
        exit(-1);
    }

    $mutfile = $ARGV[0];
    %hash = ();

    open(INPUTR,"<$mutfile") || die "Can't open \$mutfile for reading. \n";

    while($line=<INPUTR>){
        chomp $line;
        @toks = split(/\t/,$line);
        if ($toks[0] =~ /^Gname/){
            $k = 0;
            # loop over the header row to get the unique "Gname"s
            @header = split(/\t/,$line);
            for $j (1..@toks-2){
                $i = $j+1;
                if ($header[$i] ne $header[$j]){
                    $k++;
                    $name[$k] = $header[$j];
                }
            }
            for $i (0..$k){
                $hash{$toks[0]}{$name[$k]} = $name[$k];
            }
        }
        else {
            $k = 0;
            for $j (1..@toks-2){
                $i = $j+1;
                if ($header[$i] ne $header[$j]){
                    $k++;
                    $hash{$toks[0]}{$name[$k]} = 0;
                    if ($toks[$j] =~ /M/){
                        $hash{$toks[0]}{$name[$k]} = 1;
                    }
                }
            }
        }
    }
    close(INPUTR);

    $outdata = $mutfile;
    $outdata =~ /(.+).txt/;
    $outdata = $1."-COUNTS.txt";
    open(OUTD,">$outdata");

    foreach $idname (sort keys %hash){
        if ($idname =~ /^Gname/){
            print OUTD $idname;
            foreach $gid (sort keys %{$hash{$idname}}){
                print OUTD "\t".$hash{$idname}{$gid};
            }
            print OUTD "\n";
        }
    }

    foreach $idname (sort keys %hash){
        if ($idname !~ /^Gname/){
            print OUTD $idname;
            foreach $gid (sort keys %{$hash{$idname}}){
                print OUTD "\t".$hash{$idname}{$gid};
            }
            print OUTD "\n";
        }
    }
    close(OUTD);

    print "Printing $outdata file done.\n";
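For comparison, here is one possible way to do the counting described above. It is a minimal sketch, not a definitive answer. It assumes the input is tab-separated, that the `Gname` header row comes before any data row, and that cells are exactly `W` or `M`. The function name `count_m_rows` and the `@sample` array are hypothetical and not part of the original script. For each row it first records which column names contain at least one "M", then adds 1 per such name to that row's Gname, so a row never contributes more than 1 to any column name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes the lines of a
# tab-separated table and returns the lines of the summary table.
sub count_m_rows {
    my @lines = @_;
    my (@header, @groups, %seen_group, @names, %seen_name, %count);

    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        my @toks = split /\t/, $line;
        my $name = shift @toks;

        if ($name eq 'Gname') {
            @header = @toks;
            # Unique column names, kept in first-seen order.
            @groups = grep { !$seen_group{$_}++ } @header;
            next;
        }

        # Remember row names in first-seen order for the output.
        push @names, $name unless $seen_name{$name}++;

        # Which column names have at least one "M" in this row?
        my %has_m;
        for my $i (0 .. $#toks) {
            $has_m{ $header[$i] } = 1 if $toks[$i] eq 'M';
        }

        # Each row adds at most 1 per column name.
        $count{$name}{$_}++ for keys %has_m;
    }

    my @out = (join("\t", 'Gname', @groups));
    for my $name (@names) {
        push @out, join("\t", $name, map { $count{$name}{$_} || 0 } @groups);
    }
    return @out;
}

# Demo on the sample data from the question (spaces converted to tabs).
my @sample = map { join("\t", split(' ', $_)) } (
    'Gname G1 G1 G1 G1 G2 G2 G3',
    'A W W M W W W M',
    'A W W W W W W W',
    'A W W W W W W W',
    'B W W W W W M M',
    'B M W W W W M M',
    'C M M M W W W W',
    'C M W W M M W W',
);

print "$_\n" for count_m_rows(@sample);
```

On the sample data this prints the same table as the expected output above. To run it on a real file, the lines read from a filehandle could be passed in place of `@sample`.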