MB123 has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I have this large text file in the format shown below:

    ID SNP
    FT SNP 433
    FT /note="refAllele: T SNPstrains: 7083_1#5=C 7414_8#8=C 7480_8#49=C "
    FT /colour=1
    FT SNP 442
    FT /note="refAllele: T SNPstrains: 7065_8#2=C 7065_8#94=C 7083_1#2=C 7083_1#3=C 7083_1#41=C 7083_1#42=C 7083_1#43=C "
    FT /colour=1
    FT SNP 460
    FT /note="refAllele: T SNPstrains: 7564_8#14=C "
    FT /colour=1
    FT SNP 703
    FT /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "
    FT /colour=2
    FT SNP 937
    FT /note="refAllele: G SNPstrains: 7414_8#30=T (non-synonymous) (AA Val->Leu) "
    FT /colour=2
    FT SNP 1269
    FT /note="refAllele: G SNPstrains: 7480_7#22=A (synonymous) 7480_7#62=A (synonymous) "
    FT /colour=3
    FT SNP 1804
    FT /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser->Thr) 7414_8#44=A (non-synonymous) (AA Ser->Thr) 7521_6#54=A (non-synonymous) (AA Ser->Thr) "
    FT /colour=2
    etc etc...

And this code:

    use strict;
    use warnings;
    use feature qw(say);

    my $file = "BSAC.pl";

    my %cod = (
        1 => "red",
        2 => "non",
        3 => "green"
    );

    open my $in, "<", "$file";
    open my $out, ">", "output.txt";
    say $out "Coordinate No of Strains AA Change";

    my $SNP;
    my $count;
    my $change;
    while ( my $line = <$in> ) {
        chomp $line;
        say qq(DEBUG: Line = "$line");
        if ( $line =~ /^FT\s+SNP\s+(\d+)/ ){
            $SNP = $1;
            say qq(\$SNP = $1;);
        }
        elsif ( $line =~ /^FT\s+\/note="(.*)"/) {
            my $note = $1;
            say qq(my \$note = $1);
            $count = ($note =~ tr/=/=/);
            $note =~ /\((AA \w+->\w+)\)\s*$/;
            $change = $1 || "";
        }
        elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
            say qq(Code = $1);
            if ( $cod{$1} eq "non" ) {
                printf $out "%-12.12s %-15.15s %s\n", $SNP, $count, $change;
            }
        }
    }
However, when I run the above code I receive a "Use of uninitialised value ($count or $change) in printf at Script.txt line 33" error. This occurs at every part of the text file that contains a non-synonymous mutation.
This code works on another text file I have, and the only difference I can see is that in this example file, strain numbers have a format such as 7521_5#39=A, whereas in the file this code worked for they are written as 7521_5_39=A, i.e. the '#' is replaced with a second '_'.
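One way to test the '#' idea in isolation is to run the two note-parsing steps from the code above on a '#'-style note and on a '_'-style one. This is only a sketch: the sub name `parse_note` is hypothetical, and the second sample note is simply the first with '#' swapped for '_'.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same two steps the script applies to a note.
sub parse_note {
    my ($note) = @_;

    # Count the '=' signs, one per strain (tr with identical lists only counts).
    my $count = ( $note =~ tr/=/=/ );

    # Check the match explicitly so a stale $1 is never picked up.
    my $change = $note =~ /\((AA \w+->\w+)\)\s*$/ ? $1 : "";

    return ( $count, $change );
}

my $hash_note  = 'refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) ';
my $under_note = 'refAllele: G SNPstrains: 7521_5_39=A (non-synonymous) (AA Ala->Thr) ';

for my $note ( $hash_note, $under_note ) {
    my ( $count, $change ) = parse_note($note);
    print "$count\t$change\n";
}
```

If both notes give the same count and change, the '#' itself is unlikely to be what breaks the parsing, and the difference between the two files probably lies elsewhere, for instance in how the `/note` line is laid out.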
The ideal output from this code would look like this:

    Coordinates  No of Strains   AA Change
    703          1               AA Ala->Thr
    937          1               AA Val->Leu
    1804         3               AA Ser->Thr
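For reference, rows like those in the table above can be produced from the sample records with a sketch along these lines. It is an illustration rather than a fix: it assumes each `/note` sits on a single line exactly as in the sample, it reads from an inline string instead of the real file, and the sub name `summarise` and the column widths are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one formatted row per SNP whose colour code is 2
# (the code the script above maps to "non").
sub summarise {
    my ($text) = @_;
    my @rows;
    my ( $snp, $count, $change );

    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^FT\s+SNP\s+(\d+)/ ) {
            # New record: forget anything left over from the previous one.
            ( $snp, $count, $change ) = ( $1, 0, "" );
        }
        elsif ( $line =~ /^FT\s+\/note="(.*)"/ ) {
            my $note = $1;
            $count  = ( $note =~ tr/=/=/ );    # one '=' per strain
            $change = $note =~ /\((AA \w+->\w+)\)\s*$/ ? $1 : "";
        }
        elsif ( $line =~ /^FT\s+\/colour=(\d+)/ && $1 == 2 && defined $snp ) {
            push @rows, sprintf( "%-12s %-15s %s", $snp, $count, $change );
        }
    }
    return @rows;
}

# A few records copied from the sample data in the question.
my $sample = <<'END';
ID SNP
FT SNP 460
FT /note="refAllele: T SNPstrains: 7564_8#14=C "
FT /colour=1
FT SNP 703
FT /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "
FT /colour=2
FT SNP 1804
FT /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser->Thr) 7414_8#44=A (non-synonymous) (AA Ser->Thr) 7521_6#54=A (non-synonymous) (AA Ser->Thr) "
FT /colour=2
END

printf "%-12s %-15s %s\n", "Coordinates", "No of Strains", "AA Change";
print "$_\n" for summarise($sample);
```

Resetting `$count` and `$change` at every `FT SNP` line means a record whose `/note` line fails to match prints a 0 and an empty change instead of triggering an uninitialised-value warning, which makes such records easy to spot.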
Any help would be much appreciated, but please be advised that I am very new to Perl and to programming in general. This code is also not my own work; code that I had written myself suffered from the same error.
Many thanks in advance!
MB
Replies are listed 'Best First'.

Re: Parsing problem
by kcott (Archbishop) on Nov 07, 2012 at 23:18 UTC
by MB123 (Initiate) on Nov 07, 2012 at 23:49 UTC
by kcott (Archbishop) on Nov 08, 2012 at 00:40 UTC
by MB123 (Initiate) on Nov 08, 2012 at 10:48 UTC
by kcott (Archbishop) on Nov 08, 2012 at 19:00 UTC

Re: Parsing problem
by frozenwithjoy (Priest) on Nov 07, 2012 at 23:45 UTC

Re: Parsing problem
by Anonymous Monk on Nov 07, 2012 at 22:17 UTC
by frozenwithjoy (Priest) on Nov 07, 2012 at 23:48 UTC
by Anonymous Monk on Nov 07, 2012 at 23:55 UTC
by MB123 (Initiate) on Nov 08, 2012 at 10:53 UTC
by Anonymous Monk on Nov 08, 2012 at 11:04 UTC