In that case, set the $flag-variable to 0 when you expect to read a 'gene'-line, and set it to 1 when you expect to read a 'CDS' or 'exon'-line.
You need to declare the @BMB variable before entering the while loop. The data you store in it while $flag is 0 must still be there on the next pass through the loop, when $flag is 1.
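To see why @BMB has to live outside the loop, here is a minimal sketch of the same flag technique applied to a few made-up, tab-separated lines. It is not real GFF data, and the names @lines, @record and @results are hypothetical. If "my @record;" were moved inside the for loop, each iteration would get a fresh, empty array, so the CDS/exon iteration would no longer see the gene coordinates.

```perl
#!/usr/bin/perl
# Sketch of the flag technique on made-up input (not real GFF data).
use strict;
use warnings;

my @lines = (
    "gene\t100\t200",
    "CDS\t100\t150",
    "gene\t300\t400",
    "exon\t300\t350",
);

my $flag = 0;
my @record;     # declared outside the loop, so it keeps its contents between iterations
my @results;

for my $line (@lines) {
    my @f = split /\t/, $line;
    if ($flag == 0) {
        if ($f[0] eq 'gene') {
            @record = ($f[1], $f[2]);   # remember the gene coordinates
            $flag   = 1;                # expect a CDS/exon line next
        }
    } else {
        push @record, $f[0];            # add the feature type from the following line
        push @results, [@record];       # store a copy of the completed record
        $flag = 0;                      # expect a gene line next
    }
}

print join(',', @$_), "\n" for @results;
```

This prints 100,200,CDS and 300,400,exon, one per line.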
Note that this version has no error handling, but you can add that yourself.
This results in the following:
#!/usr/bin/perl
# Task: Extract GeneID-Number and gene information
use strict;
use warnings;
my $in;
my $data;
my @array;
my $GeneID;
my @BMB;
my $flag = 0;
my %hash;
# 1) open the .gff input file and, while reading it line by line,
#    split $data at each tab and put the fields into @array
open $in, '<', "Genomteil.gff" or die "Cannot open Genomteil.gff: $!";
while ($data = <$in>) {
    @array = split (/\t/, $data);
    if ($flag == 0) {
        # if you find the word 'gene', a text block follows which contains
        # some information I want to extract and put in an array
        if ($array[2] =~ /gene/) {
            $flag = 1;  # Set the flag. We will be expecting a 'CDS' or 'exon'-line next
            @BMB = ($array[3], $array[4], $array[6]);  # the array will be used as values for my hash later
        } ## end if ($array[2] =~ /gene/)
        # if you find 'GeneID', extract the following number;
        # it becomes the hash key when the array is stored below
        if ($array[8] =~ /.*;db_xref=GeneID:(\d+)\n/) {
            $GeneID = $1;
        } ## end if ($array[8] =~ /.*;db_xref=GeneID:(\d+)\n/)
    } elsif ($flag == 1) {
        # put more data in my array
        if ($array[2] =~ /CDS/) {
            push (@BMB, $array[2]);
        } elsif ($array[2] =~ /exon/) {
            push (@BMB, $array[2]);
        }
        @{$hash{$GeneID}} = @BMB;  # store a copy of @BMB under this GeneID
        $flag = 0;  # Reset the flag. We will be expecting a 'gene'-line next
    }
} ## end while ($data = <$in>)
close $in;
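my $BMB;
while (($GeneID, $BMB) = each %hash) {
    # $BMB holds an array reference, so dereference it with ->;
    # a plain $BMB[0] would read the separate array @BMB instead
    print "$GeneID => $BMB->[0]\n";
}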
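As mentioned, the script has no error handling. One possible way to add some is sketched below, under a few assumptions: comment and blank lines are skipped, feature lines must have the usual 9 GFF columns, and a gene line without a GeneID is reported instead of being stored under an undefined key. The sample lines are made up, the in-memory filehandle only stands in for Genomteil.gff, and @warnings is a hypothetical name.

```perl
#!/usr/bin/perl
# Sketch only: possible error handling for the GFF loop.
# The sample lines are made up; the real script would read Genomteil.gff.
use strict;
use warnings;

my $gff = join '', map { "$_\n" } (
    "##gff-version 3",
    join("\t", 'chr1', 'src', 'gene', 100, 200, '.', '+', '.', 'ID=g1;db_xref=GeneID:111'),
    join("\t", 'chr1', 'src', 'CDS',  100, 150, '.', '+', '0', 'Parent=g1'),
    "too\tfew\tcolumns",
);

# In-memory filehandle standing in for the real input file.
open my $in, '<', \$gff or die "Cannot open input: $!";

my (%hash, @BMB, @warnings);
my $flag = 0;
my $GeneID;
while (my $data = <$in>) {
    chomp $data;
    next if $data =~ /^#/ || $data !~ /\S/;     # skip header/comment and blank lines
    my @array = split /\t/, $data;
    if (@array != 9) {                          # a GFF feature line has 9 tab-separated columns
        push @warnings, sprintf("line %d: expected 9 columns, got %d", $., scalar @array);
        next;
    }
    if ($flag == 0) {
        next unless $array[2] =~ /gene/;
        if ($array[8] =~ /db_xref=GeneID:(\d+)/) {
            $GeneID = $1;
        } else {
            push @warnings, sprintf("line %d: gene line without GeneID", $.);
            next;                               # keep $flag at 0 and wait for the next gene
        }
        @BMB  = @array[3, 4, 6];
        $flag = 1;
    } else {
        push @BMB, $array[2] if $array[2] =~ /CDS|exon/;
        $hash{$GeneID} = [@BMB];                # store a copy of @BMB
        $flag = 0;
    }
}
close $in;
warn "$_\n" for @warnings;
```

With this sample, GeneID 111 is stored with 100, 200, + and CDS, and the last line is reported as "line 4: expected 9 columns, got 3".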
Hi, thanks for that. Unfortunately I have an appointment now, so I can't try your script today, but I'm looking forward to running it tomorrow morning.
Thanks again, and have a nice evening.
while (($GeneID, $BMB) = each %hash) {
print "$GeneID => $BMB[0]\n";}
I still get this output:
9190899 => 15576
9190898 => 15576
9191049 => 15576
9190897 => 15576
9191048 => 15576
All the keys are different, but every value is the same. I'm trying to figure it out!
Okay, I have no idea where the problem is!
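The repeated value comes from $BMB[0] in the print statement. Each value in %hash is an array reference, and each() puts that reference into the scalar $BMB. But $BMB[0] does not index into that reference: it reads element 0 of the separate array @BMB, which is declared at the top of the script (so use strict does not complain) and, after the loop, still holds the data of the last gene read. Dereferencing with $BMB->[0] reads the element of each stored array instead. The sketch below uses made-up data to show the difference.

```perl
#!/usr/bin/perl
# Sketch with made-up data: $BMB[0] versus $BMB->[0].
use strict;
use warnings;

my @BMB  = (15576, 15999, '+');        # file-scoped array, like @BMB in the script
my %hash = (
    9190899 => [100, 200, '+'],
    9190898 => [300, 400, '-'],
);

my (@wrong, @right);
for my $GeneID (sort keys %hash) {
    my $BMB = $hash{$GeneID};          # $BMB holds an array *reference*
    push @wrong, "$GeneID => $BMB[0]";    # element 0 of @BMB, same for every key
    push @right, "$GeneID => $BMB->[0]";  # element 0 of the array $BMB points to
}
print "$_\n" for @wrong, @right;
```

To print every element of a stored array, interpolate the dereferenced array instead, e.g. print "$GeneID => @$BMB\n"; which separates the elements with spaces.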