bio25 has asked for the wisdom of the Perl Monks concerning the following question:
Hi dear all! I have a problem with using the flip-flop. First I need to show you the text file (gff.file) which contains the data I want to work with:
NC_014171.1 RefSeq gene 14311 14425 . + . ID=NC_014171.1:rrs_1;locus_tag=BMB171_C5091;db_xref=GeneID:9190898
NC_014171.1 RefSeq exon 14311 14425 . + . ID=NC_014171.1:rrs_1:unknown_transcript_1;Parent=NC_014171.1:rrs_1;gbkey=rRNA;locus_tag=BMB171_C5091;product=5S ribosomal RNA;db_xref=GeneID:9190898;exon_number=1

NC_014171.1 RefSeq gene 14459 15460 . - . locus_tag=BMB171_C0007;db_xref=GeneID:9190899
NC_014171.1 RefSeq CDS 14462 15460 . - 0 locus_tag=BMB171_C0007;transl_table=11;product=hypothetical protein;protein_id=YP_003662545.1;db_xref=GI:296500845;db_xref=GeneID:9190899;exon_number=1
The empty line above doesn't exist in the text file. I just wanted to show you which lines belong together. Now, here is my script:
    #!/usr/bin/perl
    # Task: Extract GeneID-Number and gene information
    use strict;
    use warnings;

    my $in;
    my $data;
    my @array;
    my $array;
    my $GeneID;
    my @BMB;
    my $BMB;
    my $flag = 0;
    my %hash;
    my $hash;

    # 1) open the .gff input file and, while reading line by line,
    #    split $data at each tab and put the fields in @array
    open $in, '<', "Genomteil.gff" or die $!;

    while ($data = <$in>) {
        @array = split(/\t/, $data);

        # if you find the word 'gene', a text block follows which contains
        # some information I want to extract and put in an array
        if ($array[2] =~ /gene/) {
            $flag = 1;
            # the array will be used as values for my hash later
            @BMB = ($array[3], $array[4], $array[6]);
        }

        if ($array[2] =~ /CDS/) {
            push(@BMB, $array[2]);    # put more data in my array
        }
        elsif ($array[2] =~ /exon/) {
            push(@BMB, $array[2]);
        }

        # if you find the word 'GeneID', extract the following number and
        # put it in my hash (as key), then put the array in my hash
        if ($array[8] =~ /.*;db_xref=GeneID:(\d+)\n/) {
            $GeneID = $1;
            @{$hash{$GeneID}} = @BMB;
        }

        # if you find the word 'exon_number', then the text block is over
        if ($array[8] =~ /.*;exon_number=1/) {
            $flag = 0;
        }
    }
    close $in;

    while ( ($GeneID, $BMB) = each %hash) {
        print "$GeneID => $BMB[0]\n";
    }
Okay, the script works, but I have more than one text block to work with. On each pass through the loop, the new data put into my array overwrites the data from the previous pass, so in the end the array only holds the information from the last text block. My supervisor told me what she wants me to do: I should think about something like a 'flag' that recognizes when a new text block begins, containing new data I want to store in the array. Interestingly, I don't have a problem with the keys: every key appears in my output (but each key has the same values). The problem is that I don't know what such a 'flag' could look like, so I don't know what to search for in the literature. I hope you understand my problem; my English isn't the best. Does anybody have an idea that could help? Best wishes
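A minimal sketch of one way the 'flag' idea could look, assuming that every text block starts at a `gene` line and that the GeneID appears somewhere in column 9 of that line. The names `parse_gff`, `%genes`, `$current` and the in-memory sample are hypothetical and only serve the illustration. Instead of a 0/1 flag, the sketch keeps a reference to the record of the block currently being read, and it creates a fresh anonymous array for each gene so that earlier blocks are not overwritten. The print loop dereferences the stored array rather than reading a global array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two tab-separated blocks, shortened from the post.
my $sample = join "\n",
    join("\t", 'NC_014171.1', 'RefSeq', 'gene', 14311, 14425, '.', '+', '.',
         'locus_tag=BMB171_C5091;db_xref=GeneID:9190898'),
    join("\t", 'NC_014171.1', 'RefSeq', 'exon', 14311, 14425, '.', '+', '.',
         'locus_tag=BMB171_C5091;db_xref=GeneID:9190898;exon_number=1'),
    join("\t", 'NC_014171.1', 'RefSeq', 'gene', 14459, 15460, '.', '-', '.',
         'locus_tag=BMB171_C0007;db_xref=GeneID:9190899'),
    join("\t", 'NC_014171.1', 'RefSeq', 'CDS', 14462, 15460, '.', '-', '0',
         'locus_tag=BMB171_C0007;db_xref=GeneID:9190899;exon_number=1');

# Read GFF lines from a filehandle and return GeneID => array reference.
sub parse_gff {
    my ($fh) = @_;
    my %genes;      # GeneID => [start, end, strand, feature types ...]
    my $current;    # record of the block being read (plays the role of the flag)

    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
        my @col = split /\t/, $line;
        next if @col < 9;                          # not a complete GFF record
        my ($type, $start, $end, $strand, $attr) = @col[2, 3, 4, 6, 8];

        if ($type eq 'gene') {
            # A gene line opens a new block: start a fresh array.
            $current = [$start, $end, $strand];
            my ($id) = $attr =~ /db_xref=GeneID:(\d+)/;
            $genes{$id} = $current if defined $id;
        }
        elsif ($current && ($type eq 'CDS' || $type eq 'exon')) {
            # Child lines are added to the block that is currently open.
            push @$current, $type;
        }
    }
    return %genes;
}

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my %genes = parse_gff($fh);
close $fh;

for my $id (sort keys %genes) {
    print "$id => @{ $genes{$id} }\n";
}
```

With the sample above this prints `9190898 => 14311 14425 + exon` and `9190899 => 14459 15460 - CDS`. To read the real file, the in-memory handle would be replaced by `open my $fh, '<', 'Genomteil.gff' or die $!;`.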
Re: problems with flip flop
by i5513 (Pilgrim) on Aug 17, 2011 at 08:27 UTC
by bio25 (Initiate) on Aug 17, 2011 at 10:36 UTC

Re: problems with flip flop
by Neighbour (Friar) on Aug 17, 2011 at 10:30 UTC
by bio25 (Initiate) on Aug 17, 2011 at 10:58 UTC
by Neighbour (Friar) on Aug 17, 2011 at 12:00 UTC
by bio25 (Initiate) on Aug 17, 2011 at 14:30 UTC
by Neighbour (Friar) on Aug 17, 2011 at 14:58 UTC