Hi Perlmonks. I have a database that consists of records separated by // and \n.
I have a rather ugly program that relies on a lot of flags, and I have a problem deciding where to check one of them (the siteflag). Part of my input and output files looks like this.
Input file:

    TITLE An excitatory scorpion toxin with a distinctive feature: an additional alpha helix at the C terminus and its implications for interaction with insect sodium channels
    /interaction_site="Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive surface in mature toxin (Zilberberg et al., 1997)."
    /channel="Sodium channel"
    /target_cell="Insect specific (Excitatory)"
    /c_end="Free"
    //
    TITLE Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin BmKIT cDNA from Buthus martensii Karsch
    /interaction_site="Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
    /channel="Sodium channel"
    /c_end="Free"
    //

Output:

    References:TITLE "An excitatory scorpion toxin with a distinctive feature: an additional alpha helix at the C terminus and its implications for interaction with insect sodium channels"
    Interaction_site "Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive surface in mature toxin (Zilberberg et al., 1997)."
    Channel "Sodium channel"
    Target_cell "Insect specific (Excitatory)"
    C_end "Free"
    References:TITLE "Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin BmKIT cDNA from Buthus martensii Karsch"
    Interaction_site "Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
    Channel "Sodium channel"
    C_end "Free"
The title, interaction_site and c_end are fixed elements (they appear in every record). The rest are optional.
For every record in the file I check the lines one by one, modify some of the input, and print it to the output file.
The title and interaction site may consist of nothing (" "), a single line, or multiple lines.
Therefore I must use flags to keep track of the input.
The problem is that there are quite a lot of elements after the interaction site which are optional (not present in every record). I include only two of them here. My code looks like this:
Usage: perl prog.pl input.db result

    #!/usr/local/bin/perl -w

    # initialize all the variables: flags to 0 and lines to ''
    my $counter   = 1;
    my $file1     = "$ARGV[0]";
    my $result    = ">" . $ARGV[1];
    my $site      = '';
    my $titleline = '';
    my $siteflag  = 0;
    my $titleflag = 0;

    open(INFO1, $file1) or die "Can't open $file1.\n";    # open file1
    open(OUT, $result)  or die "Can't open $result.\n";   # open result

    # each line of the input file ends with \r\n
    foreach (<INFO1>) {
        if (/\s*TITLE\s*(.*)\r/) {
            ######## check the title
            $titleflag = 1;
            $titleline = $1;
        }
        elsif (/\s*\/interaction_site=(.*)\r/) {
            ######## handle the title
            print OUT qq(References:TITLE\t "$titleline"\n);
            $titleflag = 0;
            $titleline = '';
            ######## check the site
            $site     = $1;
            $siteflag = 1;
        }
        elsif (/\s*(.*)\r/ && $titleflag == 1) {
            $titleline .= " ";    # add a white space
            $titleline .= $1;     # concatenate the title with the previous line
        }
        elsif (/\s*\/channel=(.*)\r/) {
            if (check2($1)) {
                print OUT "Channel\t $1\n";
            }
        }
        elsif (/\s*\/target_cell=(.*)\r/) {
            if (check2($1)) {
                print OUT "Target_cell\t $1\n";
            }
        }
        elsif (/\s*\/c_end=(.*)\r/) {
            ######## handle interaction site
            $siteflag = 0;
            $site     = '';
            ######## check c_end
            if (check2($1)) {
                print OUT "C_end\t $1\n";
            }    # end if
        }    # end elsif
        ####elsif(/\s*(.*)\r/ && $siteflag==1){
        ####    $site.=" ";   # add a white space
        ####    $site.=$1;    # concatenate with the previous site line
        ####    print "Site $site\n";
        #### }
    }    # end foreach

    sub check2 {
        # check whether the item passed in is just empty quotes
        my ($item) = @_;
        if ($item =~ /" "/) { return 0; }
        else                { return 1; }
    }
The last block, prefixed with ####, is the one that needs to be modified. If I use that code in that location, it only prints the interaction site when the site spans more than one line.
Where should I put this code so that I can print the interaction_site regardless of whether it consists of "", a single line, or multiple lines? Thanks so much...
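One way to avoid choosing a location for the site check at all is to drop the per-field flags and instead remember "the field currently being collected", printing it only when the next tag line (TITLE, /name=..., or //) shows up. The following is a minimal sketch of that idea, not a drop-in replacement: convert_lines is a hypothetical helper name, the output labels are derived with ucfirst to match the sample output, and it assumes that continuation lines never start with "/name=" or "TITLE". It also assumes that the title and interaction_site are always printed, while other fields are skipped when they hold only empty quotes, as check2 does in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the input lines, returns the output lines.
sub convert_lines {
    my @lines = @_;
    my @out;
    my ($field, $value);    # the field currently being collected, if any

    # Print (collect) the pending field, then forget it.
    my $flush = sub {
        return unless defined $field;
        if ($field eq 'TITLE') {
            # the title has no quotes in the input, so add them here
            push @out, qq(References:TITLE\t "$value");
        }
        elsif ($field eq 'interaction_site' || $value !~ /^"\s*"$/) {
            # assumption: the site is always printed, other fields
            # are skipped when they contain only empty quotes
            push @out, ucfirst($field) . "\t $value";
        }
        undef $field;
    };

    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trims indentation and the trailing \r\n

        if ($line =~ m{^//}) {
            $flush->();             # end of record
        }
        elsif (my ($title) = $line =~ /^TITLE\s*(.*)/) {
            $flush->();
            ($field, $value) = ('TITLE', $title);
        }
        elsif (my ($name, $text) = $line =~ m{^/(\w+)=(.*)}) {
            $flush->();             # a new tag ends the previous field
            ($field, $value) = ($name, $text);
        }
        elsif (defined $field && length $line) {
            $value .= " $line";     # continuation of a multi-line field
        }
    }
    $flush->();                     # in case the file lacks a final //
    return @out;
}
```

With the filehandles from the original script, this could be used as `print OUT "$_\n" for convert_lines(<INFO1>);`. Because every field is flushed by whatever tag follows it, an empty, single-line, or multi-line interaction_site all take the same path, and new optional fields need no extra elsif branch.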

In reply to about where to check the flag by gdnew
