about where to check the flag

gdnew has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks. I have a database that consists of records separated by // and \n.
I have a rather messy program that relies on lots of flags. Now I have a problem deciding where to check one of them (the siteflag). Part of my input and output files looks like this:

Input files
 
TITLE     An excitatory scorpion toxin with a distinctive feature: an
            additional alpha helix at the C terminus and its implications
            for interaction with insect sodium channels
             /interaction_site="Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive
                surface in mature toxin (Zilberberg et al., 1997)."
             /channel="Sodium channel"
             /target_cell="Insect specific (Excitatory)"
             /c_end="Free"
//
TITLE     Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin
            BmKIT cDNA from Buthus martensii Karsch
            /interaction_site="Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
            /channel="Sodium channel"
            /c_end="Free"        
//

Output

References:TITLE         "An excitatory scorpion toxin with a distinctive feature: an additional alpha helix at the C terminus and its implications  for interaction with insect sodium channels"
Interaction_site  "Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive surface in mature toxin (Zilberberg et al., 1997)."
Channel  "Sodium channel"
Target_cell      "Insect specific (Excitatory)"
C_end    "Free" 

References:TITLE         "Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin BmKIT cDNA from Buthus martensii Karsch"
Interaction_site  "Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
Channel  "Sodium channel"
C_end    "Free"

The title, interaction_site and c_end are fixed elements (they appear in every record). The rest are optional.
For every record in the file I check the lines one by one, modify some of the input and print it to the output file.
The title and interaction site may be empty (" "), a single line, or multiple lines.
Therefore I must use flags to keep track of the input.
The problem is that there are quite a lot of elements after the interaction site which are optional (not present in every record). I include only two of them here. My code looks like this:

Usage: perl prog.pl input.db result
#! /usr/local/bin/perl -w

#initialize all the variable, initialize flags to 0 and line to ''

my $counter=1;
my $file1="$ARGV[0]";
my $result=">".$ARGV[1];
my $site='';
my $titleline='';
my $siteflag=0;
my $titleflag=0;

open(INFO1,$file1) or die "Can't open $file1.\n";   #open file1
open(OUT,$result) or die "Can't open $result.\n";   #open result  

# every line of the input file ends with \r\n
foreach(<INFO1>)
{                   

  if(/\s*TITLE\s*(.*)\r/){
    ######## check the title
        $titleflag=1;
        $titleline=$1;
    }                              
   elsif(/\s*\/interaction_site=(.*)\r/){
     ######## handle the title
        print OUT qq(References:TITLE\t "$titleline"\n);
        $titleflag=0;
        $titleline=''; 
     ########  check the site   
       $site=$1;
       $siteflag=1;
    }      
    elsif(/\s*(.*)\r/ && $titleflag==1){
        $titleline.=" "; # add a white space
        $titleline.=$1;  #concatenate the title with previous line
    }                      
                                           
    elsif(/\s*\/channel=(.*)\r/){
        if(check2($1)){
        print OUT "Channel\t $1\n";
        }
    }
    elsif(/\s*\/target_cell=(.*)\r/){
        if(check2($1)){
        print OUT "Target_cell\t $1\n";
        }
    }
    elsif(/\s*\/c_end=(.*)\r/){
    ######## handle interaction site
        $siteflag=0;
        $site='';
   
    ######## check c_end
        if(check2($1)){
            print OUT "C_end\t $1\n";
      }# end if
    }#end elsif
    
    ####elsif(/\s*(.*)\r/ && $siteflag==1){
    ####   $site.=" "; # add a white space
    ####    $site.=$1; #concatenatewith previous site
    ####    print "Site $site\n";
    ####    }    

} # end foreach

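sub check2 { # check whether the item is an empty pair of quotes (" ")
    my ($item) = @_;   # use the argument instead of relying on the caller's $1
    if($item =~ /" "/){
        return 0;}
    else{
        return 1;}
}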

The code commented out with #### at the end is the part that needs to change. If I leave it in that location, the interaction site is printed only when it spans more than one line.
Where should I put this code so that the interaction_site is printed whether it is empty (" "), a single line, or multiple lines? Thanks so much...
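
One way to sidestep the placement problem, sketched here under assumptions rather than offered as the definitive fix: keep a single buffer for whichever field is currently open, and print it when the next field (or the // terminator) arrives. The sub name convert_records and the demo data are made up for illustration, and the rule that only TITLE and interaction_site are printed even when empty is an assumption drawn from the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn input lines into output lines.
# One "current field" buffer replaces the per-field flags.
sub convert_records {
    my @lines = @_;
    my @out;
    my ($key, $value);                       # field being accumulated

    # Emit the buffered field (if any) and reset the buffer.
    my $flush = sub {
        return unless defined $key;
        $value =~ s/\s+/ /g;                 # collapse runs of whitespace
        $value =~ s/^ //;
        $value =~ s/ $//;
        if ($key eq 'TITLE') {
            push @out, qq(References:TITLE\t "$value");
        }
        elsif ($key eq 'interaction_site' || $value ne '" "') {
            # assumption: optional fields holding " " are skipped
            push @out, ucfirst($key) . "\t $value";
        }
        undef $key;
    };

    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                # strip \n or \r\n
        if ($line =~ m{^\s*//\s*$}) {        # end of record
            $flush->();
            push @out, '';
        }
        elsif ($line =~ m{^\s*/(\w+)=(.*)$}) {    # a new /key="value" field
            my ($k, $v) = ($1, $2);
            $flush->();
            ($key, $value) = ($k, $v);
        }
        elsif ($line =~ /^\s*TITLE\s*(.*)$/) {    # a new title
            my $t = $1;
            $flush->();
            ($key, $value) = ('TITLE', $t);
        }
        elsif (defined $key && $line =~ /\S/) {   # continuation line
            $value .= ' ' . $line;
        }
    }
    $flush->();                              # in case the last // is missing
    return @out;
}

my @demo = (
    "TITLE     A two-line\r\n",
    "            title\r\n",
    qq(             /interaction_site="first part,\r\n),
    qq(                second part."\r\n),
    qq(             /channel=" "\r\n),
    qq(             /c_end="Free"\r\n),
    "//\r\n",
);
print "$_\n" for convert_records(@demo);
```

Because a field is printed only once the following field starts, it no longer matters how many lines it spans, and adding another optional /key needs no new flag.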



(crazyinsomniac) Re: about where to check the flag
by crazyinsomniac (Prior) on Feb 07, 2002 at 09:23 UTC

#!/usr/bin/perl -wT
use strict;
use CGI;
my %defaultRecord = ( interaction_site => undef,
                       TITLE => undef,
                       channel => undef,
                       target_cell =>undef,
                       c_end => undef,
                    ,);


my $blankRecord = new CGI(\%defaultRecord);

$blankRecord->param(-name => 'channel',
                    -value => 'Sodium channel',
                   ,);

open(SAVERECORDHERE,'>','savedrecord.dat') or die "crapola $!";

$blankRecord->save(SAVERECORDHERE);

close(SAVERECORDHERE);

From the CGI.pm documentation:

SAVING THE STATE OF THE SCRIPT TO A FILE:

    $query->save(FILEHANDLE)

This will write the current state of the form to the provided filehandle. You can read it back in by providing a filehandle to the new() method. Note that the filehandle can be a file, a pipe, or whatever!

The format of the saved file is:

        NAME1=VALUE1
        NAME1=VALUE1'
        NAME2=VALUE2
        NAME3=VALUE3
        =
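
To make the format above concrete, here is a minimal sketch that reads it back with plain Perl instead of CGI.pm. The sub name read_saved_records and the sample data are made up for illustration, and the unescaping step assumes names and values are URL-escaped, as the CGI.pm documentation states. Treat it as a reading aid, not a replacement for new CGI(FILEHANDLE).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read records in the NAME=VALUE format shown above.
# Returns a list of hash refs; every value is an array ref because a
# name may repeat within one record.
sub read_saved_records {
    my ($fh) = @_;
    my (@records, %current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '=') {                        # a lone "=" closes the record
            push @records, +{ %current };
            %current = ();
            next;
        }
        my ($name, $value) = split /=/, $line, 2;  # split on the first "=" only
        next unless defined $value;
        for ($name, $value) {                      # undo URL escaping (assumed)
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        push @{ $current{$name} }, $value;
    }
    return @records;
}

# Made-up sample data, read through an in-memory filehandle.
my $saved = "channel=Sodium%20channel\nc_end=Free\n=\nchannel=Potassium\n=\n";
open my $fh, '<', \$saved or die "cannot open in-memory file: $!";
for my $rec (read_saved_records($fh)) {
    print join(', ', map { "$_=@{ $rec->{$_} }" } sort keys %$rec), "\n";
}
```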

   use CGI;
   open (OUT,">>test.out") || die;
   $records = 5;
   foreach (0..$records) {
       my $q = new CGI;
       $q->param(-name=>'counter',-value=>$_);
       $q->save(OUT);
   }
   close OUT;
   # reopen for reading
   open (IN,"test.out") || die;
   while (!eof(IN)) {
       my $q = new CGI(IN);
       print $q->param('counter'),"\n";
   }

Now you can concentrate on finishing your app, instead of parsing flat files ... Also, an alternative to the above CGI thingy might be to use Windows INI-style records, something like:

[recordorsomething]
key = value
k0ey = valuee

[recordothersomething]
k = v

Config::INI
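
As a rough illustration of how little code such INI-style records need, here is a sketch that parses them with plain Perl and no module at all. The sub name parse_ini, the section names, and the sample text are made up for illustration; a real INI module would handle more edge cases (quoting, duplicate keys, and so on).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse INI-style text into a hash of hashes,
# { section => { key => value } }.
sub parse_ini {
    my ($text) = @_;
    my (%records, $section);
    for my $line (split /\n/, $text) {
        $line =~ s/\r\z//;                           # tolerate \r\n line ends
        next if $line =~ /^\s*(?:[;#].*)?$/;         # skip blanks and comments
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {       # [section] header
            $section = $1;
            $records{$section} ||= {};
        }
        elsif (defined $section
               && $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            $records{$section}{$1} = $2;             # key = value
        }
    }
    return \%records;
}

my $ini = <<'END';
[record1]
channel = Sodium channel
c_end = Free

[record2]
channel = Sodium channel
END

my $records = parse_ini($ini);
for my $name (sort keys %$records) {
    print "$name: $_ = $records->{$name}{$_}\n"
        for sort keys %{ $records->{$name} };
}
```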

What I also like to do, instead of using a flat file, is to add DB_File to the mix. Together with CGI.pm it beats a flat file, and it makes for a solution that is easy to parse, quick to write, and comfortably familiar.

Happy Coding!

update:
It has been brought to my attention that gdnew is using a very peculiar data format, sorta like:

COMMERCIAL SUPPLIERS
SEQUENCE            
                     /exon="1-120"
                     /intron=" "
//

strange quotes

Now my question is for you, gdnew: where did you get the idea to use such a bizarre format?

______crazyinsomniac_____________________________
Of all the things I've lost, I miss my mind the most.
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"


Re: about where to check the flag
by dreadpiratepeter (Priest) on Feb 07, 2002 at 13:27 UTC

#!/usr/local/bin/perl

use strict;

# pull everything into a string
my $str = join("",<DATA>);

# dump the newlines
$str =~ s/\n/ /g;

# loop through the records (// delimited)
foreach (split(m!//!,$str)) {
  last unless /\S/;                       # skip that pesky last blank record
  my ($title,@flags) = split(m!/!);       # break out the fields (/ delimited)
  $title =~ s/\s+/ /g;                    # kill extra whitespace
  print "References:$title\n";            # print the title
  foreach (@flags) {                      # loop through the fields
    my ($key,$value) = split(/=/);        # split into pairs
    $value =~ s/\s+/ /g;                  # kill extra whitespace
    print ucfirst($key)."\t$value\n";     # print each one
  }
  print "\n";
}

__DATA__
TITLE     An excitatory scorpion toxin with a distinctive feature: an
            additional alpha helix at the C terminus and its implications
            for interaction with insect sodium channels
             /interaction_site="Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive
                surface in mature toxin (Zilberberg et al., 1997)."
             /channel="Sodium channel"
             /target_cell="Insect specific (Excitatory)"
             /c_end="Free"
//
TITLE     Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin
            BmKIT cDNA from Buthus martensii Karsch
            /interaction_site="Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
            /channel="Sodium channel"
            /c_end="Free"
//

#!/usr/local/bin/perl

use strict;

my $str; # holds the records

#loop through the data
while (<DATA>) { 
  chomp;                                    # kill newlines
  if (m!//!) {                              # we have a record
    my ($title,@flags) = split(m!/!,$str);  # break out the fields (/ delimited)
    $title =~ s/\s+/ /g;                    # kill extra whitespace
    print "References:$title\n";            # print the title
    foreach (@flags) {                      # loop through the fields
      my ($key,$value) = split(/=/);        # split into pairs
      $value =~ s/\s+/ /g;                  # kill extra whitespace
      print ucfirst($key)."\t$value\n";     # print each one
    }
    print "\n";
    $str = "";                              # zero the input buffer
  } else {
    $str .= " " . $_;                       # accumulate data
  }
}

__DATA__
TITLE     An excitatory scorpion toxin with a distinctive feature: an
            additional alpha helix at the C terminus and its implications
            for interaction with insect sodium channels
             /interaction_site="Q8, N9, Y10, N11, C12, F17, W38, R58, V59 and K62 form the putative bioactive
                surface in mature toxin (Zilberberg et al., 1997)."
             /channel="Sodium channel"
             /target_cell="Insect specific (Excitatory)"
             /c_end="Free"
//
TITLE     Cloning and Sequencing of an Excitatory Insect-Selective Neurotoxin
            BmKIT cDNA from Buthus martensii Karsch
            /interaction_site="Sequential deletions of C-terminal residues suggested Ile73 and Ile74 for toxicity. {Oren et al., 1999}"
            /channel="Sodium channel"
            /c_end="Free"
//

Entropy is not what it used to be.
