agustina_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks, I have a problem with a very big database that consists of many records, each terminated by "//" followed by \n. Part of the input looks like this:
    DATE 13-JUN-2000
    AUTHORS Oren,D.A., Froy,O., Amit,E., Kleinberger Doron,N., Gurevitz,M. and Shaanan,B.
    ORIGIN
    1 acaaaataaa gtgaacttct gaaatcagca cgataaaaag aaacgaaaat
    51 ttaatgtgtc ttatcatctt cccaattatg ggagtgcttg gcaaaaagaa
    //
    DATE 13-JUN-2000
    AUTHORS Froy O, Zilberberg N, Gordon D, Turkov M, Gilles N, Stankiewicz M, Pelhate M, Loret E, Oren DA, Shaanan B, Gurevitz M.
    GenBank:AJ012313{%3858952&dopt=GenBank}, EMBL:AJ012313{%AJ012313}, DDBJ:AJ012313{%AJ012313}, SwissProt:P56637{%P56637}
    ORIGIN
    1 aaaataaagt gaacttctga aatcagcacg ataaaaagaa
    //
I want to make some changes to the input and then print the result to an output file, but I am having trouble with the input record separator. At first I set it to "//\n", but inside the loop I want to examine every line of each record. Do I have to set it back to \n? I am also having trouble with the while loop. My code looks like this:
    compile : perl prog.pl input.db result

    #!/usr/bin/perl
    my $input = $ARGV[0];
    my $output = ">" . $ARGV[1];
    my $counter=1;
    my $no='D000001';
    open(INPUT, $input) or die "Can't open $input.";
    open(OUTPUT, $output) or die "Can't open $output.";
    $/="//\n";
    while (<INPUT>) {
        print "DBACC\t $no\n";
        if ($_=~/^DATE\s*(.*-.*-.*)\s*\n/){
            print "DATE\t $1\n";}
        elsif ($_=/^GenBank:(.*),\sEMBL:(.*),\sDDBJ:(.*)},\sSwissProt:(.*)\n/){
            print "ACCESSION:GenBank\t ($1)\n";
            print "ACCESSION:EMBL\t ($2)\n";
            print "ACCESSION:DDBJ\t ($3)\n";
            print "ACCESSION:SwissProt\t ($4)\n";
        }
        print "Entry $counter\n";
        $counter++;
        $no++;
    }
    close (INPUT);
    close (OUTPUT);
But the program loops only once and prints only the DBACC line, not even the date. Setting the input separator back to \n inside the loop did not change much either. Where do I have to set the input separator so that for each record I can modify the lines and print them to the output?

Thanks in advance. Sincerely,

Ti2xn

Re: about separator
by Trimbach (Curate) on Feb 01, 2002 at 02:27 UTC
    Setting the input record separator to "//\n" is a pretty good way to do what you want, but you don't need to muck around changing it in the while loop. In this case "split" is your friend, as in:
    while (<INPUT>) {
        my @lines = split "\n";
        # and boom, @lines has each line in your record
        # ready for your parsing pleasure.
        ....
    }

    Gary Blackburn
    Trained Killer
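
To make the record-then-lines idea concrete, here is a minimal sketch. It assumes every record really ends in "//\n"; the sample data is abbreviated from the question, and the helper name read_records and the in-memory file handle are only there so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for input.db: two short records, each ended by "//\n".
my $data = <<'END';
DATE 13-JUN-2000
ORIGIN
1 acaaaataaa gtgaacttct
//
DATE 13-JUN-2000
ORIGIN
1 aaaataaagt gaacttctga
//
END

# Read whole records from a file handle and split each one into lines.
# read_records is an illustrative name, not something from the thread.
sub read_records {
    my ($fh) = @_;
    local $/ = "//\n";                     # one read now returns one whole record
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                     # chomp strips the current $/, here "//\n"
        my @lines = split /\n/, $record;   # individual lines of this record
        push @records, \@lines;
    }
    return @records;
}

# An in-memory handle keeps the sketch self-contained; a real script
# would open the file named in $ARGV[0] instead.
open my $in, '<', \$data or die "Can't open in-memory data: $!";
my @records = read_records($in);
close $in;

printf "Record %d has %d lines\n", $_ + 1, scalar @{ $records[$_] }
    for 0 .. $#records;
```

Because $/ is set with local inside the sub, it goes back to its previous value on return, so there is no need to switch it back and forth inside the loop.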

Re: about separator
by chromatic (Archbishop) on Feb 01, 2002 at 02:29 UTC
    split each record on \n to get individual lines.

    You might also want to open your output file for writing.
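
One way to read this hint: the posted code opens OUTPUT, but every print goes to STDOUT because no file handle is named. A small sketch of printing to an explicit handle follows; the helper write_field and the in-memory buffer are assumptions made so the snippet runs without touching the disk.

```perl
use strict;
use warnings;

# Print one labelled field to whichever handle is passed in.
# write_field is an illustrative helper, not part of the original script.
sub write_field {
    my ($out, $label, $value) = @_;
    print {$out} "$label\t $value\n";   # name the handle, or output goes to STDOUT
}

# In a real script this would be something like:
#   open my $out, '>', $ARGV[1] or die "Can't open $ARGV[1]: $!";
# An in-memory buffer is used here only to keep the example self-contained.
my $buffer = '';
open my $out, '>', \$buffer or die "Can't open in-memory output: $!";

write_field($out, 'DBACC', 'D000001');
write_field($out, 'DATE',  '13-JUN-2000');

close $out or die "Can't close output: $!";
print $buffer;
```

The three-argument open keeps the mode separate from the file name, which avoids building strings such as ">" . $ARGV[1].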

Re: about separator
by Anonymous Monk on Feb 01, 2002 at 02:35 UTC
    Did you want to use /m ?
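
For context on /m: when a whole record sits in one string, ^ without /m only matches at the very start of that string, so a pattern such as /^GenBank:.../ cannot match a line in the middle of the record. The sketch below is one possible way to use /m on a record; the tightened patterns, the helper parse_record, and the choice of two separate if tests (a record may carry both a DATE line and a GenBank line) are assumptions, not the original poster's code. It also uses =~ throughout, since a bare = assigns the match result to $_ instead of testing it.

```perl
use strict;
use warnings;

# One whole record as it would arrive with $/ = "//\n",
# abbreviated from the sample in the question.
my $sample = "DATE 13-JUN-2000\n"
           . "AUTHORS Froy O, Zilberberg N\n"
           . "GenBank:AJ012313{%3858952&dopt=GenBank}, EMBL:AJ012313{%AJ012313}, "
           . "DDBJ:AJ012313{%AJ012313}, SwissProt:P56637{%P56637}\n"
           . "ORIGIN\n//\n";

# Pull a few fields out of one record. parse_record is an illustrative name.
sub parse_record {
    my ($record) = @_;
    my %field;

    # With /m, ^ and $ match at the start and end of every line in the record.
    if ($record =~ /^DATE\s+(\S+-\S+-\S+)\s*$/m) {
        $field{DATE} = $1;
    }
    if ($record =~ /^GenBank:(\S+),\s+EMBL:(\S+),\s+DDBJ:(\S+),\s+SwissProt:(\S+)\s*$/m) {
        @field{qw(GenBank EMBL DDBJ SwissProt)} = ($1, $2, $3, $4);
    }
    return %field;
}

my %field = parse_record($sample);
print "$_\t $field{$_}\n" for sort keys %field;

# Without /m the same GenBank test fails, because ^ is tied to the string start.
print "no /m: GenBank line not found\n" unless $sample =~ /^GenBank:/;
```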