agustina_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks... I have a problem that may be trivial to most of you. I searched for file input/output posts and read them, but they don't seem to solve it. I have a very large file called me.db, and each record in the file is separated by "\/\/\n" (a double slash and a newline). My file looks like this:
AB Naja atra (Chinese cobra).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Lepidosauria; Squamata; Scleroglossa; Serpentes; Colubroidea;
OC Elapidae; Elapinae; Naja.
OX NCBI_TaxID=8656;
//
DR Pfam; PF00087; toxin; 1.
DR PRINTS; PR00282; CYTOTOXIN.
DR ProDom; PD000206; Snake_toxin; 1.
DR PROSITE; PS00272; SNAKE_TOXIN; 1.
KW Venom; Cytotoxin; Cardiotoxin; Multigene family; 3D-structure; Signal.
FT SIGNAL 1 21
FT CHAIN 22 81 CYTOTOXIN 2.
FT DISULFID 24 42
//
I pass this file as input when I run the program, like this:
perl myprog.pl a.db
and I want to go to each record in the file and do something there. But the code below loops more often than expected. For example, if I have 28 records separated by "\/\/\n", it loops 70 times or more. This is my code:
$/ = "\/\/\n";
$counter=1;
while (<>){
    # Read the entry
    print "Entry $counter\n";
    print "I don't know what to do \n";
    #do something
    $counter++
}
Earlier I also tried this, which doesn't work well either:
my $infile =$ARGV[0];
open(IN, $infile) or die "Can't open input file: $!";
$/ = "\/\/\n";
$counter=1;
while (<IN>){
    # Read the entry
    print "Entry $counter\n";
    print "Hallo to you....\n";
    # do something
    $counter++
}
So what do I have to do? Thanks in advance.

Replies are listed 'Best First'.
Re: hm strange loop for me..
by pjf (Curate) on Jan 24, 2002 at 09:30 UTC
    This shouldn't make a difference to how your loop works, but it does make it easier for humans to read. You can replace:

    $/ = "\/\/\n";

    with:

    $/ = "//\n";

    Aside from that, your code looks fine, and works perfectly for the sample file you have provided. I suspect that somewhere in your file the sequence "//\n" appears at, say, the end of a line inside a record, when you really only want to catch "//" appearing on a line by itself. You might try $/ = "\n//\n" to force this.
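
To make the difference between the two separators concrete, here is a minimal sketch. The sample data and the helper name count_records are made up for illustration (they are not from the original post), and the data is read through an in-memory filehandle so the example runs without a file on disk:

```perl
use strict;
use warnings;

# Hypothetical sample: two real records, but one line inside the
# first record happens to end in "//".
my $data = "ID one\nCC trailing slashes//\nFT x\n//\nID two\n//\n";

# Count how many records readline returns for a given separator.
sub count_records {
    my ($separator, $text) = @_;
    local $/ = $separator;    # restrict the change to this sub
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my $count = 0;
    while (my $record = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

print 'Separator "//\n":   ', count_records("//\n",   $data), " records\n";
print 'Separator "\n//\n": ', count_records("\n//\n", $data), " records\n";
```

With this sample the first separator should report 3 records (the stray "//" splits the first record in two) and the second should report 2.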

    Be aware that you may end up reading an extra "empty" record at the end of your file if the last thing in your file is the input record separator.

    Cheers,

    Paul Fenwick
    Perl Training Australia

      Hi... thanks for the answer. Actually, the problem is that there are a few blank lines at the end of the input file. After the last // there are a few blank lines, and it is not possible to delete them by hand in every case. For example:
      DR Pfam; PF00087; toxin; 1.
      DR PRINTS; PR00282; CYTOTOXIN.
      DR ProDom; PD000206; Snake_toxin; 1.
      DR PROSITE; PS00272; SNAKE_TOXIN; 1.
      KW Venom; Cytotoxin; Cardiotoxin; Multigene family; 3D-structure; Signal.
      FT SIGNAL 1 21
      FT CHAIN 22 81 CYTOTOXIN 2.
      FT DISULFID 24 42
      //
      RA Jeyaseelan K., Armugam A., Lachumanan R., Tan C.H., Tan N.H.;
      RT "Six isoforms of cardiotoxin in malayan spitting cobra (Naja naja
      RT sputatrix) venom: cloning and characterization of cDNAs.";
      RL Biochim. Biophys. Acta 1380:209-222(1998).
      RN [6]
      RP SEQUENCE FROM N.A.
      RC SPECIES=N.n.kaouthia; TISSUE=Venom gland;
      RA Li K.J., Wei J.F., Jin Y., Lu Q.M., Xiong Y.L., Wang W.Y.;
      RT "Molecular cloning and sequence analysis of cDNA of cytotoxin analog
      RT of Naja kaouthia.";
      //
      ##there are a few blank lines here
      When I removed the blank lines, everything went as expected. Is there a way to handle this kind of situation? Thanks a lot.
        It's easy enough to check for a blank record:

        $/ = "\n//\n"; # Or whatever is appropriate.
        while (<>) {
            chomp;              # Removes the record separator.
            next if /^\s*$/;    # Skip "space-only" records.
            # Do stuff here.
        }

        This means if there's a blank record at the end (or in the middle, for that matter) it will get skipped instead of processed.
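
As a self-contained illustration of the same idea, the sketch below feeds two records followed by stray blank lines through an in-memory filehandle. The sample data and the helper name read_records are assumptions for the example, and it uses the "//\n" separator from the original question:

```perl
use strict;
use warnings;

# Hypothetical data: two records, then stray blank lines after the
# final separator, as described in the question.
my $data = "ID one\n//\nID two\n//\n\n\n";

# Return the non-blank records found in $text.
sub read_records {
    my ($text) = @_;
    local $/ = "//\n";    # record separator, local to this sub
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                 # strip a trailing "//\n" if present
        next if $record =~ /^\s*$/;    # skip whitespace-only leftovers
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_records($data);
print "Found ", scalar(@records), " records\n";
```

Without the "next if" line the trailing blank lines would be returned as a third record, since everything after the last separator up to end of file is still read as one chunk.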

        Cheers,

        Paul Fenwick
        Perl Training Australia

Re: hm strange loop for me..
by jarich (Curate) on Jan 24, 2002 at 09:42 UTC
    You're not doing anything incorrect. The code you've provided and your example work perfectly for me. Perhaps the bug is generated by the parts in your while loop that you've cut out for the moment.

    Obvious things to check

    • Run the first loop from your post over your test data and make sure that the results are the same as you're complaining about.
    • Double and triple check that your input file only has 28 records and not 28 records followed by some space followed by the same 28 records (it can happen).
    • I ran your loop, as is, except I added strict. I can't see that making a difference in this case, but it's a good habit to get into.
    No ideas if it still doesn't work.

    jarich

Re: hm strange loop for me..
by kschwab (Vicar) on Jan 24, 2002 at 09:31 UTC
    Looks like there are perhaps more "//\n" strings in the data than you think.

    If I create a data file by hand, your code works fine. Perhaps you should print out the records so you can see where it's breaking?

    $/="//\n"; # no need to escape / .. not that it matters
    open(FILE,$ARGV[0]) or die("Can't open $ARGV[0]: $!\n");
    while (<FILE>) {
        chomp;
        # print the record between square brackets
        # and add an extra couple of newlines to make
        # it obvious where the record breaks are
        print "record #$. contains [$_]\n\n";
    }