Hi Perl Monks,

I am a beginner in Perl programming. I have written a Perl program that counts the number of bases in a DNA molecule from a text file, and I run it from the command prompt in Windows XP. The program works well with small text files containing DNA sequence data and gives correct results.

But when I tried to count the bases in a 299 MB text file, the program printed “Out of memory!” in cmd after nearly 2 minutes. One Perl monk suggested that I use Tie::File to solve this problem. I looked up the syntax of Tie::File on the internet but could not work out how to use it in my program, where the filename is read with the <STDIN> input operator into a scalar variable, as in $DNAfilename=<STDIN>;

I have given the Perl program below. Could any Perl monk help me sort out this problem, either with Tie::File or by another method, so that I can analyze a 299 MB text file? My laptop has 2 GB of RAM and runs ActivePerl 5.10.1 Build 1007.

Can I use a statement like %mem=300 MW; in Perl? Some theoretical and physical chemists use this kind of statement in specific C-based programs for “Quantitative Structure Analysis and Report (QSAR)” of biomolecules to solve memory problems.

#!usr/bin/perl
print "\n\nPlease type the filename of the DNA sequence data: ";
$DNAfilename=<STDIN>;
chomp $DNAfilename;
unless ( open(DNAFILE, $DNAfilename) ) {
    print "Cannot open file \"$DNAfilename\"\n\n";
    exit;
}
while(@DNA= <DNAFILE>) {
    $DNA=join('',@DNA);
    close DNAFILE;
    # Remove whitespace
    $DNA=~ s/\s//g;
    # Remove whitespace
    $DNA=~ s/\s//g;# Line 15
    # Count number of bases
    $b=length($DNA);
    print "\nNumber of bases: $b.\nDoes the value tally with GenBank record? If yes,continue.";
    # Count number of each base and nonbase
    $A=0;$T=0;$G=0;$C=0;$e=0; # Line 20
    while($DNA=~ /A/ig){$A++}
    while($DNA=~ /T/ig){$T++}
    while($DNA=~ /G/ig){$G++}
    while($DNA=~ /C/ig){$C++}
    while($DNA=~ /[^ATGC]/ig){$e++} # Line 25
    print "\nA=$A; T=$T; G=$G; C=$C; Errors(N)=$e.\n.";
}
exit;
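A minimal sketch of one way the counting could be done without holding the whole file in memory. It assumes the memory is being used up by @DNA= <DNAFILE> (which reads every line into an array) followed by join (which builds a second full copy in $DNA). The helper name count_bases and the small temporary test file are made up for illustration and are not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original program): read the file one
# line at a time, so only the current line is held in memory.
sub count_bases {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";

    my %count = (A => 0, T => 0, G => 0, C => 0, other => 0, total => 0);
    while (my $line = <$fh>) {
        $line =~ s/\s//g;                    # remove whitespace, including the newline
        $count{A} += ($line =~ tr/Aa//);     # tr with an empty replacement only counts
        $count{T} += ($line =~ tr/Tt//);
        $count{G} += ($line =~ tr/Gg//);
        $count{C} += ($line =~ tr/Cc//);
        $count{total} += length $line;
    }
    close $fh;

    # Everything that is not A, T, G or C (for example N) is counted as "other".
    $count{other} = $count{total} - $count{A} - $count{T} - $count{G} - $count{C};
    return %count;
}

# Small demonstration on a temporary file (made-up data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "ATGC atgc\nNNAT\n";
close $tmp_fh;

my %n = count_bases($tmp_name);
print "Number of bases: $n{total}\n";
print "A=$n{A}; T=$n{T}; G=$n{G}; C=$n{C}; Errors(N)=$n{other}\n";
```

With a filename typed at the prompt, the same helper could be called as count_bases($DNAfilename) after chomp $DNAfilename;. Counting with tr on each line also avoids running five separate regex passes over one very large string.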

I have given the cmd output below.

Microsoft Windows XP Version 5.1.2600

(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\user>cd d*

C:\Documents and Settings\user\Desktop>m.pl

Please type the filename of the DNA sequence data: manjur.txt

Out of memory!

C:\Documents and Settings\user\Desktop>
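Since Tie::File was suggested, here is a hedged sketch of how it might be combined with a filename held in a scalar. Tie::File presents the lines of a file as an array and fetches them from disk on demand. The helper name count_bases_tied and the temporary test file are assumptions for illustration only. A plain while loop over the filehandle is usually the simpler and faster choice for a single pass, so this only shows how the tie call fits in.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original post): count bases through a
# Tie::File array instead of reading the whole file into memory.
sub count_bases_tied {
    my ($filename) = @_;

    # mode => O_RDONLY opens the file read-only, so nothing is written back.
    tie(my @lines, 'Tie::File', $filename, mode => O_RDONLY)
        or die "Cannot tie file \"$filename\": $!\n";

    my %count = (A => 0, T => 0, G => 0, C => 0, other => 0, total => 0);
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];               # a copy of one record, newline already removed
        $line =~ s/\s//g;                    # remove any remaining whitespace
        $count{A} += ($line =~ tr/Aa//);
        $count{T} += ($line =~ tr/Tt//);
        $count{G} += ($line =~ tr/Gg//);
        $count{C} += ($line =~ tr/Cc//);
        $count{total} += length $line;
    }
    untie @lines;

    $count{other} = $count{total} - $count{A} - $count{T} - $count{G} - $count{C};
    return %count;
}

# Small demonstration on a temporary file (made-up data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "GATTACA\nnnCG\n";
close $tmp_fh;

my %n = count_bases_tied($tmp_name);
print "Number of bases: $n{total}\n";
print "A=$n{A}; T=$n{T}; G=$n{G}; C=$n{C}; Errors(N)=$n{other}\n";
```

In the original program the call would become count_bases_tied($DNAfilename) after the filename has been read from <STDIN> and chomped.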

I am ever grateful to the Perl monks for their quick replies and suggestions.


In reply to Seeking help for using Tie : : File in my perl program for counting bases using Active Perl 5.10.1 Build 1007 by supriyoch_2008
