Hi guys. I have been trying to build a database that combines two data files. One is AccessionNumbersfull.txt:

A0AQI4 A0AQI5 A0AQI7 .....

The other is Pfam-A.seed:

# STOCKHOLM 1.0
#=GF ID 1-cysPrx_C
#=GF AC PF10417.4
#=GF DE C-terminal domain of 1-Cys peroxiredoxin
#=GF AU Finn RD, Coggill PC
#=GF SE Gene3D, pdb_1prx
...
#=GS A3EU39_9BACT/160-195 AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194 AC Q7VQB3.1
#=GS Q057V5_BUCCC/160-195 AC Q057V5.1
#=GS A5CDZ8_ORITB/160-195 AC A5CDZ8.1
...

What I'm supposed to do is match the accession numbers in the first file to the groups in the second. The group name is the PFxxxxx value after #=GF AC. The problem is that the files are huge: the first file alone is 138 MB, so I run into memory issues. My code is as follows.

#!/usr/bin/perl
use warnings;
use strict;

open OUTPUT, ">C:\\Users\\Jems\\Desktop\\Perl\\PFAMin.txt" or die $!;
open ANUMBER, "C:\\Users\\Jems\\Desktop\\Perl\\AccessionNumbersfull.txt" or die $!;

our @acnumbers;
select OUTPUT;
$|=1;

foreach (<ANUMBER>){
    chomp;
    push (@acnumbers, $_);
}

$/="\/\/";
our $acnumbers;
our @list;

foreach $acnumbers(@acnumbers){
    open PFAMDB, "C:\\Users\\Jems\\Desktop\\Perl\\Pfam-A.seed" or die $!;
    my $unit;
    foreach $unit(<PFAMDB>){
        my @units= split /#/,$unit;
        my @pfx=grep(/=GF AC/,@units);
        foreach (@pfx){s/=GF AC/\x20/};
        our $units;
        foreach $units(@units){
            if ($units=~/.*AC $acnumbers/){
                push (@list, @pfx);
            }else{next}
        }
    }
    print "$acnumbers is in:";
    print "@list \n";
    undef @list;
}

Is there any way to streamline it?
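
One possible direction, as a minimal sketch rather than a tested solution: read Pfam-A.seed once and build a hash from sequence accession to family accessions, then stream the accession file one line at a time instead of loading it into an array and rescanning the seed file for every number. The sub names index_pfam_seed and report_matches are made up for this illustration. The sketch assumes that each #=GF AC line comes before the #=GS lines of its family, that the version suffix (the ".1" in A3EU39.1) can be dropped before comparing, and that the resulting index fits in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup: sequence accession (version stripped) -> set of Pfam families.
# Assumes "#=GF AC" appears before the "#=GS ... AC" lines of the same family.
sub index_pfam_seed {
    my ($fh) = @_;
    my %families_for;
    my $family;    # accession of the family currently being read
    while (my $line = <$fh>) {
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) {
            $family = $1;
        }
        elsif ($line =~ /^#=GS\s+\S+\s+AC\s+([^.\s]+)/) {
            # A hash used as a set, so repeated regions are not listed twice.
            $families_for{$1}{$family} = 1 if defined $family;
        }
        elsif ($line =~ m{^//}) {
            undef $family;    # end of this family's record
        }
    }
    return \%families_for;
}

# Stream the accession file; only one line is held in memory at a time.
sub report_matches {
    my ($acc_fh, $families_for, $out_fh) = @_;
    while (my $line = <$acc_fh>) {
        for my $acc (split ' ', $line) {
            my @families = sort keys %{ $families_for->{$acc} || {} };
            print {$out_fh} "$acc is in: @families\n";
        }
    }
}

# Hypothetical usage: perl match.pl Pfam-A.seed AccessionNumbersfull.txt
if (@ARGV == 2) {
    my ($seed_path, $acc_path) = @ARGV;
    open my $seed_fh, '<', $seed_path or die "Cannot open $seed_path: $!";
    my $families_for = index_pfam_seed($seed_fh);
    close $seed_fh;
    open my $acc_fh, '<', $acc_path or die "Cannot open $acc_path: $!";
    report_matches($acc_fh, $families_for, \*STDOUT);
    close $acc_fh;
}
```

With this layout each file is read exactly once, so the run time no longer grows with the number of accession numbers multiplied by the size of the seed file.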

Another thing I need to do is add the names corresponding to the numbers. Those are in a separate file, but in the same order; I extracted the numbers from that file. Its format:

>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1

But I don't know how to. Any ideas? Thanks!!

Sorry, but it's kind of urgent and I've been trying for ages!
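
For the names, a rough sketch of one option: since each header line already carries both the accession number and the name, the header file can be read line by line and split with a regular expression, with no need to hold it in memory. The sub names parse_header and print_names are invented for this example, and it assumes every header follows the >db|ACCESSION|ENTRY_NAME description OS=... layout shown in the sample, and that "name" means either the entry name or the description before OS=.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one header line into (accession, entry name, description).
# Returns an empty list for lines that do not look like a header.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    # >db|ACCESSION|ENTRY_NAME description [OS=organism ...]
    return unless $line =~ /^>\w+\|([^|]+)\|(\S+)\s+(.*?)(?:\s+OS=.*)?$/;
    return ($1, $2, $3);
}

# Stream the header file and print one tab-separated row per header.
sub print_names {
    my ($in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        my ($acc, $entry, $desc) = parse_header($line) or next;
        print {$out_fh} join("\t", $acc, $entry, $desc), "\n";
    }
}

# Hypothetical usage: perl names.pl headers.txt
if (@ARGV == 1) {
    open my $in_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print_names($in_fh, \*STDOUT);
    close $in_fh;
}
```

Because the accession is captured from the same line as the name, the separate numbers file is not needed for this step.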


In reply to Using less memory with BIG files by jemswira
