
Hi guys. I have been trying to build a database by combining two data files. The first is AccessionNumbersfull.txt:

A0AQI4
A0AQI5
A0AQI7
.....

The other is Pfam-A.seed:

# STOCKHOLM 1.0
#=GF ID   1-cysPrx_C
#=GF AC   PF10417.4
#=GF DE   C-terminal domain of 1-Cys peroxiredoxin
#=GF AU   Finn RD, Coggill PC
#=GF SE   Gene3D, pdb_1prx
...
#=GS A3EU39_9BACT/160-195  AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194  AC Q7VQB3.1
#=GS Q057V5_BUCCC/160-195  AC Q057V5.1
#=GS A5CDZ8_ORITB/160-195  AC A5CDZ8.1
...

What I need to do is match the accession numbers in the first file to the groups in the second. The group name is the PFxxxxx value after #=GF AC. The problem is that the files are huge: the first file alone is 138 MB, so I run into memory issues. My code is as follows:

#!/usr/bin/perl
use warnings;
use strict;
open OUTPUT, ">C:\\Users\\Jems\\Desktop\\Perl\\PFAMin.txt" or die $!;
open ANUMBER, "C:\\Users\\Jems\\Desktop\\Perl\\AccessionNumbersfull.txt" or die $!;
our @acnumbers;
select OUTPUT;
$|=1;
foreach (<ANUMBER>){
    chomp;
    push (@acnumbers, $_);}
$/="\/\/";

our $acnumbers;
our @list;

foreach $acnumbers(@acnumbers){
    
    open PFAMDB, "C:\\Users\\Jems\\Desktop\\Perl\\Pfam-A.seed" or die $!;
    my $unit;
    foreach $unit(<PFAMDB>){
        my @units= split /#/,$unit;
        my @pfx=grep(/=GF AC/,@units);
        foreach (@pfx){s/=GF AC/\x20/};
        our $units;
        foreach $units(@units){
            if ($units=~/.*AC $acnumbers/){
            push (@list, @pfx);
            }else{next}
        }
    
    }
    
    print "$acnumbers is in:";
    print "@list \n";
    undef @list;
}

Is there any way to streamline it?
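A minimal sketch of one way this could be streamlined, not a tested solution for the real files. It reads Pfam-A.seed once, line by line, into a hash keyed by accession, then streams the accession list with `while` instead of `foreach`, so neither file is loaded whole and the seed file is not reopened for every accession. The sub names `index_seed` and `report` are made up for illustration, and the sketch assumes that the accession-to-family hash built from the seed file fits in memory and that the version suffix (the `.1` in `A3EU39.1`) can be ignored, as the original regex does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup of accession => { Pfam family => 1 } from a Stockholm-format
# handle, reading one line at a time (the default $/ is left untouched).
sub index_seed {
    my ($fh) = @_;
    my (%families_for, $family);
    while (my $line = <$fh>) {
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) {
            $family = $1;                          # e.g. PF10417.4
        }
        elsif ($line =~ /^#=GS\s+\S+\s+AC\s+(\w+)/) {
            # \w+ stops at the dot, so A3EU39.1 is stored as A3EU39
            $families_for{$1}{$family} = 1 if defined $family;
        }
        elsif ($line =~ m{^//}) {
            undef $family;                         # end of this family's record
        }
    }
    return \%families_for;
}

# Stream the accession list and print one result line per accession.
sub report {
    my ($acc_fh, $families_for, $out_fh) = @_;
    while (my $acc = <$acc_fh>) {
        $acc =~ s/\s+\z//;                         # strip newline and trailing blanks
        next unless length $acc;
        my @hits = sort keys %{ $families_for->{$acc} || {} };
        print {$out_fh} "$acc is in: @hits\n";
    }
}

# Demo on in-memory sample data; with the real files, open them by path instead.
my $seed_text = <<'END';
# STOCKHOLM 1.0
#=GF ID   1-cysPrx_C
#=GF AC   PF10417.4
#=GS A3EU39_9BACT/160-195  AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194  AC Q7VQB3.1
//
END
my $acc_text = "A3EU39\nA0AQI4\n";

open my $seed_fh, '<', \$seed_text or die $!;
open my $acc_fh,  '<', \$acc_text  or die $!;
my $index = index_seed($seed_fh);
report($acc_fh, $index, \*STDOUT);
```

If the seed index turns out to be too large for memory, the roles could be swapped (hash the accession list, stream the seed file), but the single-pass idea stays the same.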

Another thing I need to do is add the names corresponding to the accession numbers. Those are in a separate file, but the entries are in the same order; I extracted the numbers from that file. Its format:

>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1

But I don't know how to do that. Any ideas? Thanks!
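For the names, a rough sketch of how each header line might be split into accession and name while streaming the file. The sub name `parse_header` is hypothetical, and the regex assumes every header follows the `>db|ACCESSION|ENTRY Name OS=...` layout shown in the two sample lines; anything else is skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a header of the form ">db|ACCESSION|ENTRY Name OS=..." into
# (accession, name). Returns an empty list for lines that do not match.
sub parse_header {
    my ($line) = @_;
    $line =~ s/\s+\z//;                            # strip newline and trailing blanks
    return unless $line =~ /^>\w+\|([^|]+)\|\S+\s+(.*?)(?:\s+OS=.*)?$/;
    return ($1, $2);
}

# Demo on the two sample headers; with the real file, open it by path instead.
my $headers = <<'END';
>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
END

open my $fh, '<', \$headers or die $!;
while (my $line = <$fh>) {
    my ($acc, $name) = parse_header($line) or next;   # skip non-header lines
    print "$acc\t$name\n";
}
```

Because each header already carries the accession, the name can be picked up in the same loop that reads the accessions, so the separate accession file may not be needed for this step.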

Sorry, but it's kind of urgent and I've been trying for ages!

In reply to Using less memory with BIG files by jemswira
