Hi guys. I have been trying to build a database that combines two data files. The first is AccessionNumbersfull.txt:
A0AQI4 A0AQI5 A0AQI7 .....
The other is Pfam-A.seed:
# STOCKHOLM 1.0
#=GF ID 1-cysPrx_C
#=GF AC PF10417.4
#=GF DE C-terminal domain of 1-Cys peroxiredoxin
#=GF AU Finn RD, Coggill PC
#=GF SE Gene3D, pdb_1prx
...
#=GS A3EU39_9BACT/160-195 AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194 AC Q7VQB3.1
#=GS Q057V5_BUCCC/160-195 AC Q057V5.1
#=GS A5CDZ8_ORITB/160-195 AC A5CDZ8.1
...
What I'm supposed to do is match the accession numbers in the first file to the groups in the second. The group name is the PFxxxxx value after #=GF AC. The problem is that the files are huge: the first file alone is 138 MB, so I run into memory issues. My code is as follows:
#!/usr/bin/perl
use warnings;
use strict;

open OUTPUT, ">C:\\Users\\Jems\\Desktop\\Perl\\PFAMin.txt" or die $!;
open ANUMBER, "C:\\Users\\Jems\\Desktop\\Perl\\AccessionNumbersfull.txt" or die $!;
our @acnumbers;
select OUTPUT;
$|=1;
foreach (<ANUMBER>){
    chomp;
    push (@acnumbers, $_);
}
$/="\/\/";
our $acnumbers;
our @list;
foreach $acnumbers(@acnumbers){
    open PFAMDB, "C:\\Users\\Jems\\Desktop\\Perl\\Pfam-A.seed" or die $!;
    my $unit;
    foreach $unit(<PFAMDB>){
        my @units= split /#/,$unit;
        my @pfx=grep(/=GF AC/,@units);
        foreach (@pfx){s/=GF AC/\x20/};
        our $units;
        foreach $units(@units){
            if ($units=~/.*AC $acnumbers/){
                push (@list, @pfx);
            }else{next}
        }
    }
    print "$acnumbers is in:";
    print "@list \n";
    undef @list;
}
Is there any way to streamline it?
Another thing I need to do is add the names corresponding to the accession numbers. Those are in a separate file, in the same order; I extracted the numbers from that file. Its format:
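For reference, here is a rough sketch of one direction that might reduce both the memory use and the repeated re-reading of Pfam-A.seed. It reads the seed file once, line by line, into a hash keyed by accession, then streams the accession list instead of loading it into an array. The sub names build_pfam_index and report_matches are made up for illustration, the sample data is an in-memory stand-in for the real files, and it assumes the index of seed accessions fits in memory and that the ".1" version suffix can be dropped when matching.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup table: sequence accession => { Pfam family accession => 1 }.
# Reads the Stockholm file line by line from an already opened handle.
# (Hypothetical helper, not part of any library.)
sub build_pfam_index {
    my ($fh) = @_;
    my (%families_for, $family);
    while (my $line = <$fh>) {
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) {
            $family = $1;                       # e.g. PF10417.4
        }
        elsif ($line =~ /^#=GS\s+\S+\s+AC\s+([^.\s]+)/) {
            # e.g. A3EU39.1 is stored as A3EU39 (assumption: version is ignored)
            $families_for{$1}{$family} = 1 if defined $family;
        }
        elsif ($line =~ m{^//}) {
            undef $family;                      # "//" ends one alignment
        }
    }
    return \%families_for;
}

# Stream the accession list and print one result line per accession.
# Accepts one accession per line or several separated by whitespace.
sub report_matches {
    my ($in, $out, $index) = @_;
    while (my $line = <$in>) {
        for my $acc (split ' ', $line) {
            my @hits = sort keys %{ $index->{$acc} || {} };
            print {$out} "$acc is in: @hits\n";
        }
    }
}

# Small in-memory demo; the real script would open the two files instead.
my $seed = <<'END';
# STOCKHOLM 1.0
#=GF ID 1-cysPrx_C
#=GF AC PF10417.4
#=GS A3EU39_9BACT/160-195 AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194 AC Q7VQB3.1
//
END
open my $seed_fh, '<', \$seed or die $!;
my $index = build_pfam_index($seed_fh);

my $accessions = "A0AQI4\nA3EU39\n";
open my $acc_fh, '<', \$accessions or die $!;
report_matches($acc_fh, \*STDOUT, $index);
```

With the demo data, the line for A3EU39 reads "A3EU39 is in: PF10417.4". Each file is read only once, so the run time should no longer grow with the number of accessions multiplied by the size of the seed file.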
>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
But I don't know how to do that. Any ideas? Thanks!!
Sorry, but it's kind of urgent and I've been trying for ages!
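For the names, one possibility is to pull the accession and the name out of each header line in a single pass, since both sit on the same line. The sketch below is only an illustration: parse_header is a made-up helper, and it assumes every header follows the ">db|accession|entry name OS=..." layout shown in the sample, with the name being everything before " OS=".

```perl
use strict;
use warnings;

# Split one UniProt-style FASTA header into (accession, protein name).
# Returns an empty list for lines that are not headers.
# (Hypothetical helper; the header layout is assumed from the sample above.)
sub parse_header {
    my ($line) = @_;
    return unless $line =~ /^>\w+\|([^|]+)\|\S+\s+(.*?)(?:\s+OS=.*)?$/;
    return ($1, $2);
}

my $header = '>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) '
           . 'OS=uncultured archaeon GN=mcrA PE=4 SV=1';
my ($acc, $name) = parse_header($header);
print "$acc\t$name\n";
```

For the sample header this prints A0FGZ9, a tab, and "Methyl coenzyme M reductase (Fragment)". Calling it inside a while loop over the header file would keep only one line in memory at a time.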
In reply to Using less memory with BIG files by jemswira