Re: retrieving in the correct order
by halley (Prior) on Dec 16, 2004 at 19:38 UTC
See my response in an old thread: Re: sort an array according to another array
Also, you're initializing @array2 with qq() which makes a string, not a list of values. You want either qw( list of words ) or ('word', 'word', 'word') or (value, value, value) without the qq. If you think of your identifiers as words, then you should use lt/eq/gt instead of </==/> when comparing them, too.
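To make the two points above concrete, here is a minimal sketch. The variable names and the values '007' and '7' are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# qq() is a double-quoted string: one scalar, so the array gets one element.
my @as_string = qq(13470319 13470331 15460001);

# qw() splits on whitespace and yields a list of words.
my @as_words = qw(13470319 13470331 15460001);

print scalar(@as_string), "\n";   # 1
print scalar(@as_words),  "\n";   # 3

# String comparison (eq) versus numeric comparison (==) on identifiers.
# These two values are hypothetical, chosen only to show the difference.
my ($x, $y) = ('007', '7');
print "eq: ", ($x eq $y ? "same" : "different"), "\n";   # different
print "==: ", ($x == $y ? "same" : "different"), "\n";   # same
```

If the ids are labels rather than quantities, eq is the comparison that treats '007' and '7' as distinct.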
-- [ e d @ h a l l e y . c c ]
Re: retrieving in the correct order
by VSarkiss (Monsignor) on Dec 16, 2004 at 19:27 UTC
I take it the first array is relatively small? If so, you can just turn your loops inside out:
for my $i (@array2) {
    for my $line (@array1) {
        my $key = "gi|$i|";
        if (substr($line, 0, length $key) eq $key) {
            print $line;
        }
    }
}
Notice how you don't even need a regex, just a simple string compare.
If array1 is sorted, you can speed this up a little by remembering where you left off.
Hi VSarkiss,
Thanks for your solution, but I can't get it to work! Maybe it's because my first array is quite big (~1000 sequences). But wouldn't this just slow it down?
Thanks
Well, some detail on what went wrong would help....
When I tried it against the sample data in your original post, I noticed two things:
- You're testing for gi| at the beginning of the line, but your @array1 values start with >gi|. I had to remove the >; you'll have to either fix the $key = line to match your data, or fix the data to match your test...
- You're populating a single element in array2. If you want each number to be an element of the array, you need to use qw(...), not qq(...).
If these are both copy-and-paste artifacts, please provide more detail on what the error is.
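For reference, a sketch of the loop with both fixes applied, assuming the data really does start with >gi| and that the ids are meant to be separate elements. The two sample lines and the @matched array are illustrative only.

```perl
use strict;
use warnings;

# Illustrative data in the ">gi|id|..." form described above.
my @array1 = (
    ">gi|13470331|ref|NP_101896.1| hypothetical protein\n",
    ">gi|13470319|ref|NP_101897.1| hypothetical protein\n",
);

# qw() gives one element per id; qq() would give a single string.
my @array2 = qw(13470319 13470331);

my @matched;    # hypothetical: collected instead of printed inside the loop
for my $i (@array2) {
    my $key = ">gi|$i|";    # leading ">" included to match the data
    for my $line (@array1) {
        push @matched, $line if substr($line, 0, length $key) eq $key;
    }
}
print @matched;
```

The lines come out in the order of @array2, not the order of @array1.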
hey, check my code below, i tested it with a file with 3000 lines in it. time cat gen.txt | perl -w gen.pl says:
real 0m0.139s
user 0m0.109s
sys 0m0.000s
if you're still looking for an answer...
--
to ask a question is a moment of shame
to remain ignorant is a lifelong shame
Re: retrieving in the correct order
by Animator (Hermit) on Dec 16, 2004 at 19:14 UTC
A possible way is to build a hash of the first array, where the key is the id of the element.
If that's done you can easily use a hash slice to get an array with the values in the order of the second array.
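A minimal sketch of that idea, assuming each element starts with gi|id| as in the sample data. The names %by_id and @ordered are made up for illustration.

```perl
use strict;
use warnings;

# Sample records and the desired order (illustrative data only).
my @array1 = (
    "gi|13490216|ref|NP_101899.1| protein for 216",
    "gi|13470331|ref|NP_101896.1| protein for 331",
    "gi|13470319|ref|NP_101897.1| protein for 319",
);
my @array2 = qw(13470319 13470331 13490216);

# Key each record by the id between the first two pipes.
my %by_id;
for my $line (@array1) {
    next unless $line =~ /^gi\|(\d+)\|/;
    $by_id{$1} = $line;
}

# Hash slice: values come back in the order of the keys given.
# The grep skips ids that have no record, so no undef ends up in the list.
my @ordered = @by_id{ grep { exists $by_id{$_} } @array2 };
print "$_\n" for @ordered;
```

Building the hash is one pass over the first array, and the slice is one lookup per id, so this avoids the nested loops.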
Re: retrieving in the correct order
by nedals (Deacon) on Dec 16, 2004 at 19:52 UTC
# If the files are not too large...
# Read in the sequence file, putting the data into a hash
use strict;
use warnings;
my %hash;
while (<DATA>) {
    chomp;
    # Save what you need: the id and the trailing description.
    # Lines that do not match the pattern are skipped.
    my ($id, $protein) = /^gi\|(.+?)\|.+\|(.+)$/ or next;
    $hash{$id} = $protein;
}
# Now use the ids from the second file to print out the hash
my @array2 = qw(13470319 13470331 15460001 13490216);
map { print "$hash{$_}\n"; } @array2;
__DATA__
gi|13490216|ref|NP_101899.1|protein for 216
gi|13470331|ref|NP_101896.1|protein for 331
gi|15460001|ref|NP_101898.1|protein for 001
gi|13470319|ref|NP_101897.1|protein for 319
Another thing (just noticed it now): why would you be using map? map returns a list (here filled with one 1 per element, the return value of print), which you aren't using at all...
What should be used is for/foreach (or a hash slice, of course).
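A short sketch of the difference, using a two-entry %hash invented for the example:

```perl
use strict;
use warnings;

# Illustrative data only.
my %hash   = (13470319 => 'protein for 319', 13470331 => 'protein for 331');
my @array2 = qw(13470319 13470331);

# for/foreach: used for its side effect (printing); no result list is built.
print "$hash{$_}\n" for @array2;

# map: used when the resulting list itself is wanted.
my @lines = map { "$hash{$_}\n" } @array2;
print @lines;
```

Both halves print the same two lines; the point is only that map earns its keep when the list it returns is actually used.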
The map method was already sitting in my 'test' template. The foreach method is a better option, but I liked your hash slice method even better. ++
Re: retrieving in the correct order
by insaniac (Friar) on Dec 16, 2004 at 20:54 UTC
or, if the first array is really a text file, say on a UNIX/LINUX system, you could cat the first file and read it line by line.. no?
the scanning perl program gen.pl:
---------------------------------
#!/usr/bin/perl
use strict;
my @array = qw(13470319 13470331 15460001 13490216);
my @array2;
while (my $line = <>) {
    foreach my $id (0..$#array) {
        # store each matching line at the position of its id in @array
        $array2[$id] = $line if $line =~ m/^gi\|($array[$id])\|/;
    }
}
print "order: ", join(" ", @array), "\n";
# skip slots for ids that never appeared in the input
print grep { defined } @array2;
-------------------------------
the text file:
gi|13470331|ref|NP_101896.1| hypothetical protein
MFWVTKKALMPFLMLPAGIIFVSAVGYAINWLFSTLFQFQPPLVEGPAGPVTVLIFTITMLLAYDISYYL
gi|13470319|ref|NP_101897.1| hypothetical protein
MGAYCQAHPACKVTDRTVIGRRDAAMNAPFVLAIPRTRTFEVVTSAARLAEIAPAWTALWQRAGGLVFQH
-------------------------------
the execution:
# cat gen.txt | perl gen.pl
order: 13470319 13470331 15460001 13490216
gi|13470319|ref|NP_101897.1| hypothetical protein
gi|13470331|ref|NP_101896.1| hypothetical protein
this just looked like a quick and keep it simple job to me..
UPDATE: updated the code to display correct order..
--
to ask a question is a moment of shame
to remain ignorant is a lifelong shame
I think you are missing the point...
If I understand the poster correctly, he (or she) wants to output the lines in the order their ids appear in the second array: in this case the line/element for 13470319 first, then the line/element for 13470331, and so on.