My input data file would look something like:
AFFX-BioB-5_at 20 20 200.2 P 0.001
AFFX-BioB-M_at 20 20 400.4 P 0.002
AFFX-BioB-3_at 20 20 200.5 P 0.003
I want the 4th column, with the signal data. Actually, that other subroutine gets only the first column from the first file. I didn't know how to do that otherwise, since everything else was in a loop. This way, I have everything set to look like (if it worked anyway):
AFFX-BioB-5_at 200.0 300.0 400.0
AFFX-BioB-M_at 200.0 300.0 400.0
AFFX-BioB-3_at 200.0 300.0 400.0
Thanks for your suggestions!!
Bioinformatics
OK, now we're talking. Assuming you want a result that starts with the AFFX-BioB-xxx_at identifier, followed by all the signal data connected to that identifier, I suggest:
- You drop the get_targets sub and all references to it.
- Then you change your sub get_signal to:
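use Cwd 'chdir';
# Change directory once, before the loop, not once per file.
chdir "./data" or die "Cannot chdir to ./data: $!";
while (@filename) {
    my $file = shift @filename;
    my $i = 0;    # line counter, reset for every file
    open (FILE, "<", $file) or die "Cannot open $file: $!";
    while (<FILE>) {
        next if $i++ <= 14;    # skip the first 15 lines of each file
        chomp;
        # Fields: identifier, two unused columns, signal, rest ignored.
        my ($id, undef, undef, $signal) = split(/\t+/);
        push @{$outputdata{$id}}, $signal;
    }
    close (FILE);
}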
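After having run this sub over all your files, you will find your signal data in %outputdata, grouped per identifier: each key is an identifier, and its value is a reference to an array holding that identifier's signals in the order the files were read. Note that the hash itself does not remember the order in which the identifiers were added.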
- Printing this data structure goes as follows:
for my $id (keys %outputdata) {
print "$id:\t",(join("\t",@{$outputdata{$id}})),"\n";
}
Of course you can print it to a filehandle. This is a format which is suitable to be imported in a database or a spreadsheet.
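If the rows should come out in the same order as the identifiers appear in the input, a plain hash is not enough by itself, because keys %outputdata returns the identifiers in no particular order. Here is a minimal, self-contained sketch of one way around that, not a drop-in replacement for your script. The names @files, @order and @report are made up for illustration, the second "file" contains invented signal values, and the files are simulated with in-memory strings without any leading lines to skip.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two data files (tab-separated, no lines to skip).
# The values in the second string are invented for illustration.
my @files = (
    "AFFX-BioB-5_at\t20\t20\t200.2\tP\t0.001\nAFFX-BioB-M_at\t20\t20\t400.4\tP\t0.002\n",
    "AFFX-BioB-5_at\t20\t20\t300.1\tP\t0.001\nAFFX-BioB-M_at\t20\t20\t500.5\tP\t0.002\n",
);

my %outputdata;   # identifier => reference to an array of signals
my @order;        # identifiers in the order they were first seen

for my $content (@files) {
    # Open an in-memory filehandle on the string instead of a real file.
    open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, undef, undef, $signal) = split /\t+/, $line;
        # Remember each identifier the first time it appears.
        push @order, $id unless exists $outputdata{$id};
        # Autovivification: the anonymous array is created on the first push.
        push @{ $outputdata{$id} }, $signal;
    }
    close $fh;
}

# Build the report rows in first-seen order rather than hash order.
my @report = map { join("\t", $_, @{ $outputdata{$_} }) } @order;
print "$_\n" for @report;
```

The extra @order array costs one line and keeps the output aligned with the first file, which is what the separate get_targets sub was trying to achieve.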
The "magic" of using references to anonymous arrays may perhaps be a bit too deep for someone who is just starting to program, but if you read Chapters 8 and 9 of the Camel book a few times and study the examples given, much will become clearer.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law