in reply to Creating a binary matrix
G'day Anupam,
This does what you asked for:
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use List::Util qw{max};
use Tie::File;

my $master_list = 'pm_1079097_List';
my @files = qw{pm_1079097_File1 pm_1079097_File2 pm_1079097_File3};

tie my @list, 'Tie::File', $master_list;
my %matrix = map { $_ => {} } @list;
untie @list;

for my $filename (@files) {
    tie my @file, 'Tie::File', $filename;
    for (@file) {
        $matrix{$_}{$filename} = 1 if exists $matrix{$_};
    }
    untie @file;
}

my $format = '%' . max(map length, keys %matrix) . 's'
    . ((' %' . max(map length, @files) . 's') x @files) . "\n";

printf $format, '', @files;

for my $gene (sort keys %matrix) {
    printf $format, $gene, map { $_ || 0 } @{$matrix{$gene}}{@files};
}
Output:
      pm_1079097_File1 pm_1079097_File2 pm_1079097_File3
Gene1                1                0                0
Gene2                1                1                0
Gene3                1                1                1
Gene4                0                1                1
Gene5                0                0                1
Gene6                0                0                0
I prefixed the filenames you gave with 'pm_1079097_'; replace those with your real filenames. The input data I used should match your sample data; it's shown below.
All code I used is documented in http://perldoc.perl.org/perl.html. If you don't understand something, that should be the first place you look.
If you have follow-up questions, ensure they're accompanied by code, data, error messages, and so on (marked up appropriately, e.g. within <code>...</code> tags) as described in "How do I post a question effectively?".
Input data:
$ cat pm_1079097_List
Gene1
Gene2
Gene3
Gene4
Gene5
Gene6

$ cat pm_1079097_File1
Gene1
Gene2
Gene3

$ cat pm_1079097_File2
Gene2
Gene3
Gene4

$ cat pm_1079097_File3
Gene3
Gene4
Gene5
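The core of the script above is a hash of hashes, keyed first by gene and then by filename. If you want to experiment with that part without touching any files, here's a minimal sketch of the same logic on in-memory lists, using the sample data shown above. Note that build_matrix(), its argument layout and the short labels File1 to File3 are names I've made up for illustration; they aren't part of the script.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not in the script above): build the presence/absence
# matrix from in-memory data instead of files.
#   $master - array ref of every gene name (the master list)
#   $lists  - hash ref mapping a label (e.g. a filename) to an array ref of genes
sub build_matrix {
    my ($master, $lists) = @_;

    # Start with an empty row for every gene in the master list.
    my %matrix = map { $_ => {} } @$master;

    for my $label (keys %$lists) {
        for my $gene (@{ $lists->{$label} }) {
            # Genes that aren't in the master list are ignored.
            $matrix{$gene}{$label} = 1 if exists $matrix{$gene};
        }
    }

    return \%matrix;
}

my @labels = qw{File1 File2 File3};

my $matrix = build_matrix(
    [qw{Gene1 Gene2 Gene3 Gene4 Gene5 Gene6}],
    {
        File1 => [qw{Gene1 Gene2 Gene3}],
        File2 => [qw{Gene2 Gene3 Gene4}],
        File3 => [qw{Gene3 Gene4 Gene5}],
    },
);

for my $gene (sort keys %$matrix) {
    # The hash slice returns undef for absent entries; show those as 0.
    print join(' ', $gene, map { $_ || 0 } @{ $matrix->{$gene} }{@labels}), "\n";
}
```

This prints one row per gene with single-space separators rather than the aligned columns produced by the printf format in the script.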
-- Ken