mulder4786 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I'm facing a conundrum that I cannot even begin to solve using my limited Perl knowledge.

I have multiple (~50) square correlation matrices that are very large (on the order of 1700 x 1700). A small portion of one might look like:

            rs12272511 rs7107801 rs11027752 rs12421837
rs12272511  1.0        -0.023    -0.511     -0.046
rs7107801   -0.023     1.0       0.040      0.233
rs11027752  0.494      0.514     1.0        0.501
rs12421837  -0.039     -0.040    0.021      1.0

The matrices are overlapping for about 95% of the ~1700 row/column IDs. For those IDs that are missing in a matrix, I would like to print "0" placeholders. For example, given another matrix:

            rs12272511 rs11027752 rs12421837
rs12272511  1.0        .844       .276
rs11027752  .267       1.0        -.980
rs12421837  -.876      .374       1.0

As this second matrix is missing the ID "rs7107801", I would like to add this ID in with a value of "0":

            rs12272511 rs7107801 rs11027752 rs12421837
rs12272511  1.0        0         .844       .276
rs7107801   0          0         0          0
rs11027752  .267       0         1.0        -.980
rs12421837  -.876      0         .374       1.0

I will then hopefully be able to use the matrices with placeholders (all matrices will now have equal dimensions) in order to calculate weighted averages. The matrices are all space-delimited. Can someone point me in the right direction?

Replies are listed 'Best First'.
Re: combining multiple matrices with placeholders if row and column values in one matrix do not exist in another
by BrowserUk (Patriarch) on Jan 14, 2016 at 02:29 UTC

    This should get you started:

    #! perl -slw
    use strict;
    use Inline::Files;
    use Data::Dump qw[ pp ];

    my @templateIds = split ' ', <TEMPLATE>;
    my %template = map{ my( $id, @vals )= split(); $id => \@vals } <TEMPLATE>;
    #pp \%template;

    my @missingIds = split ' ', <MISSING>;
    my %missing = map{ my( $id, @vals )= split(); $id => \@vals } <MISSING>;
    #pp \%missing;

    for my $i ( 0 .. $#templateIds ) {
        if( $templateIds[ $i ] ne $missingIds[ $i ] ) {
            splice @missingIds, $i, 0, $templateIds[ $i ];
            for my $k ( keys %missing ) {
                splice @{ $missing{ $k } }, $i, 0, 0;
            }
            $missing{ $templateIds[ $i ] } = [ (0) x @{ $template{ $templateIds[ $i ] } } ];
        }
    }

    print "\t", join ' ', @missingIds;
    print join ' ', $_, @{ $missing{ $_ } } for @missingIds;

    __TEMPLATE__
    rs12272511 rs7107801 rs11027752 rs12421837
    rs12272511 1.0 -0.023 -0.511 -0.046
    rs7107801 -0.023 1.0 0.040 0.233
    rs11027752 0.494 0.514 1.0 0.501
    rs12421837 -0.039 -0.040 0.021 1.0
    __MISSING__
    rs12272511 rs11027752 rs12421837
    rs12272511 1.0 .844 .276
    rs11027752 .267 1.0 -.980
    rs12421837 -.876 .374 1.0

    Output:

    C:\test>1152731
            rs12272511 rs7107801 rs11027752 rs12421837
    rs12272511 1.0 0 .844 .276
    rs7107801 0 0 0 0
    rs11027752 .267 0 1.0 -.980
    rs12421837 -.876 0 .374 1.0

    Note: I've used Inline::Files to produce a self-contained example; for your real work you'd need to open the real files and operate on them.
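As a rough illustration of the note above, here is a minimal sketch of the same padding done through ordinary filehandles rather than Inline::Files. The helper names `read_matrix` and `pad_matrix` are hypothetical (they do not appear in the thread), the full ID list is assumed to be known in advance, and the demo reads from an in-memory string so that it runs as-is; with real data you would pass a path to `open` instead. It assumes Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a space-delimited matrix from an open filehandle.
# Returns a hash reference where $rows->{$row_id}{$col_id} is the cell value.
sub read_matrix {
    my ($fh) = @_;
    my @ids = split ' ', scalar <$fh>;    # header line: column IDs
    my %rows;
    while ( my $line = <$fh> ) {
        my ( $id, @vals ) = split ' ', $line;
        next unless defined $id;          # skip blank lines
        @{ $rows{$id} }{@ids} = @vals;    # hash slice: column ID => value
    }
    return \%rows;
}

# Hypothetical helper: expand a matrix to the full ID list,
# putting 0 in every cell whose row or column ID is missing.
sub pad_matrix {
    my ( $all_ids, $rows ) = @_;
    my @padded;
    for my $r (@$all_ids) {
        my $row = $rows->{$r} || {};      # missing row: every cell becomes 0
        push @padded, [ $r, map { $row->{$_} // 0 } @$all_ids ];
    }
    return @padded;
}

# Demo data taken from the question; replace \$small with a file path.
my @all_ids = qw( rs12272511 rs7107801 rs11027752 rs12421837 );
my $small   = <<'END';
rs12272511 rs11027752 rs12421837
rs12272511 1.0 .844 .276
rs11027752 .267 1.0 -.980
rs12421837 -.876 .374 1.0
END

open my $fh, '<', \$small or die "Cannot open matrix: $!";
my $rows = read_matrix($fh);
close $fh;

print "\t@all_ids\n";
print "@$_\n" for pad_matrix( \@all_ids, $rows );
```

Because the lookup is by ID rather than by position, this variant does not depend on the IDs appearing in the same order in every file. With ~50 files, the full ID list could be built in a first pass by collecting every header line.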


      This works well, thank you!
Re: combining multiple matrices with placeholders if row and column values in one matrix do not exist in another
by hdb (Monsignor) on Jan 14, 2016 at 08:29 UTC

    From a statistics point of view (ignoring your Perl question altogether), I would put 1.0 on the diagonal of the missing rows and columns to ensure that the augmented matrix is a correlation matrix again. This would also help the result remain a correlation matrix when taking weighted averages across matrices.

    Generally, a lot of things can go wrong when manipulating large correlation matrices.
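The diagonal suggestion can be sketched as follows. This is only an illustration under stated assumptions: the helper names `pad_with_unit_diagonal` and `weighted_average` are hypothetical, the matrices are held as hashes of hashes keyed by ID, the toy IDs `a` and `b` and their values are made up for the demo, and the weights are assumed to sum to 1 (the thread does not say how they are chosen).

```perl
use strict;
use warnings;

# Hypothetical helper: pad a matrix to the full ID list, using 1.0 on the
# diagonal for IDs the matrix lacks and 0 in all other missing cells.
sub pad_with_unit_diagonal {
    my ( $all_ids, $rows ) = @_;
    my %padded;
    for my $r (@$all_ids) {
        for my $c (@$all_ids) {
            my $v = exists $rows->{$r} ? $rows->{$r}{$c} : undef;
            $padded{$r}{$c} = defined $v ? $v : ( $r eq $c ? 1.0 : 0 );
        }
    }
    return \%padded;
}

# Hypothetical helper: cell-by-cell weighted average of padded matrices.
# Assumes the weights sum to 1, so the diagonal stays at 1.
sub weighted_average {
    my ( $all_ids, $matrices, $weights ) = @_;
    my %avg;
    for my $r (@$all_ids) {
        for my $c (@$all_ids) {
            my $sum = 0;
            $sum += $weights->[$_] * $matrices->[$_]{$r}{$c} for 0 .. $#$matrices;
            $avg{$r}{$c} = $sum;
        }
    }
    return \%avg;
}

# Toy data (not from the thread): the second matrix lacks ID "b".
my @all_ids = qw( a b );
my $m1 = pad_with_unit_diagonal( \@all_ids,
    { a => { a => 1.0, b => 0.5 }, b => { a => 0.5, b => 1.0 } } );
my $m2 = pad_with_unit_diagonal( \@all_ids, { a => { a => 1.0 } } );

my $avg = weighted_average( \@all_ids, [ $m1, $m2 ], [ 0.75, 0.25 ] );
for my $r (@all_ids) {
    print join( ' ', $r, map { $avg->{$r}{$_} } @all_ids ), "\n";
}
```

With an all-zero placeholder row, the averaged diagonal entry for a missing ID would fall below 1; with the unit diagonal it stays at 1 as long as the weights sum to 1.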

Re: combining multiple matrices with placeholders if row and column values in one matrix do not exist in another
by Anonymous Monk on Jan 14, 2016 at 03:48 UTC
    When people say "placeholders" I think "printf"... So, here is a variant that reads the file containing the small matrix line by line:
    use strict;
    use warnings;

    my @template = qw( rs12272511 rs7107801 rs11027752 rs12421837 );
    my $small_matrix = <<'END';
    rs12272511 rs11027752 rs12421837
    rs12272511 1.0 .844 .276
    rs11027752 .267 1.0 -.980
    rs12421837 -.876 .374 1.0
    END

    open my $fh, '<', \$small_matrix;
    my %ids = map { $_ => 1 } split ' ', <$fh>;
    my $fmt_id_present = join ' ', map $ids{$_} ? '%s' : '0', @template;
    my $fmt_id_missing = join ' ', ('0') x @template;
    $_ = "%s $_\n" for $fmt_id_present, $fmt_id_missing;

    print "\t@template\n";
    for (@template) {
        if ( $ids{$_} ) {
            printf $fmt_id_present, split ' ', <$fh>;
        }
        else {
            printf $fmt_id_missing, $_;
        }
    }
    If you find the line open my $fh, '<', \$small_matrix; strange: yes, you can open strings as files in Perl.
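For readers who have not seen the idiom, here is a minimal sketch of opening a string as a file and splitting its lines on whitespace. The variable names and the two-line sample are illustrative only; passing a reference to a scalar as the third argument of `open` is standard Perl (5.8 and later).

```perl
use strict;
use warnings;

# A two-line sample in the same space-delimited layout as the matrices.
my $text = "rs12272511 rs11027752\nrs12272511 1.0 .844\n";

# A reference to a scalar in place of a file name gives an in-memory filehandle.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# split ' ' splits on runs of whitespace and drops the trailing newline.
my @header = split ' ', scalar <$fh>;
my ( $id, @vals ) = split ' ', scalar <$fh>;
close $fh;

print "@header\n";
print "$id => @vals\n";
```

The same reading code then works unchanged on a real file by swapping `\$text` for a path.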
Re: combining multiple matrices with placeholders if row and column values in one matrix do not exist in another
by Anonymous Monk on Jan 14, 2016 at 02:10 UTC

    The matrices are all space-delimited. Can someone point me in the right direction?

    Sure, perlintro, split