in reply to Population of HoAoA based on file contents

Okay, I think I've made some progress, but I'm still not quite there yet. Here's what I now have:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use v5.14;

    use Getopt::Long;
    use Bio::PopGen::IO;
    use Bio::PopGen::Statistics;

    die "need two arguments (i.e. chr cont) at invocation" unless @ARGV == 2;

    chomp( my $chr_num = shift );
    chomp( my $cont    = shift );

    open my $out_file, ">", "chr${chr_num}_exome_snps_processed_${cont}_STATS"
      or die "Can't open output file: $!\n";
    open my $in_file, "<", "chr${chr_num}_exome_snps_processed_$cont"
      or die "Can't open input file: $!\n";

    my %data;
    my @snp_bins;
    my @individuals;
    my @all_snps;

    while (<$in_file>) {
        chomp;
        if (/^SAMPLE/) {
            my ( $placeholder, @coords ) = split /,/;
            foreach my $coord (@coords) {
                push @snp_bins, int( $coord / 100_000 );
            }
        }
        else {
            my ( $id, @snps ) = split /,/;
            push @individuals, $id;
            push @all_snps[$. - 2], join(',', @snps);
        }
    }

    foreach my $individual (@individuals) {
        foreach my $index ( 0 .. $#snp_bins ) {
            push( @{ $data{$individual}[ $snp_bins[$index] ] }, $all_snps[$index] );
        }
    }

    close $in_file;

But there's still (at least) a problem with the line

push @all_snps[$. - 2], join(',', @snps);

I hope I'm otherwise headed in the right direction..?

In regard to what I will do with undefined bins: I will iterate through all the bins, and any that don't have a minimum number of elements simply won't be passed as data to the bioperl popgen stats methods, later on in the program.
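
To make that bin-filtering idea concrete, here is a rough sketch rather than the final code. It assumes %data has the shape individual => [ bin => [ SNPs ] ]; the helper name usable_bins and the threshold $min_snps are made up for illustration, and the toy data is not from my real files.

```perl
use strict;
use warnings;

# Hypothetical threshold; the real minimum is not decided yet.
my $min_snps = 2;

# Toy data in the same shape as %data: individual => [ bin => [ SNPs ] ].
# Bin 1 was never populated, so it is undef.
my %data = (
    ind1 => [ [ 'A', 'G', 'T' ], undef, ['C'] ],
);

# Return the indices of bins that are defined and hold at least $min SNPs.
# The defined() test comes first so an undef bin is never dereferenced.
sub usable_bins {
    my ( $bins, $min ) = @_;
    return grep { defined $bins->[$_] && @{ $bins->[$_] } >= $min }
        0 .. $#{$bins};
}

my @keep = usable_bins( $data{ind1}, $min_snps );
print "bins to pass on: @keep\n";
```

Only the indices returned by usable_bins would then be handed to the popgen stats methods; everything else is skipped.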

Re^2: Population of HoAoA based on file contents
by state-o-dis-array (Hermit) on May 15, 2012 at 14:46 UTC
    It seems to me that this would do what you are trying to accomplish.
        while (<$in_file>) {
            chomp;
            if (/^SAMPLE/) {
                my ( $placeholder, @coords ) = split /,/;
                foreach my $coord (@coords) {
                    push @snp_bins, int( $coord / 100_000 );
                }
            }
            else {
                my ( $id, @snps ) = split /,/;
                #need to check here that $#snps == $#snp_bins ?
                foreach my $index ( 0 .. $#snp_bins ) {
                    push( @{ $data{$id}[ $snp_bins[$index] ] }, $snps[$index] );
                }
            }
        }
    Unless you need them elsewhere, I don't see a reason in your code snippet to store @individuals and @all_snps. The "need to check" comment above is based on my attempt to understand your last statement above. If it's possible that @snps might not have an entry for each element of @snp_bins, then you'll want this check.
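
One possible shape for that check, as an untested-against-your-data sketch: the helper name add_row and the sample rows are invented for illustration, and skipping a bad row with a warning is just one choice (you may prefer to die). The -1 limit on split is there because split otherwise drops trailing empty fields, which would make a row ending in commas look short.

```perl
use strict;
use warnings;

# Split one data row and push each SNP into the bin of its coordinate.
# Returns 0 (and warns) if the row length does not match the header.
sub add_row {
    my ( $data, $snp_bins, $line ) = @_;

    # -1 keeps trailing empty fields so the count reflects the real columns.
    my ( $id, @snps ) = split /,/, $line, -1;

    if ( @snps != @$snp_bins ) {
        warn "skipping $id: got ", scalar @snps,
            " SNPs, expected ", scalar @$snp_bins, "\n";
        return 0;
    }

    push @{ $data->{$id}[ $snp_bins->[$_] ] }, $snps[$_] for 0 .. $#snps;
    return 1;
}

# Made-up coordinates: the first falls in bin 0, the other two in bin 1.
my @snp_bins = map { int( $_ / 100_000 ) } ( 50_000, 120_000, 130_000 );

my %data;
add_row( \%data, \@snp_bins, 'ind1,A,G,T' );    # accepted
add_row( \%data, \@snp_bins, 'ind2,A,G' );      # too short, skipped
```

Because the row is rejected before anything is pushed, a malformed line never leaves a half-filled entry in %data.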