I'd like to request help from a Monk on how to build a complex data structure, based on a file's contents. I am new to this, and despite having studied various tutorials on this topic, I cannot figure out the correct procedure.
I am trying to read the data from a file into a Hash of Arrays of Arrays (HoAoA). Each individual has its own line in the file and becomes a hash key. The data for each individual (comma-separated pairs of zeros and ones) should be broken up into 100 kilobase (kb) windows, based on the genomic coordinates in the header line of each file. Each 100 kb window will therefore have its own array within a larger array that holds all of the data for that individual.
The code I have so far is listed below, followed by a sample of one of the files I need to work with. For each individual (beginning on line 2 of the input file), I want all data for the columns whose header-line coordinates fall between 1 and 99,999 (if any) to go into the first array, then 100,000-199,999 into the second array, and so on. In this example the first 162 interior arrays would be empty, because every coordinate shown falls in window number 162 (counting from zero) and all of the data in the few columns listed would go there. For other files this would not be the case.
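The window arithmetic described above can be sketched in a few lines of Perl. This is only an illustration, not part of the original script: the helper name `bin_index` and the constant `$WINDOW` are made up for this example, and window numbers are assumed to be counted from zero so that they can be used directly as array indices.

```perl
use strict;
use warnings;

# Window width in base pairs (100 kb), as described in the post.
my $WINDOW = 100_000;

# Hypothetical helper: map a genomic coordinate to a zero-based window index.
sub bin_index {
    my ($coord) = @_;
    return int( $coord / $WINDOW );
}

print bin_index(99_999),     "\n";    # prints 0 (first array)
print bin_index(100_000),    "\n";    # prints 1 (second array)
print bin_index(16_287_215), "\n";    # prints 162 (first column of the sample file)
```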
Help on this would be most appreciated.
```perl
#!/usr/bin/perl
use warnings;
use strict;
use v5.14;

die "need two arguments (i.e. chr cont) at invocation" unless @ARGV == 2;

chomp( my $chr_num = shift );
chomp( my $cont = shift );

open my $out_file, ">", "chr${chr_num}_exome_snps_processed_${cont}_STATS"
  or die "Can't open output file: $!\n";

# Get a list of individuals (will be hash keys later):
open my $in_file, "<", "chr${chr_num}_exome_snps_processed_$cont"
  or die "Can't open input file: $!\n";

my @individuals;
my %data;

while (<$in_file>) {
    chomp;
    my @snp_bins;
    if (/^SAMPLE/) {
        my ( $placeholder, @coords ) = split /,/;
        foreach my $coord (@coords) {
            push @snp_bins, int( $coord / 100_000 );
        }
    }
    else {
        my ( $id, @snps ) = split /,/;
        push @individuals, $id;
        foreach my $individual (@individuals) {
            foreach my $snp (@snps) {
                $data{$individual}[ [ shift @snp_bins ] ] = $snp;
            }
        }
    }
}
close $in_file;
```
```
## Sample of data file. Each file has hundreds of thousands of columns and hundreds of rows
SAMPLE,16287215,16287226,16287365,16287649,16287784,16287851,16287912
HG00553,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00554,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00637,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00638,0 0,0 0,0 0,0 0,0 0,1 1,0 0
HG00640,0 0,0 0,0 0,0 0,0 0,1 1,0 0
```
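One possible way to populate the structure is sketched below. It is an illustration under stated assumptions rather than a definitive answer: the function name `build_hoaoa` is hypothetical, the header line is assumed to come before any data line, and the demo reads two rows of the sample above from an in-memory filehandle instead of the real input file. The key points are that the window indices are computed once from the header and kept outside the per-line scope, and that each genotype pair is pushed onto the array for its column's window.

```perl
use strict;
use warnings;

my $WINDOW = 100_000;    # 100 kb window width

# Hypothetical helper: returns a hash ref where $data->{$id}[$bin] is an
# array ref of the SNP strings whose header coordinate falls in window $bin.
sub build_hoaoa {
    my ($fh) = @_;
    my ( @snp_bins, %data );
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        my ( $id, @fields ) = split /,/, $line;
        if ( $id eq 'SAMPLE' ) {
            # Header line: compute the window index of every column once.
            @snp_bins = map { int( $_ / $WINDOW ) } @fields;
            next;
        }
        # Data line: file each genotype pair under its column's window.
        for my $i ( 0 .. $#fields ) {
            push @{ $data{$id}[ $snp_bins[$i] ] }, $fields[$i];
        }
    }
    return \%data;
}

# Demo on the header and two rows copied from the sample file.
my $sample = <<'END';
SAMPLE,16287215,16287226,16287365,16287649,16287784,16287851,16287912
HG00553,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00638,0 0,0 0,0 0,0 0,0 0,1 1,0 0
END
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
my $data = build_hoaoa($fh);
close $fh;

print scalar @{ $data->{HG00638}[162] }, "\n";    # prints 7
print $data->{HG00638}[162][5], "\n";             # prints 1 1
```

In this sketch the windows that receive no data (indices 0 to 161 here) are left undefined rather than set to empty arrays, so code that walks the structure would need to treat an undefined slot as an empty window, for example with `@{ $data->{$id}[$bin] // [] }`.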
In reply to Population of HoAoA based on file contents by iangibson