drlecb has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I really need some help please! I'm trying to merge 3 genetic data (.var) files, containing 2 parent samples and an offspring sample, in a way that lets me determine whether each sample is heterozygous or homozygous for a particular allele. Dummy data for the three samples are supplied below.

I'm nearly there, but I believe my issue is with the values I get back from my hash %zygos, which I want to give me either a "HET" or a "HOM". When I look up the key "$variant" I get back an array coordinate, not the HET/HOM value I'm looking for. However, when I use Data::Dumper I see the value I want. I think if I can fix this, the script will work... but I'm just not that clever. I've tried all day to get my script to work and now I'm in need of some kind help please. I am not an efficient coder, as you will see.

Please be gentle. Best wishes LECB

#!/usr/bin/perl;
use strict;
use v5.8.8;
use Data::Dumper;

##List all .var files in directory
my $path = '.'; #Reads same directory as running .pl file
opendir(DIR, $path);
my @files = grep { /\.var$/ } readdir(DIR); #Reads names of .var files into @files
@files = sort @files;
my @samples;
foreach (@files) {
    if ($_ =~ /\_/) {
        $_ =~ /(^.+?)\_/;
        push @samples, $1;
    }
    else {
        $_ =~ /(^.+)\.var/;
        push @samples, $1;
    }
}
closedir(DIR);
@files = sort @files;
my $file_count = @files;
print "\nRunning on $file_count files:\n";
print "@files ";

#Make unique list of variants observed in all opened .var files
my @variants; ##Create concatenated list of all variant locations and alleles
my @zygosity;
my $zyg;
my %zygos;
foreach my $file (@files) {
    open VAR, $file;
    chomp (my @var = <VAR>);
    ##sub in $file for ID (3rd column)
    while ($var[0] =~ s/ID/$file/) {
        shift @var;
    }
    foreach (@var) {
        my @fields = split /\t/, $_;
        $fields[2] = $file;
        my $locus = join "\t", @fields[0,1,2,3,4,5];
        push @variants, $locus;
        #pull out genotype
        $fields[9] =~ /(^\d\/\d\:)/;
        if ($1 =~ /0\/1\:/) {
            $zyg = "HET";
        }
        else {
            $zyg = "HOM";
        }
        $fields[9] = $zyg;
        # push @zygosity, $zyg;
        push @{$zygos{$locus}}, $zyg;
    }
}
close VAR;
my %variants_hash = map { $_, 1 } @variants; ##Generates the unique list
my @variants_list = keys %variants_hash;
%variants_hash = undef;
@variants_list = sort @variants_list; #List of just loci
my %variants_geno; #List of loci to which genotypes will be appended
my %variants_info; # List of variant information to append after genotypes for printing.
my $sum_variants = @variants_list;
print "\nRunning on $sum_variants non-redundant variants.\n";

# Pull out genotypes for all of the non-redundant variants for each individual
my @variant_geno; #Pointless array for anonymous assignation of arrays in hash
foreach my $file (@files) {
    open VAR, $file;
    my @var = <VAR>;
    foreach my $variant (@variants_list) {
        my $found = 0;
        my $genotype = 'REF?';
        foreach my $line (@var) {
            my @fields2 = split /\t/, $line;
            $fields2[2] = $file;
            $fields2[9] = $zygos{$variant};
            my $newline = join "\t", @fields2[0..123];
            $line = $newline;
            if ($line =~ /$variant/) {
                if ($found == 0) {
                    $variants_info{$variant} = (join "\t", @fields2[10..123]);
                    $found ++;
                }
                if ($fields2[9] eq 'HET') {
                    $genotype = 'HET';
                }
                if ($fields2[9] eq 'HOM') {
                    $genotype = 'HOM';
                }
                last;
            }}
        if ($variants_geno{$variant} =~ /./) {
            push @{$variants_geno{$variant}}, $genotype;
        }
        else {
            @{$variants_geno{$variant}} = [@variant_geno];
            push @{$variants_geno{$variant}}, $genotype;
        }}
    close VAR;
}
open OUT, ">$ARGV[0]" . 'merged.vars';
my $samples_list = join "\t", @samples;
print OUT "CHROM POS ID REF ALT QUAL $samples_list Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene ExonicFunc.refGene AAChange.refGene Func.knownGene Gene.knownGene GeneDetail.knownGene ExonicFunc.knownGene AAChange.knownGene avsnp144 1000g2015aug_all 1000g2015aug_afr 1000g2015aug_amr 1000g2015aug_sas 1000g2015aug_eur 1000g2015aug_eas esp6500siv2_all esp6500siv2_ea esp6500siv2_aa ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS cosmic70 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred PROVEAN_score PROVEAN_pred VEST3_score CADD_raw CADD_phred DANN_score fathmm-MKL_coding_score fathmm-MKL_coding_pred MetaSVM_score MetaSVM_pred MetaLR_score MetaLR_pred integrated_fitCons_score integrated_confidence_value GERP++_RS phyloP7way_vertebrate phyloP20way_mammalian phastCons7way_vertebrate phastCons20way_mammalian SiPhy_29way_logOdds Interpro_domain dbscSNV_ADA_SCORE dbscSNV_RF_SCORE CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID HRC_AF HRC_AC HRC_AN HRC_non1000G_AF HRC_non1000G_AC HRC_non1000G_AN Kaviar_AF Kaviar_AC Kaviar_AN nci60 KEY3 INFO_AC INFO_AF INFO_BaseQRankSum INFO_ClippingRankSum INFO_DP INFO_DS INFO_END INFO_ExcessHet INFO_FS INFO_HaplotypeScore INFO_InbreedingCoeff INFO_MLEAC INFO_MLEAF INFO_MQ INFO_MQRankSum INFO_QD INFO_RAW_MQ INFO_ReadPosRankSum INFO_SOR seq seq_flag AAchange Grantham Mutability FuentesFalsePositive ACMG";
foreach my $variant_ident (@variants_list) {
    shift @{$variants_geno{$variant_ident}};
    my $genotypes = join "\t", @{$variants_geno{$variant_ident}};
    my $joined = join "\t", $variant_ident, $genotypes, $variants_info{$variant_ident};
    $joined =~ s/^\s+//;
    chop $joined;
    print OUT "$joined";
}
close OUT;
--Dummydata1.var-- #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT + IFS003-W Chr Start End Ref Alt Func.refGene G +ene.refGene GeneDetail.refGene ExonicFunc.refGene AAChange.r +efGene Func.knownGene Gene.knownGene GeneDetail.knownGene + ExonicFunc.knownGene AAChange.knownGene avsnp144 1000g2015a +ug_all 1000g2015aug_afr 1000g2015aug_amr 1000g2015aug_sas + 1000g2015aug_eur 1000g2015aug_eas esp6500siv2_all esp6500si +v2_ea esp6500siv2_aa ExAC_ALL ExAC_AFR ExAC_AMR ExAC_E +AS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS cosmic70 SIF +T_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred + Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pre +d MutationTaster_score MutationTaster_pred MutationAssessor_ +score MutationAssessor_pred FATHMM_score FATHMM_pred PROV +EAN_score PROVEAN_pred VEST3_score CADD_raw CADD_phred + DANN_score fathmm-MKL_coding_score fathmm-MKL_coding_pred M +etaSVM_score MetaSVM_pred MetaLR_score MetaLR_pred integr +ated_fitCons_score integrated_confidence_value GERP++_RS phy +loP7way_vertebrate phyloP20way_mammalian phastCons7way_vertebra +te phastCons20way_mammalian SiPhy_29way_logOdds Interpro_dom +ain dbscSNV_ADA_SCORE dbscSNV_RF_SCORE CLINSIG CLNDBN +CLNACC CLNDSDB CLNDSDBID HRC_AF HRC_AC HRC_AN HRC_n +on1000G_AF HRC_non1000G_AC HRC_non1000G_AN Kaviar_AF Kavi +ar_AC Kaviar_AN nci60 KEY3 INFO_AC INFO_AF INFO_Bas +eQRankSum INFO_ClippingRankSum INFO_DP INFO_DS INFO_END + INFO_ExcessHet INFO_FS INFO_HaplotypeScore INFO_Inbreeding +Coeff INFO_MLEAC INFO_MLEAF INFO_MQ INFO_MQRankSum INF +O_QD INFO_RAW_MQ INFO_ReadPosRankSum INFO_SOR seq seq_ +flag AAchange Grantham Mutability FuentesFalsePositive + ACMG chr1 14653 . C T 191.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=1.23;ClippingRankSum=0.00;DP=26;ExcessHet=3.0103;FS=6.641;ML +EAC=1;MLEAF=0.500;MQ=32.19;MQRankSum=0.580;QD=7.38;ReadPosRankSum=-1. 
+830e-01;SOR=2.280 GT:AD:DP:GQ:PL 0/1:15,11:26:99:220,0,292 c +hr1 14653 14653 C T ncRNA_exonic WASH7P + intergenic NONE,MIR6859-3 dist=NONE;dist=2716 rs +62635297 + + + +0.0011526 30 26028 chr1:14653:14653:C:T 1 0.5 1 +.23 0 26 no_DS no_END 3.0103 6.641 no_HaplotypeS +core no_InbreedingCoeff 1 0.5 32.19 0.58 7.38 no +_RAW_MQ -1.83E-01 2.28 AAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTC +CATGTCAGAGCAA[C]GGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATT + pass XX . 0 1 0 chr1 16949 . A C 240.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=1.57;ClippingRankSum=0.00;DP=39;ExcessHet=3.0103;FS=1.264;ML +EAC=1;MLEAF=0.500;MQ=21.03;MQRankSum=-7.380e-01;QD=6.17;ReadPosRankSu +m=1.62;SOR=0.693 GT:AD:DP:GQ:PL 0/1:24,15:39:99:269,0,502 ch +r1 16949 16949 A C ncRNA_exonic WASH7P + downstream MIR6859-3 rs199745162 0.0139776 + 0.0227 0.0144 0.0143 0.0159 + + + + 0.0159444 415 26028 chr1:16949:16949:A:C 1 + 0.5 1.57 0 39 no_DS no_END 3.0103 1.264 no_ +HaplotypeScore no_InbreedingCoeff 1 0.5 21.03 -7.38E-0 +1 6.17 no_RAW_MQ 1.62 0.693 CTGGAATGGTGCCAGGGGCAGAGGGG +GCAATGCCGGGGCCCAGGTCGGCA[A]TGTACATGAGGTCGTTGGCAATGCCGGGCAGGTCAGGCAGGT +AGGATGGA pass XX . 0 1 0 chr1 17020 . G A 58.77 . AC=1;AF=0.500;AN=2;BaseQ +RankSum=-4.670e-01;ClippingRankSum=0.00;DP=43;ExcessHet=3.0103;FS=23. +602;MLEAC=1;MLEAF=0.500;MQ=21.00;MQRankSum=0.00;QD=1.37;ReadPosRankSu +m=-1.126e+00;SOR=1.132 GT:AD:DP:GQ:PL 0/1:34,9:43:87:87,0,789 + chr1 17020 17020 G A ncRNA_exonic WASH7P + downstream MIR6859-3 rs199740902 + + + + 0.0141002 367 +26028 chr1:17020:17020:G:A 1 0.5 -4.67E-01 0 43 + no_DS no_END 3.0103 23.602 no_HaplotypeScore no_In +breedingCoeff 1 0.5 21 0 1.37 no_RAW_MQ -1.13E+0 +0 1.132 ATGCCGGGCAGGTCAGGCAGGTAGGATGGAACATCAATCTCAGGCACCTG[G]CC +CAGGTCTGGCACATAGAAGTAGTTCTCTGGGACCTGCAAGATTAGGCA pass XX . + 0 1 0 chr1 17385 . G A 81.77 . 
AC=1;AF=0.500;AN=2;BaseQ +RankSum=0.653;ClippingRankSum=0.00;DP=71;ExcessHet=3.0103;FS=3.091;ML +EAC=1;MLEAF=0.500;MQ=48.65;MQRankSum=0.417;QD=1.15;ReadPosRankSum=1.0 +2;SOR=0.311 GT:AD:DP:GQ:PL 0/1:60,11:71:99:110,0,1715 chr1 + 17385 17385 G A ncRNA_exonic;splicing MIR6859-1,MIR6 +859-2,MIR6859-3,MIR6859-4;WASH7P NR_024540:exon6:c.588-17C>T + ncRNA_exonic MIR6859-3 rs201535981 + 0.2454 0.2331 0.2857 0.0423 +0.2817 0.2743 0.2321 0.2092 + + + 0.0015912 246 154602 chr1:1 +7385:17385:G:A 1 0.5 0.653 0 71 no_DS no_END +3.0103 3.091 no_HaplotypeScore no_InbreedingCoeff 1 0. +5 48.65 0.417 1.15 no_RAW_MQ 1.02 0.311 AGCCAGGG +GGTCCAGGAAGACATACTTCTTCTACCTACAGAGGCGACATG[G]GGGTCAGGCAAGCTGACACCCGCT +GTCCTGAGCCCATGTTCCTCTCCCAC pass XX . 0 0 0 chr1 17408 . C G 134.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=-9.480e-01;ClippingRankSum=0.00;DP=94;ExcessHet=3.0103;FS=0. +000;MLEAC=1;MLEAF=0.500;MQ=52.27;MQRankSum=-8.570e-01;QD=1.45;ReadPos +RankSum=0.359;SOR=0.527 GT:AD:DP:GQ:PL 0/1:79,14:93:99:163,0,23 +39 chr1 17408 17408 C G ncRNA_exonic MIR6859-1,M +IR6859-2,MIR6859-3,MIR6859-4 ncRNA_exonic MIR6859-3 + rs747093451 0. +0352 0.0339 0.049 0 0.0273 0.0438 0 0.0074 + + + 0.0013842 +214 154602 chr1:17408:17408:C:G 1 0.5 -9.48E-01 + 0 94 no_DS no_END 3.0103 0 no_HaplotypeScore no +_InbreedingCoeff 1 0.5 52.27 -8.57E-01 1.45 no_RAW_ +MQ 0.359 0.527 ACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGAC +ACCCG[C]TGTCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCAC pass + XX . 0 0 0 chr1 17697 . G C 125.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=-2.529e+00;ClippingRankSum=0.00;DP=28;ExcessHet=3.0103;FS=0. 
+000;MLEAC=1;MLEAF=0.500;MQ=39.25;MQRankSum=-8.310e-01;QD=4.49;ReadPos +RankSum=1.07;SOR=0.124 GT:AD:DP:GQ:PL 0/1:21,7:28:99:154,0,714 + chr1 17697 17697 G C ncRNA_exonic WASH7P + upstream MIR6859-3 rs374995955 + + + + 0.0155602 405 2 +6028 chr1:17697:17697:G:C 1 0.5 -2.53E+00 0 28 + no_DS no_END 3.0103 0 no_HaplotypeScore no_Inbreedi +ngCoeff 1 0.5 39.25 -8.31E-01 4.49 no_RAW_MQ 1.0 +7 0.124 GCTGATGTTGCTGGGAAGACCCCCAAGTCCCTCTTCTGCATCGTCCTCGG[G]CT +CCGGCTTGGTGCTCACGCACACAGGAAAGTCCTTCAGCTTCTCCTGAG pass XX . + 0 1 0
--Dummydata2.var-- #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT + IFS001_W1515808 Chr Start End Ref Alt Func.refGe +ne Gene.refGene GeneDetail.refGene ExonicFunc.refGene AAC +hange.refGene Func.knownGene Gene.knownGene GeneDetail.known +Gene ExonicFunc.knownGene AAChange.knownGene avsnp144 100 +0g2015aug_all 1000g2015aug_afr 1000g2015aug_amr 1000g2015aug +_sas 1000g2015aug_eur 1000g2015aug_eas esp6500siv2_all es +p6500siv2_ea esp6500siv2_aa ExAC_ALL ExAC_AFR ExAC_AMR + ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS cosmic70 + SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV +_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score +LRT_pred MutationTaster_score MutationTaster_pred MutationAs +sessor_score MutationAssessor_pred FATHMM_score FATHMM_pred + PROVEAN_score PROVEAN_pred VEST3_score CADD_raw CADD_p +hred DANN_score fathmm-MKL_coding_score fathmm-MKL_coding_pr +ed MetaSVM_score MetaSVM_pred MetaLR_score MetaLR_pred + integrated_fitCons_score integrated_confidence_value GERP++_RS + phyloP7way_vertebrate phyloP20way_mammalian phastCons7way_v +ertebrate phastCons20way_mammalian SiPhy_29way_logOdds Inter +pro_domain dbscSNV_ADA_SCORE dbscSNV_RF_SCORE CLINSIG CLN +DBN CLNACC CLNDSDB CLNDSDBID HRC_AF HRC_AC HRC_AN + HRC_non1000G_AF HRC_non1000G_AC HRC_non1000G_AN Kaviar_AF + Kaviar_AC Kaviar_AN nci60 KEY3 INFO_AC INFO_AF I +NFO_BaseQRankSum INFO_ClippingRankSum INFO_DP INFO_DS INF +O_END INFO_ExcessHet INFO_FS INFO_HaplotypeScore INFO_Inb +reedingCoeff INFO_MLEAC INFO_MLEAF INFO_MQ INFO_MQRankSum + INFO_QD INFO_RAW_MQ INFO_ReadPosRankSum INFO_SOR seq + seq_flag AAchange Grantham Mutability FuentesFalsePosi +tive ACMG chr1 14464 . A T 39.77 . 
AC=1;AF=0.500;AN=2;BaseQ +RankSum=1.38;ClippingRankSum=0.00;DP=4;ExcessHet=3.0103;FS=0.000;MLEA +C=1;MLEAF=0.500;MQ=40.51;MQRankSum=-6.740e-01;QD=9.94;ReadPosRankSum= +0.674;SOR=2.303 GT:AD:DP:GQ:PL 0/1:2,2:4:64:68,0,64 chr1 +14464 14464 A T ncRNA_exonic WASH7P int +ergenic NONE,MIR6859-3 dist=NONE;dist=2905 rs5461694 +44 0.0958466 0.0144 0.1138 0.1943 0.1859 0.005 + + + + 0.0346166 901 26028 chr1 +:14464:14464:A:T 1 0.5 1.38 0 4 no_DS no_END +3.0103 0 no_HaplotypeScore no_InbreedingCoeff 1 0.5 + 40.51 -6.74E-01 9.94 no_RAW_MQ 0.674 2.303 GTTCTTT +ATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACAC[A]GTGGCGCAGGCTGGGTGGAGCCG +TCCCCCCATGGAGCACAGGCAGACAGA pass XX . 0 1 0 chr1 17407 . G A 140.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=2.44;ClippingRankSum=0.00;DP=60;ExcessHet=3.0103;FS=0.000;ML +EAC=1;MLEAF=0.500;MQ=49.25;MQRankSum=1.74;QD=2.39;ReadPosRankSum=-1.5 +00e-02;SOR=0.608 GT:AD:DP:GQ:PL 0/1:49,10:59:99:169,0,1261 c +hr1 17407 17407 G A ncRNA_exonic MIR6859-1,MIR6859- +2,MIR6859-3,MIR6859-4 ncRNA_exonic MIR6859-3 + rs372841554 0.0765 + 0.0303 0.054 0.0672 0.0273 0.0894 0.0714 0.1352 + + + 0.0030013 + 464 154602 chr1:17407:17407:G:A 1 0.5 2.44 0 + 60 no_DS no_END 3.0103 0 no_HaplotypeScore no_I +nbreedingCoeff 1 0.5 49.25 1.74 2.39 no_RAW_MQ - +1.50E-02 0.608 TACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACAC +CC[G]CTGTCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCA pass XX + . 0 0 0 chr1 17758 . T A 75.77 . 
AC=1;AF=0.500;AN=2;BaseQ +RankSum=0.674;ClippingRankSum=0.00;DP=5;ExcessHet=3.0103;FS=0.000;MLE +AC=1;MLEAF=0.500;MQ=38.87;MQRankSum=-5.240e-01;QD=15.15;ReadPosRankSu +m=-5.240e-01;SOR=1.022 GT:AD:DP:GQ:PL 0/1:2,3:5:63:104,0,63 +chr1 17758 17758 T A ncRNA_splicing WASH7P NR_02 +4540:exon5:c.451-16A>T upstream MIR6859-3 + + + + + chr1:17758:17758:T:A 1 0.5 0.674 0 5 no_D +S no_END 3.0103 0 no_HaplotypeScore no_InbreedingCoeff + 1 0.5 38.87 -5.24E-01 15.15 no_RAW_MQ -5.24E-01 + 1.022 GTGCTCACGCACACAGGAAAGTCCTTCAGCTTCTCCTGAGAGGGCCAGGA[T]GGC +CAAGGGATGGTGAATATTTGGTGCTGGGCCTAATCAGCTGCCATCCC pass XX . + 0 1 0 chr1 69511 . A G 2186.77 . AC=2;AF=1.00;AN=2;DP=7 +8;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=43.68;QD=28.04;SOR= +0.744 GT:AD:DP:GQ:PL 1/1:0,78:78:99:2215,234,0 chr1 69511 + 69511 A G exonic OR4F5 nonsynonymous SNV OR +4F5:NM_001005484:exon1:c.A421G:p.T141A exonic OR4F5 nons +ynonymous SNV OR4F5:uc001aal.1:exon1:c.A421G:p.T141A rs2691305 + 0.7598 0.8874 0.5441 0.9394 0. +5942 0.9507 0.9994 0.9907 0.9716 0.9597 0.9832 + 0.652 T 0 B 0 B 0 N 1 P -1.295 N + 1.26 T 1.54 N 0.012 -0.784 0.047 0.454 0.0 +03 N -0.997 T 0 T 0.487 0 1.15 -0.016 - +0.132 0.055 0.765 4.198 GPCR, rhodopsin-like, 7TM + 0.54161 83734 1 +54602 0.51 chr1:69511:69511:A:G 2 1 no_BaseQRankSum + no_ClippingRankSum 78 no_DS no_END 3.0103 0 no_Hap +lotypeScore no_InbreedingCoeff 2 1 43.68 no_MQRankSum + 28.04 no_RAW_MQ no_ReadPosRankSum 0.744 ACTACACTACAATT +ATGTGTGGCAACGCATGTGTCGGCATTATGGCTGTC[A]CATGGGGAATTGGCTTTCTCCATTCGGTGA +GCCAGTTGGCGTTTGCCGTG pass TA . 0 1 0 chr1 138041 . G A 119.77 . AC=1;AF=0.500;AN=2;Bas +eQRankSum=1.97;ClippingRankSum=0.00;DP=19;ExcessHet=3.0103;FS=8.903;M +LEAC=1;MLEAF=0.500;MQ=30.73;MQRankSum=-1.007e+00;QD=6.30;ReadPosRankS +um=1.18;SOR=0.493 GT:AD:DP:GQ:PL 0/1:12,7:19:99:148,0,289 ch +r1 138041 138041 G A ncRNA_exonic LOC729737 + intergenic RP11-34P13.7,RP11-34P13.14 dist=4318;dist=17 +49 rs560358882 0.00239617 0.0029 0.0051 0. 
+005 + + + 0.0015368 40 26028 + chr1:138041:138041:G:A 1 0.5 1.97 0 19 no_DS + no_END 3.0103 8.903 no_HaplotypeScore no_InbreedingCo +eff 1 0.5 30.73 -1.01E+00 6.3 no_RAW_MQ 1.18 +0.493 GAAGGGAAAAACTGGGCCTGGAAAGGCCGTTGTCAGGAATGAGCCCCATG[G]GCCTGAA +GAGGCCACTGGCAGGCGGGAGCTGGGCCTGCCGAAGCGGCCGA pass XX . 0 + 0 0 chr1 138156 . G T 120.84 . AC=2;AF=1.00;AN=2;DP=6 +;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=20.17;QD=20.14;SOR=0 +.693 GT:AD:DP:GQ:PL 1/1:0,6:6:18:149,18,0 chr1 138156 +138156 G T ncRNA_exonic LOC729737 intergen +ic RP11-34P13.7,RP11-34P13.14 dist=4433;dist=1634 rs +370691115 + + + + 0.0967804 2519 26028 chr1:138156:138156:G:T 2 1 + no_BaseQRankSum no_ClippingRankSum 6 no_DS no_END 3. +0103 0 no_HaplotypeScore no_InbreedingCoeff 2 1 20. +17 no_MQRankSum 20.14 no_RAW_MQ no_ReadPosRankSum 0.69 +3 GGACTCGGGAGGCCGCAGTGAAGCAACAGCTAGCTGGGCGTGGAGAGTCC[G]CTGTGAGGCAG +AGGCTGGGCCTGTGCAGGCCTTCGGGAGGCAGGAGGCTG pass XX . 0 0 + 0 chr1 139213 . A G 1774.77 . AC=1;AF=0.500;AN=2;Ba +seQRankSum=-4.140e+00;ClippingRankSum=0.00;DP=97;ExcessHet=3.0103;FS= +3.809;MLEAC=1;MLEAF=0.500;MQ=39.41;MQRankSum=-1.653e+00;QD=18.30;Read +PosRankSum=-7.760e-01;SOR=1.060 GT:AD:DP:GQ:PGT:PID:PL 0/1:51,4 +6:97:99:0|1:139213_A_G:1803,0,3192 chr1 139213 139213 A + G ncRNA_exonic LOC729737 downstream RP11-34 +P13.14 rs370723703 + 0.2506 0.3787 0.2879 0.3404 0.25 0.2597 0.3083 + 0.2352 + + + 0.0259699 4015 154602 chr1:139213:139213:A:G 1 0 +.5 -4.14E+00 0 97 no_DS no_END 3.0103 3.809 n +o_HaplotypeScore no_InbreedingCoeff 1 0.5 39.41 -1.65E ++00 18.3 no_RAW_MQ -7.76E-01 1.06 GGAAGGTTGCCATGAGACAA +AAGTTGGGCCTGGAAAGGCCCTTGTGAAGC[A]TGAGCTTGGCCTAAAGAGGCCACTGGGTGGCAGGAG +CTGGGTGTGTAGAA pass XX . 0 0 0 chr1 139233 . C A 1883.77 . 
AC=1;AF=0.500;AN=2;Ba +seQRankSum=-9.990e-01;ClippingRankSum=0.00;DP=110;ExcessHet=3.0103;FS +=2.463;MLEAC=1;MLEAF=0.500;MQ=39.80;MQRankSum=0.636;QD=17.13;ReadPosR +ankSum=-1.834e+00;SOR=0.595 GT:AD:DP:GQ:PGT:PID:PL 0/1:56,54:11 +0:99:0|1:139213_A_G:1912,0,3261 chr1 139233 139233 C A + ncRNA_exonic LOC729737 downstream RP11-34P13 +.14 rs373847457 + 0.2478 0.3763 0.2985 0.3548 . 0.2577 0.3103 0.2 +308 + + 0.02 +54654 3937 154602 chr1:139233:139233:C:A 1 0.5 +-9.99E-01 0 110 no_DS no_END 3.0103 2.463 no_Hap +lotypeScore no_InbreedingCoeff 1 0.5 39.8 0.636 17. +13 no_RAW_MQ -1.83E+00 0.595 AAGTTGGGCCTGGAAAGGCCCTTGTGAA +GCATGAGCTTGGCCTAAAGAGG[C]CACTGGGTGGCAGGAGCTGGGTGTGTAGAAGCTGCTGAAAGGTT +GGGAGC pass XX . 0 0 0
--Dummydata3.var-- #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT + IFS002_W1600378 Chr Start End Ref Alt Func.refGe +ne Gene.refGene GeneDetail.refGene ExonicFunc.refGene AAC +hange.refGene Func.knownGene Gene.knownGene GeneDetail.known +Gene ExonicFunc.knownGene AAChange.knownGene avsnp144 100 +0g2015aug_all 1000g2015aug_afr 1000g2015aug_amr 1000g2015aug +_sas 1000g2015aug_eur 1000g2015aug_eas esp6500siv2_all es +p6500siv2_ea esp6500siv2_aa ExAC_ALL ExAC_AFR ExAC_AMR + ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS cosmic70 + SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV +_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score +LRT_pred MutationTaster_score MutationTaster_pred MutationAs +sessor_score MutationAssessor_pred FATHMM_score FATHMM_pred + PROVEAN_score PROVEAN_pred VEST3_score CADD_raw CADD_p +hred DANN_score fathmm-MKL_coding_score fathmm-MKL_coding_pr +ed MetaSVM_score MetaSVM_pred MetaLR_score MetaLR_pred + integrated_fitCons_score integrated_confidence_value GERP++_RS + phyloP7way_vertebrate phyloP20way_mammalian phastCons7way_v +ertebrate phastCons20way_mammalian SiPhy_29way_logOdds Inter +pro_domain dbscSNV_ADA_SCORE dbscSNV_RF_SCORE CLINSIG CLN +DBN CLNACC CLNDSDB CLNDSDBID HRC_AF HRC_AC HRC_AN + HRC_non1000G_AF HRC_non1000G_AC HRC_non1000G_AN Kaviar_AF + Kaviar_AC Kaviar_AN nci60 KEY3 INFO_AC INFO_AF I +NFO_BaseQRankSum INFO_ClippingRankSum INFO_DP INFO_DS INF +O_END INFO_ExcessHet INFO_FS INFO_HaplotypeScore INFO_Inb +reedingCoeff INFO_MLEAC INFO_MLEAF INFO_MQ INFO_MQRankSum + INFO_QD INFO_RAW_MQ INFO_ReadPosRankSum INFO_SOR seq + seq_flag AAchange Grantham Mutability FuentesFalsePosi +tive ACMG chr1 14590 . G A 118.77 . 
AC=1;AF=0.500;AN=2;Base +QRankSum=2.43;ClippingRankSum=0.00;DP=11;ExcessHet=3.0103;FS=3.090;ML +EAC=1;MLEAF=0.500;MQ=28.67;MQRankSum=-1.513e+00;QD=10.80;ReadPosRankS +um=0.357;SOR=1.981 GT:AD:DP:GQ:PGT:PID:PL 0/1:7,4:11:99:0|1:145 +90_G_A:147,0,307 chr1 14590 14590 G A ncRNA_exonic + WASH7P intergenic NONE,MIR6859-3 dist=NONE;di +st=2779 + + + + chr1:14590:14590:G:A 1 0.5 2.43 + 0 11 no_DS no_END 3.0103 3.09 no_HaplotypeScore + no_InbreedingCoeff 1 0.5 28.67 -1.51E+00 10.8 no_ +RAW_MQ 0.357 1.981 CAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGT +GCTGGTTCC[G]TCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCC pas +s XX . 0 1 0 chr1 14599 . T A 115.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=0.253;ClippingRankSum=0.00;DP=12;ExcessHet=3.0103;FS=2.881;M +LEAC=1;MLEAF=0.500;MQ=30.13;MQRankSum=-1.712e+00;QD=9.65;ReadPosRankS +um=1.16;SOR=1.721 GT:AD:DP:GQ:PGT:PID:PL 0/1:8,4:12:99:0|1:1459 +0_G_A:144,0,349 chr1 14599 14599 T A ncRNA_exonic + WASH7P intergenic NONE,MIR6859-3 dist=NONE;dis +t=2770 rs531646671 0.147564 0.121 0.1758 0.209 +6 0.161 0.0893 + + + 0.0283925 + 739 26028 chr1:14599:14599:T:A 1 0.5 0.253 0 + 12 no_DS no_END 3.0103 2.881 no_HaplotypeScore no +_InbreedingCoeff 1 0.5 30.13 -1.71E+00 9.65 no_RAW_ +MQ 1.16 1.721 CTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCAC +CCCC[T]CCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAG homopoly +mer XX . 0 1 0 chr1 14604 . A G 151.77 . 
AC=1;AF=0.500;AN=2;Base +QRankSum=1.02;ClippingRankSum=0.00;DP=12;ExcessHet=3.0103;FS=2.881;ML +EAC=1;MLEAF=0.500;MQ=30.13;MQRankSum=-1.712e+00;QD=12.65;ReadPosRankS +um=1.34;SOR=1.721 GT:AD:DP:GQ:PGT:PID:PL 0/1:8,4:12:99:0|1:1459 +0_G_A:180,0,346 chr1 14604 14604 A G ncRNA_exonic + WASH7P intergenic NONE,MIR6859-3 dist=NONE;dis +t=2765 rs541940975 0.147564 0.121 0.1758 0.209 +6 0.161 0.0893 + + + 0.0285846 + 744 26028 chr1:14604:14604:A:G 1 0.5 1.02 0 + 12 no_DS no_END 3.0103 2.881 no_HaplotypeScore no_ +InbreedingCoeff 1 0.5 30.13 -1.71E+00 12.65 no_RAW_ +MQ 1.34 1.721 GCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCT +CCCA[A]GGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACG pass +XX . 0 1 0 chr1 14610 . T C 151.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=3.16;ClippingRankSum=0.00;DP=13;ExcessHet=3.0103;FS=3.123;ML +EAC=1;MLEAF=0.500;MQ=31.00;MQRankSum=-1.442e+00;QD=11.67;ReadPosRankS +um=0.860;SOR=2.030 GT:AD:DP:GQ:PGT:PID:PL 0/1:8,5:13:99:0|1:145 +90_G_A:180,0,346 chr1 14610 14610 T C ncRNA_exonic + WASH7P intergenic NONE,MIR6859-3 dist=NONE;di +st=2759 + + + + 0.0001921 5 26028 chr1:14610:14610:T:C 1 + 0.5 3.16 0 13 no_DS no_END 3.0103 3.123 no_ +HaplotypeScore no_InbreedingCoeff 1 0.5 31 -1.44E+00 + 11.67 no_RAW_MQ 0.86 2.03 TGAAGCTGGTCTCCACACAGTGCTGGTTC +CGTCACCCCCTCCCAAGGAAG[T]AGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGG +CCCAA pass XX . 0 1 0 chr1 14653 . C T 254.77 . 
AC=1;AF=0.500;AN=2;Base +QRankSum=1.91;ClippingRankSum=0.00;DP=25;ExcessHet=3.0103;FS=7.952;ML +EAC=1;MLEAF=0.500;MQ=30.99;MQRankSum=-1.236e+00;QD=10.19;ReadPosRankS +um=0.792;SOR=2.642 GT:AD:DP:GQ:PL 0/1:12,13:25:99:283,0,238 +chr1 14653 14653 C T ncRNA_exonic WASH7P + intergenic NONE,MIR6859-3 dist=NONE;dist=2716 r +s62635297 + + + + 0.0011526 30 26028 chr1:14653:14653:C:T 1 0.5 +1.91 0 25 no_DS no_END 3.0103 7.952 no_Haplotype +Score no_InbreedingCoeff 1 0.5 30.99 -1.24E+00 10.1 +9 no_RAW_MQ 0.792 2.642 AAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTG +TGTCCATGTCAGAGCAA[C]GGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGAT +T pass XX . 0 1 0 chr1 16949 . A C 183.77 . AC=1;AF=0.500;AN=2;Base +QRankSum=-7.580e-01;ClippingRankSum=0.00;DP=31;ExcessHet=3.0103;FS=0. +000;MLEAC=1;MLEAF=0.500;MQ=21.07;MQRankSum=-1.095e+00;QD=5.93;ReadPos +RankSum=-3.240e-01;SOR=0.515 GT:AD:DP:GQ:PL 0/1:19,12:31:99:212 +,0,396 chr1 16949 16949 A C ncRNA_exonic WASH7P + downstream MIR6859-3 rs199745162 +0.0139776 0.0227 0.0144 0.0143 0.0159 + + + + 0.0159444 415 26028 chr1:16949:1694 +9:A:C 1 0.5 -7.58E-01 0 31 no_DS no_END 3.010 +3 0 no_HaplotypeScore no_InbreedingCoeff 1 0.5 21.0 +7 -1.10E+00 5.93 no_RAW_MQ -3.24E-01 0.515 CTGGAATG +GTGCCAGGGGCAGAGGGGGCAATGCCGGGGCCCAGGTCGGCA[A]TGTACATGAGGTCGTTGGCAATGC +CGGGCAGGTCAGGCAGGTAGGATGGA pass XX . 0 1 0 chr1 17020 . G A 99.77 . AC=1;AF=0.500;AN=2;BaseQ +RankSum=-1.024e+00;ClippingRankSum=0.00;DP=26;ExcessHet=3.0103;FS=14. +474;MLEAC=1;MLEAF=0.500;MQ=21.00;MQRankSum=0.00;QD=3.84;ReadPosRankSu +m=-2.034e+00;SOR=2.800 GT:AD:DP:GQ:PL 0/1:18,8:26:99:128,0,424 + chr1 17020 17020 G A ncRNA_exonic WASH7P + downstream MIR6859-3 rs199740902 + + + + 0.0141002 367 + 26028 chr1:17020:17020:G:A 1 0.5 -1.02E+00 0 2 +6 no_DS no_END 3.0103 14.474 no_HaplotypeScore no_I +nbreedingCoeff 1 0.5 21 0 3.84 no_RAW_MQ -2.03E+ +00 2.8 ATGCCGGGCAGGTCAGGCAGGTAGGATGGAACATCAATCTCAGGCACCTG[G]CCC +AGGTCTGGCACATAGAAGTAGTTCTCTGGGACCTGCAAGATTAGGCA pass XX . + 0 1 0 chr1 17365 . C G 154.77 . 
AC=1;AF=0.500;AN=2;Base +QRankSum=0.621;ClippingRankSum=0.00;DP=43;ExcessHet=3.0103;FS=0.000;M +LEAC=1;MLEAF=0.500;MQ=43.91;MQRankSum=-1.245e+00;QD=3.60;ReadPosRankS +um=-1.701e+00;SOR=0.485 GT:AD:DP:GQ:PGT:PID:PL 0/1:36,7:43:99:0 +|1:17365_C_G:183,0,2051 chr1 17365 17365 C G ncRNA_ +exonic WASH7P downstream MIR6859-3 + rs369606208 0.2553 0.1603 + 0.221 0.3841 0.2245 0.2715 0.2581 0.2883 + + + 7.76E-05 12 + 154602 chr1:17365:17365:C:G 1 0.5 0.621 0 43 + no_DS no_END 3.0103 0 no_HaplotypeScore no_Inbreedin +gCoeff 1 0.5 43.91 -1.25E+00 3.6 no_RAW_MQ -1.70 +E+00 0.485 TGGGTCTTTGTTACAGCACCAGCCAGGGGGTCCAGGAAGACATACTTCTT[C +]TACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCTGTCCTG pass XX +. 0 1 0

Replies are listed 'Best First'.
Re: error returning hash value?
by choroba (Cardinal) on Jan 08, 2017 at 23:56 UTC
    Your code and samples are too complicated. Try to simplify it: show only the relevant data (for example, 2 or 3 columns in the files should be enough). Also, remove any logic that's not relevant to your question.

    To be able to run your code, I first had to remove the filenames from the downloaded files. I then got a lot of warnings (why didn't you turn them on?) like:

    Odd number of elements in hash assignment at ./1.pl line 61.
    Use of uninitialized value in list assignment at ./1.pl line 61.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value in join or string at ./1.pl line 83, <VAR> line 7.
    Use of uninitialized value within %variants_geno in pattern match (m//) at ./1.pl line 100, <VAR> line 7.

    The first two warnings can be fixed by replacing line 61 with

    undef %variants_hash;

    as the original was equivalent to

    %variants_hash = ( undef );
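The difference between the two forms can be checked with a small sketch. The hash contents here are made up for illustration, and the warning handler is only there so the warnings can be counted rather than printed.

```perl
use strict;
use warnings;

# Made-up contents, just to have something to clear.
my %variants_hash = ( a => 1, b => 2 );

# Either of these empties the hash without any warning.
undef %variants_hash;
%variants_hash = ();
my $count_after_clear = keys %variants_hash;

# Assigning the scalar undef instead stores a one-element list,
# which is what produces the first two warnings quoted above.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    %variants_hash = undef;
}
my $count_after_undef = keys %variants_hash;

print "keys after clearing:  $count_after_clear\n";
print "keys after '= undef': $count_after_undef\n";
print "warnings collected:   ", scalar(@warnings), "\n";
```

With warnings enabled, the last assignment leaves a single entry in the hash (an empty-string key with an undefined value) instead of an empty hash.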

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Thank you for your time in replying. I'm sorry that I am not very good at asking questions; it isn't easy to know how much detail to include/not include. I always get it wrong :-(. To be honest, I didn't understand what you meant by "To be able to run your code, I first had to remove the filenames from the downloaded files". I put the file names in because the code works on the principle that it reads in all .var files from the directory, so it made sense to me to put a header to say that they were named .var. Sorry if that was wrong.

      I have managed to get a bit further with it. I realised that I should have defined my hash as $zygos{$uniqlocus} = $zyg. I also noticed that $locus needed to be non-unique.

      I'll figure it out I'm sure. Thank you for your valiant efforts anyway!
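For readers hitting the same symptom, here is a minimal sketch of why the lookup in the question returned an "array coordinate". The locus key and the second hash name are invented for illustration; only %zygos and the push line come from the posted script.

```perl
use strict;
use warnings;

my %zygos;
my $locus = join "\t", 'chr1', 14653, 'C', 'T';    # hypothetical key

# As in the posted script: push onto @{ ... } makes the hash value
# an ARRAY reference, not a plain string.
push @{ $zygos{$locus} }, 'HET';

my $raw   = "$zygos{$locus}";                 # stringified reference
my $first = $zygos{$locus}[0];                # first stored element
my $all   = join ',', @{ $zygos{$locus} };    # every stored element

print "raw lookup:    $raw\n";
print "first element: $first\n";
print "all elements:  $all\n";

# Storing a plain scalar instead, as described in the reply above
# (%zyg_by_locus is a hypothetical name).
my %zyg_by_locus;
$zyg_by_locus{$locus} = 'HET';
print "scalar lookup: $zyg_by_locus{$locus}\n";
```

Data::Dumper follows the reference and prints its contents, which is why the dump looked right while a comparison such as `$fields2[9] eq 'HET'` never matched.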

        > I put the file names in

        Have you ever noticed the "download" link at the bottom of each code section? I clicked it to download the data including tabs, but they included the filenames. Don't make them part of the code section.

        I'm struggling to understand why, if you want to 'merge' records from 3 files, you are creating hash keys that contain the filename:

        $fields[2] = $file;
        my $locus = join "\t", @fields[0,1,2,3,4,5];

        From this output code, it looks like you want to create a 'cross tab' with the HET/HOM value derived from each file in a different column. Is that correct?

        my $samples_list = join "\t", @samples;
        print OUT "CHROM POS ID REF ALT QUAL $samples_list

        update : my best guess at what you are trying to do.
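A cross tab of that kind could be sketched roughly as follows. This is not the code from the reply above; it is an illustration that uses a few records trimmed from the dummy data and held in memory, with the hypothetical names %records and %zyg. It also treats any genotype with two equal alleles as HOM, which is an assumption that differs slightly from the posted script (there, everything other than 0/1 is HOM).

```perl
use strict;
use warnings;

# Trimmed records: sample => list of [CHROM, POS, REF, ALT, GT].
my %records = (
    Dummydata1 => [ [ 'chr1', 14653, 'C', 'T', '0/1' ],
                    [ 'chr1', 16949, 'A', 'C', '0/1' ] ],
    Dummydata2 => [ [ 'chr1', 69511, 'A', 'G', '1/1' ] ],
    Dummydata3 => [ [ 'chr1', 14653, 'C', 'T', '0/1' ] ],
);

my @samples = sort keys %records;
my %zyg;    # $zyg{$locus}{$sample} = 'HET' or 'HOM'

for my $sample (@samples) {
    for my $rec ( @{ $records{$sample} } ) {
        my ( $chrom, $pos, $ref, $alt, $gt ) = @$rec;
        # The key leaves out the filename, so the same variant seen in
        # different files lands on the same row.
        my $locus = join "\t", $chrom, $pos, $ref, $alt;
        my ( $a1, $a2 ) = split m{[/|]}, $gt;
        $zyg{$locus}{$sample} = $a1 eq $a2 ? 'HOM' : 'HET';
    }
}

# Order rows by chromosome name, then numerically by position.
my @loci = sort {
    my @x = split /\t/, $a;
    my @y = split /\t/, $b;
    $x[0] cmp $y[0] or $x[1] <=> $y[1]
} keys %zyg;

# One column per sample; 'REF?' marks a variant not seen in that sample.
my @rows;
for my $locus (@loci) {
    my @cells = map { exists $zyg{$locus}{$_} ? $zyg{$locus}{$_} : 'REF?' } @samples;
    push @rows, join "\t", $locus, @cells;
}

print join( "\t", qw(CHROM POS REF ALT), @samples ), "\n";
print "$_\n" for @rows;
```

Keeping the per-sample values in a nested hash means each file is read only once, and the output loop never has to search the raw lines again.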
        > it isn't easy to know how much detail to include/not include

        Think about: what is the minimum amount of code and sample data needed so that others can reproduce the problem? For each line of code, try removing it - does the problem still exist? If yes, it was ok to remove that line, if the problem goes away, put the line back in. And if removing the line causes a compilation failure, fix that first. See also Short, Self-Contained, Correct Example.

        Hi drlecb. There are a few ways to include sample files with your program on PerlMonks. To include the data the way you did and still have the Perl program be runnable, you can add __DATA__ or __END__ after your Perl program; anything after it will be treated as data or comments. Another way is to start a new code section for each sample file with <c> ... </c> so that the monks can click download for the sample file.

        Here is an example of using __DATA__

        use warnings;
        use strict;

        while(<DATA>){
            my $line = $_;
            print $line;
        }
        __DATA__
        Line 1
        Line 2
        Line 3