Hello, I have been looking at this, and I believe I have a solution.

The main problem I faced was that the slots of auto-vivified hashes/arrays start out undefined, so I could not tell which arrays still needed filling: they would all be undefined. I gather this is why you iterate through the snps arrays first and then try to map their indexes onto the indexes of the snp_bins.
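To illustrate what I mean by "all would be undefined", here is a minimal sketch. The %data layout and the sample id are only assumptions modelled on this thread, not your real structure. Assigning into one bin autovivifies the containers, but every lower slot stays undef until something is stored in it, and the thing to test is the array element itself:

```perl
use strict;
use warnings;

my %data;    # hypothetical structure, shaped like the HoAoA in this thread

# Assigning into bin 3 autovivifies the hash entry and the outer array.
@{ $data{HG00553}[3] } = ( '0 0', '1 1' );

# Bins 0..2 now exist as slots but hold undef; test the element itself,
# not the dereferenced array.
my @state = map { defined $data{HG00553}[$_] ? 'filled' : 'undef' } 0 .. 3;

print "@state\n";    # undef undef undef filled
```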

Since I could not go by whether an array was defined or not, I slept on it, and in the morning decided that it might work to use your method to map the indexed arrays first, and then retro fill the undefined arrays below them with '1 1,' using an auto-decrementing unless/until loop.
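The retro-fill idea on its own can be sketched roughly like this. Note that this is a plain for loop rather than the unless/until loop I describe, and the backfill helper, its arguments and the tiny @bins array are all made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: fill every undefined slot below $top with $width
# placeholder pairs, stopping at the first defined slot or at index 0.
sub backfill {
    my ( $bins, $top, $width ) = @_;
    my $filled = 0;
    for ( my $i = $top - 1 ; $i >= 0 && !defined $bins->[$i] ; $i-- ) {
        $bins->[$i] = [ map { '1 1' } 1 .. $width ];
        $filled++;
    }
    return $filled;
}

my @bins;
$bins[2] = ['0 0'];    # an earlier bin that already holds data
$bins[5] = ['0 0'];    # the bin just mapped
my $n = backfill( \@bins, 5, 3 );    # fills bins 4 and 3, stops at bin 2

print "$n bins filled\n";
```

The explicit $i >= 0 test is there so that the walk cannot run past the start of the array when nothing below is defined.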

I had to hammer at the loops for a bit, and the flow control assumes that the SAMPLE line is always line '1' of the data file. But it does what I think you need it to do.

Updated: the code now uses do until loops and works, with tests printing at the end. The tests use the second, slightly updated dnata file, which includes discontiguous oasis coordinates - as per akme ordnance survey chart special duties to the pirates under the governance of silverbeard the undocumented. Addenda: this scroll is only to be seen by the scribe on penalty of plank.

dnata file

SAMPLE,16287215,16287226,16287365,16287649,16287784,16287851,16287912
HG00553,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00554,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00637,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00638,0 0,0 0,0 0,0 0,0 0,1 1,0 0
HG00640,0 0,0 0,0 0,0 0,0 0,1 1,0 0

second, slightly updated dnata file (used by the tests)

SAMPLE,16287215,16287226,16287365,16287649,16287784,16887851,17187912
HG00553,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00554,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00637,0 0,0 0,0 0,0 0,0 0,0 0,0 0
HG00638,0 0,0 0,0 0,0 0,0 0,1 1,0 0
HG00640,0 0,0 0,0 0,0 0,0 0,1 1,0 0

solution

#!/usr/bin/perl -T
use strict;
use warnings;

my ( @snp_bins, %data );

open my $in_file, "<", "./dnata" or die "Cannot open ./dnata: $!";

while (<$in_file>) {
    chomp;
    if ( $. == 1 ) {    # line number 1 is assumed to be the SAMPLE line
        my ( $placeholder, @coords ) = split /,/;
        @snp_bins = map int( $_ / 100_000 ), @coords;
        next;
    }
    if ( $. >= 2 ) {
        my ( $id, @snpspairs ) = split /,/;
        foreach my $oasis (@snp_bins) {
            my $os = $oasis;
            @{ $data{$id}[$os] } = @snpspairs;
            $os--;
            # Test the element itself: defined(@array) is a fatal error in recent perls.
            # The $os < 0 check stops the fill at index 0 instead of wrapping round to -1.
            unless ( $os < 0 or defined $data{$id}[$os] ) {
                do {
                    @{ $data{$id}[$os] } = map( q(1 1), 0 .. 99 );    # do 0..9 for readability on output!!
                    $os--;
                } until ( $os < 0 or defined $data{$id}[$os] );
            }
        }
    }
}

# Tests, written against the second dnata file (bins 162, 168 and 171 hold real data).
foreach my $k ( sort keys %data ) {
    print $k, " Ref: ",  $data{$k}->[0], $/;
    print $k, " 0: ",    @{ $data{$k}->[0] }, $/;
    print $k, " 161: ",  join( ',', @{ $data{$k}->[161] } ), $/;
    print $k, " 161:5 ", $data{$k}->[161][5], $/;
    print $k, " 162: ",  join( ',', @{ $data{$k}->[162] } ), $/;
    print $k, " 162:5 ", $data{$k}->[162][5], $/;
    print $k, " 165: ",  join( ',', @{ $data{$k}->[165] } ), $/;
    print $k, " 165:5 ", $data{$k}->[165][5], $/;
    print $k, " 168: ",  join( ',', @{ $data{$k}->[168] } ), $/;
    print $k, " 168:5 ", $data{$k}->[168][5], $/;    # was $data{$shark}, which does not compile under strict
    print $k, " 170: ",  join( ',', @{ $data{$k}->[170] } ), $/;
    print $k, " 170:7 ", $data{$k}->[170][7], $/;
    print $k, " 171: ",  join( ',', @{ $data{$k}->[171] } ), $/;
    print $k, " 171:7 ", $data{$k}->[171][7], $/;    # bin 171 holds only the 7 real pairs (0..6), so this is undef and warns
}

There may be better ways to control the flow than relying on line numbers, and of course the flow breaks down if /^SAMPLE/ lines occur more than once in the file. I also expect my loop exits could be improved upon. However, this should get the ball rolling. For a start, a bit of tweakery may be required when retro filling from, say, coord 175 back down to 162, so that the loop does not continue through 162 to 0, but I'm hoping the until(defined) condition catches that.
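One possible way to drop the line-number assumption is to match on the SAMPLE prefix instead of checking $. for line 1. This is only a sketch: the in-memory @lines, the second SAMPLE header and the @seen array are assumptions added for illustration and do not come from your data.

```perl
use strict;
use warnings;

# Rows in the same shape as the dnata file. The second SAMPLE line is an
# assumption, added to show a header turning up mid-file.
my @lines = (
    'SAMPLE,16287215,16887851',
    'HG00553,0 0,1 1',
    'SAMPLE,17187912',
    'HG00554,0 0',
);

my ( @snp_bins, @seen );
for my $line (@lines) {
    if ( $line =~ /^SAMPLE,/ ) {
        # Any header line resets the bins, wherever it appears.
        my ( undef, @coords ) = split /,/, $line;
        @snp_bins = map { int( $_ / 100_000 ) } @coords;
        next;
    }
    my ($id) = split /,/, $line;
    push @seen, "$id => @snp_bins";
}

print "$_\n" for @seen;
```

Each data row is then paired with the bins of the most recent header, so a file with several SAMPLE blocks would not need special-casing by line number.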

Coyote

Updates: Discovered do until loops !!! Now this is working like a charm. Note that the output controls the comma placement for CSV compatibility, hence the join function in the tests. Fixed: the '1 1' x 100 method was replaced with a map, as x 100 was returning one very long string stored in index 0 rather than a 100-element array with '1 1' in each element. I also attempted to remove the oasis scalar, but this broke the code, so the island stays. :p
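For the record, the x behaviour comes down to parentheses: with a plain scalar on the left, x repeats the string, while with a parenthesised list on the left (in list context) it repeats the list. A small sketch, with array names invented for illustration and 3 standing in for 100:

```perl
use strict;
use warnings;

my @as_string = '1 1' x 3;                 # string repetition: one element, "1 11 11 1"
my @as_list   = ('1 1') x 3;               # list repetition: three elements of '1 1'
my @via_map   = map { '1 1' } 1 .. 3;      # the map approach used in the solution

print scalar(@as_string), " ", scalar(@as_list), " ", scalar(@via_map), "\n";    # 1 3 3
```

So either the map or the parenthesised form should give one '1 1' per index.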


In reply to Re: Trying to edit HoAoA entries by Don Coyote
in thread Trying to edit HoAoA entries by iangibson
