Re^4: Replacing substrings within hash values

Re^5: Replacing substrings within hash values by poj (Abbot) on Mar 24, 2016 at 14:20 UTC
Since you can access a sequence using its key, there is no need to loop through them until you need to print.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %sequences = (
    I  => 'CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA',
    II => 'TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC',
);

# Each DATA line: sequence key, 1-based position, old char, new char
while (<DATA>) {
    my @f = split(/\s+/, $_);
    substr ($sequences{$f[0]},$f[1]-1,1) = $f[3];
}

# Expected result for sequence I, printed for comparison
print "CGTTGGCATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA\n";
for my $key (keys %sequences){
    print $sequences{$key}."\n";
}

__DATA__
I 2 A G
I 4 C T
I 5 A G
I 7 T C
II 1 T C
II 2 A G
II 3 C T
II 5 A C
II 8 G T
II 10 T G
```

poj
Re^6: Replacing substrings within hash values by K_Edw (Beadle) on Mar 24, 2016 at 17:25 UTC
Thanks! That is a lot more succinct and efficient :) I had not considered using `$f[0]` to specify the sequence directly.
Re^6: Replacing substrings within hash values by K_Edw (Beadle) on Mar 25, 2016 at 11:16 UTC
Now that I'm no longer iterating over `keys %sequences`, is there a way to reset a variable for each key? I now wish to add a second type of edit where letters can be deleted or inserted (rather than replaced). This will shift all remaining positions out of sync, so the changes would need to be tracked so that they can be compensated for. My original idea was to keep a cumulative total of the insertion/deletion sizes and adjust each remaining position accordingly, with the total resetting to 0 for each key. Or would it be better to just create an array, with an entry for each key?
Re^7: Replacing substrings within hash values by BrowserUk (Patriarch) on Mar 25, 2016 at 13:22 UTC
"I now wish to add a second type of edit where letters can be deleted or inserted (rather than replaced)."

One method would be to load all the edits for a particular sequence into an array, and then perform them in reverse order by position. By doing those at the end first, any changes to length do not affect edits for earlier parts of the string. In the following I've used the redundant third field to hold the action: 'I'nsert, 'D'elete, or 'R'eplace:

```perl
#! perl -slw
use strict;
use Inline::Files;
use Data::Dump qw[ pp ];

use constant { SEQ => 0, POS => 1, ACT => 2, REP => 3 };

my %seqs = map{ split "\n", $_ } <FASTA>;
pp \%seqs;

my @edits = [ split ' ', <EDITS> ];
while( 1 ) {
    my @bits = split ' ', <EDITS>;
    if( defined $bits[ 0 ] and $bits[ 0 ] eq $edits[ 0 ][ 0 ] ) {
        push @edits, \@bits;
        next;
    }
    for my $edit ( sort{ $b->[POS] <=> $a->[POS] } @edits ) {
        if( $edit->[ACT] eq 'I' ) {
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 0, $edit->[REP] );
        }
        elsif( $edit->[ACT] eq 'D' ) {
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 1, '' );
        }
        else { ## replace
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 1, $edit->[REP] );
        }
    }
    last unless defined $bits[ 0 ];
    @edits = \@bits;
}
pp \%seqs;

__FASTA__
>I
CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA
>II
TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC
__EDITS__
I 2 I I
I 4 D X
I 5 R G
I 7 I C
II 1 D X
II 2 I I
II 3 R T
II 5 D X
II 8 R T
II 10 I I
```

I've also used I and X as the 'replacement char' for insert and delete respectively to make verification easier.

Outputs:

```
C:\test>1158701
{
  ">I" => "CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA",
  ">II" => "TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC",
}
{
  ">I" => "CIATGGCTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA",
  ">II" => "IATCCATAITACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC",
}
```

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^7: Replacing substrings within hash values by poj (Abbot) on Mar 25, 2016 at 11:30 UTC
"a variable for each key"

That's just another hash.

```perl
my %offset = ();
# inserts
$offset{$key} += 1; # length of insert
# deletes
$offset{$key} -= 1;
```

poj
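As a rough illustration of how such a per-key offset hash could drive the edits in ascending order, here is a minimal sketch. The sub name `apply_edits_with_offsets` and the record layout (key, 1-based position in the original sequence, action, text) are assumptions made for this example, not code from the thread; the actions 'I', 'D' and 'R' follow the insert/delete/replace convention used elsewhere in the discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: apply edits in ascending position order while keeping
# a running length change per key, so later positions can be corrected.
# Assumed record layout: [ key, 1-based position in ORIGINAL sequence, action, text ]
sub apply_edits_with_offsets {
    my ( $seqs, $edits ) = @_;
    my %offset;    # cumulative length change so far, one entry per key

    for my $edit ( sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @$edits ) {
        my ( $key, $pos, $act, $text ) = @$edit;

        # 0-based index in the partly edited string
        my $at = $pos - 1 + ( $offset{$key} // 0 );

        if ( $act eq 'I' ) {        # insert before the original character
            substr( $seqs->{$key}, $at, 0, $text );
            $offset{$key} += length $text;
        }
        elsif ( $act eq 'D' ) {     # delete one character
            substr( $seqs->{$key}, $at, 1, '' );
            $offset{$key} -= 1;
        }
        else {                      # replace one character, length unchanged
            substr( $seqs->{$key}, $at, 1, $text );
        }
    }
    return $seqs;
}

my %seqs  = ( I => 'CATCAGTATAAAATG' );
my @edits = (
    [ 'I', 2, 'I', 'I' ],
    [ 'I', 4, 'D', 'X' ],
    [ 'I', 5, 'R', 'G' ],
    [ 'I', 7, 'I', 'C' ],
);
apply_edits_with_offsets( \%seqs, \@edits );
print "$seqs{I}\n";    # prints CIATGGCTATAAAATG
```

Because the offset lives in a hash keyed by sequence name, nothing has to be reset when the edit list moves on to the next sequence; each key simply starts from an empty entry.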
Re^5: Replacing substrings within hash values by BillKSmith (Monsignor) on Mar 24, 2016 at 22:48 UTC
I did not realize that you were trying to exploit the ordering of the edits. One way that your original code is wrong is that it ignores an edit which does not belong to the current sequence rather than applying it to the next sequence. This would not be simple to fix. I strongly recommend you change to BrowserUk's algorithm. It would be very easy to add a test to verify that the character at the position to be edited is what the edit expects. The additional effort would be paid back the first time that it finds an example of inconsistent data.

Bill
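A consistency test of that kind might look something like the sketch below. It assumes the four-column edit format used earlier in the thread (sequence key, 1-based position, expected character, replacement) and that the key is already known to exist in the hash; `checked_replace` is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace one character only if the character currently
# at the 1-based position is the one the edit record expects.
# Returns 1 if the edit was applied, 0 (after a warning) if it was skipped.
sub checked_replace {
    my ( $seqs, $key, $pos, $expected, $new ) = @_;

    # Guard against positions outside the sequence
    if ( $pos < 1 or $pos > length $seqs->{$key} ) {
        warn "Sequence $key: position $pos is outside the sequence\n";
        return 0;
    }

    my $found = substr( $seqs->{$key}, $pos - 1, 1 );
    if ( $found ne $expected ) {
        warn "Sequence $key, position $pos: expected '$expected', found '$found'\n";
        return 0;
    }

    substr( $seqs->{$key}, $pos - 1, 1 ) = $new;
    return 1;
}

my %sequences = ( I => 'CATCAGTATAAAATG' );
checked_replace( \%sequences, 'I', 2, 'A', 'G' );    # applied: position 2 holds 'A'
checked_replace( \%sequences, 'I', 4, 'T', 'G' );    # skipped: position 4 holds 'C'
print "$sequences{I}\n";    # prints CGTCAGTATAAAATG
```

Whether a mismatch should warn and continue, as here, or abort the run is a design choice that depends on how much the edit file is trusted.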
Re^6: Replacing substrings within hash values by BrowserUk (Patriarch) on Mar 25, 2016 at 01:57 UTC
"It would be very easy to add a test to verify that the character at the position to be edited is what the edit expects. The additional effort would be paid back, the first time that it finds an example of inconsistent data."

Whilst it would be easy to add, it is almost certainly unnecessary.

When you know how these edit lists are produced, you realise that the sequences being edited were (half of) the input to the processing that produced the edit list; thus with real data, if the sequence name/id -- which tend to look like `uc002yje.1 chr21:13973492-13976330` or `32_Illumina_Multiplexing_PCR_Primer_1.01` or `ceti albus: chrom 1` or `SVN001-12|RMNH.ARA.14133|ANA0001|CP|M` etc. -- is found in the hash, then the likelihood that the edit file will contain a different initial character at the specified position is very small indeed.

Perhaps the worst that could happen is that the post-edited sequence file could be (re)paired with the same edit file and re-run. The result would be that the entire file would be "edited", and the resulting output file would be identical to the input file. I.e. no harm done.

What my code did lack was a check for, and handling of, the existence of the sequence (from the edit file) in the hash (from the sequence file). A missing sequence would almost certainly indicate that the wrong edit file was being paired with the sequence file (or vice versa).

But then, my purpose was (as always) to provide the OP with the minimum demonstration that would explain the problem he was asking about -- in this case his misconception regarding 0-based and 1-based indexing -- and not production level, ready-to-run code.

That said, the addition of the consistency check would do no harm either :)
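The missing existence check could be sketched roughly as follows. This is not code from the thread: `partition_edits` is a hypothetical helper, and it assumes each edit has already been split into an array reference whose first element is the sequence id.

```perl
use strict;
use warnings;

# Hypothetical helper: separate parsed edit records into those whose sequence
# id exists in the sequence hash and those that name an unknown sequence.
sub partition_edits {
    my ( $seqs, $edits ) = @_;
    my ( @known, @unknown );
    for my $edit (@$edits) {
        if ( exists $seqs->{ $edit->[0] } ) {
            push @known, $edit;
        }
        else {
            push @unknown, $edit;
        }
    }
    return ( \@known, \@unknown );
}

my %seqs  = ( I => 'CATCAG', II => 'TACCAC' );
my @edits = ( [ 'I', 2, 'A', 'G' ], [ 'III', 1, 'T', 'C' ] );

my ( $known, $unknown ) = partition_edits( \%seqs, \@edits );
printf "%d usable, %d unknown\n", scalar @$known, scalar @$unknown;
# prints: 1 usable, 1 unknown
```

Any entry in the unknown list is a hint that the edit file and the sequence file may not belong together, so a real script would probably stop and report rather than continue.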
Re^7: Replacing substrings within hash values by BillKSmith (Monsignor) on Mar 25, 2016 at 03:51 UTC
I had no intention of criticizing your excellent post nor of recommending that production code should contain unnecessary error checking. Adding a test for an 'error' that truly cannot happen, or that would not affect the result even if it did, is probably a bad idea. There is a difference between ignoring a condition and deciding not to test for it.

Bill