in reply to Re^6: Replacing substrings within hash values
in thread Replacing substrings within hash values
I now wish to add a second type of edit where letters can be deleted or inserted (rather than replaced).
One method would be to load all the edits for a particular sequence into an array, and then perform them in reverse order by position.
Because the edits nearest the end of the string are applied first, any change in length cannot shift the positions that the edits for earlier parts of the string refer to.
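To see why the order matters, here is a minimal sketch that applies the same three edits to a short string, first in the order given and then sorted by descending position. The helper `apply_edits`, its `[ position, action, text ]` record layout, and the lowercase replacement letters are assumptions made for this illustration only; positions are taken to be 1-based.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): apply [ position, action, text ]
# edits to a copy of $seq in the order they are passed in. Positions are 1-based.
sub apply_edits {
    my( $seq, @edits ) = @_;
    for my $edit ( @edits ) {
        my( $pos, $act, $text ) = @$edit;
        if(    $act eq 'I' ) { substr( $seq, $pos - 1, 0, $text ) }   # insert before position $pos
        elsif( $act eq 'D' ) { substr( $seq, $pos - 1, 1, ''    ) }   # delete the char at $pos
        else                 { substr( $seq, $pos - 1, 1, $text ) }   # replace the char at $pos
    }
    return $seq;
}

my @edits = ( [ 2, 'I', 'i' ], [ 4, 'D', '' ], [ 5, 'R', 'g' ] );

# Forward order: the insert at 2 lengthens the string, so the delete and
# replace that follow land one character to the left of where they should.
my $forward = apply_edits( 'CATCAG', @edits );

# Descending position: each edit only disturbs text to its right,
# and everything to its right has already been edited.
my $reverse = apply_edits( 'CATCAG', sort { $b->[0] <=> $a->[0] } @edits );

print "forward: $forward\n";   # forward: CiACgG (wrong chars deleted/replaced)
print "reverse: $reverse\n";   # reverse: CiATgG (edits hit the original positions)
```

In the forward run the delete removes the original third character instead of the fourth; in the reverse run every position still refers to the unedited string when its edit is applied.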
In the following I've used the redundant third field to hold the action 'I'nsert, 'D'elete, or 'R'eplace:
#! perl -slw
use strict;
use Inline::Files;
use Data::Dump qw[ pp ];

use constant { SEQ => 0, POS => 1, ACT => 2, REP => 3 };

my %seqs = map{ split "\n", $_ } <FASTA>;
pp \%seqs;

my @edits = [ split ' ', <EDITS> ];
while( 1 ) {
    my @bits = split ' ', <EDITS>;
    if( defined $bits[ 0 ] and $bits[ 0 ] eq $edits[ 0 ][ 0 ] ) {
        push @edits, \@bits;
        next;
    }
    for my $edit ( sort{ $b->[POS] <=> $a->[POS] } @edits ) {
        if( $edit->[ACT] eq 'I' ) {
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 0, $edit->[REP] );
        }
        elsif( $edit->[ACT] eq 'D' ) {
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 1, '' );
        }
        else { ## replace
            substr( $seqs{ '>' . $edit->[SEQ] }, $edit->[POS]-1, 1, $edit->[REP] );
        }
    }
    last unless defined $bits[ 0 ];
    @edits = \@bits;
}
pp \%seqs;

__FASTA__
>I
CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA
>II
TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC
__EDITS__
I 2 I I
I 4 D X
I 5 R G
I 7 I C
II 1 D X
II 2 I I
II 3 R T
II 5 D X
II 8 R T
II 10 I I
I've also used I and X as the 'replacement char' for insert and delete respectively to make verification easier.
Outputs:
C:\test>1158701
{
  ">I" => "CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA",
  ">II" => "TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC",
}
{
  ">I" => "CIATGGCTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA",
  ">II" => "IATCCATAITACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC",
}