K_Edw has asked for the wisdom of the Perl Monks concerning the following question:
I have a set of strings interspersed with header-lines marked with '>' - specifically, a .FASTA file containing multiple DNA sequences which are read into a hash using bioPerl (Bio::SeqIO). A short example is:
>I CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA >II TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC
Additionally, I have a second input file which specifies changes to any given DNA sequence, in this format: (shortened example and additional columns present but now shown)
I 2 A G I 4 C T I 5 A G I 7 T C II 1 T C II 2 A G II 3 C T II 5 A C II 8 G T II 10 T G
$F0 = DNA sequence the changes should be made to $F1 = Letter position that is involved in the change $F2 = Pre-existing letter $F3 = Replacement letter
I'm trying to write a simple script that will iterate through each DNA sequence, load up all the changes pertaining to it and make the substring replacements.
For example DNA sequence I will go from:
CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACATo:
CGTTGGCATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACAI'm very new to Perl and have run into some problems. This is my script so far:
The output is basically a copy of the input - the changes have not been made to each sequence but I am not sure why. I suspect the while loop is not structured correctly or my use of 'last' is incorrect (i'm trying to break out of the while loop to move onto the next DNA sequence). I'm sure there is a better approach. Any help would be much appreciated, especially updates/critique of the above code and pointing out where it is wrong as I plan to eventually include more complex changes (such as removal of letters instead of replacement) and feel this is a good way to access each hash value in-turn (but I may be wrong).#!/usr/bin/env perl use strict; use warnings; use Bio::SeqIO; #Read and store .FASTA file my %sequences; my $seqio = Bio::SeqIO->new(-file => $ARGV[0]); while(my $seqobj = $seqio->next_seq) { my $id = $seqobj->display_id; my $seq = $seqobj->seq; $sequences{$id} = $seq; } #Variant Input File & Output my $variants = $ARGV[1]; open IN, '<', $variants or die "$!"; open OUT, '>', 'output.txt' or die "$!"; #Loops my $iteration = 0; foreach my $key (sort keys%sequences) { $iteration ++; my $value = $sequences{$key}; while (<IN>) { my (@F) = split("\t", $_); if ($F[0] eq $key) { substr($value,0,$F[1]) = $F[3]; } else { last } } print OUT ">$key\n$value" if $iteration=~1; print OUT "\n>$key\n$value" if $iteration>1; }
Updates made - working, but inefficient:
substr($value,$F[1]-1,1) = $F[3]; use 'next' instead of 'last' added seek(IN,0,0);
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Replacing substrings within hash values
by BrowserUk (Patriarch) on Mar 24, 2016 at 10:11 UTC | |
by K_Edw (Beadle) on Mar 24, 2016 at 11:20 UTC | |
by BillKSmith (Monsignor) on Mar 24, 2016 at 13:27 UTC | |
by K_Edw (Beadle) on Mar 24, 2016 at 13:57 UTC | |
by poj (Abbot) on Mar 24, 2016 at 14:20 UTC | |
| |
by BillKSmith (Monsignor) on Mar 24, 2016 at 22:48 UTC | |
| |
|
Re: Replacing substrings within hash values
by Anonymous Monk on Mar 24, 2016 at 13:39 UTC | |
by K_Edw (Beadle) on Mar 24, 2016 at 13:50 UTC |