in reply to Re: Replacing substrings within hash values
in thread Replacing substrings within hash values

Thanks! I'll have a go at understanding this; I've not come across 'map' before. It would also be good if somebody could help me understand what's wrong with my own code, partly for learning purposes and partly because I may want to add other kinds of edit, such as deleting certain letters. I would then need to keep track of changes as I go, so that later replacements still land in the right place.

Re^3: Replacing substrings within hash values
by BillKSmith (Monsignor) on Mar 24, 2016 at 13:27 UTC

    The main problem with your code is that it has to read the entire IN file for each sequence. This would work if you rewound the IN file (seek IN, 0, 0) at the end of the sequence loop, but the approach is very inefficient: run time would become unacceptable for larger files.

    Your use of last is fine for sequence I. For all other sequences, it would exit the loop before it gets to the processing. Use next instead.
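A minimal sketch of the two fixes described above (rewinding with seek, and next instead of last). The sequences and edit list here are shortened, made-up data, and a lexical in-memory filehandle $in stands in for the IN file; neither comes from the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical data: two short sequences and an edit list in the
# "name position old new" layout used in this thread.
my %sequences = ( I => 'CATCAG', II => 'TACCAC' );
my $edits = "I 2 A G\nI 4 C T\nII 1 T C\nII 3 C T\n";

# An in-memory filehandle stands in for the IN file (an assumption).
open my $in, '<', \$edits or die "Cannot open edit list: $!";

for my $name (sort keys %sequences) {
    while (my $line = <$in>) {
        my @f = split ' ', $line;
        # Skip edits that belong to another sequence. Using 'last' here
        # would leave the loop before this sequence's edits are reached.
        next unless $f[0] eq $name;
        substr($sequences{$name}, $f[1] - 1, 1) = $f[3];
    }
    # Rewind so the next sequence reads the edit list from the start.
    seek $in, 0, 0;
}

print "$_ $sequences{$_}\n" for sort keys %sequences;
```

This still reads the whole edit list once per sequence, so it shows why the approach scales poorly rather than how to avoid that.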

    Bill
      I see, thank you. The script is now working as desired after fixing the usage of substr, using next, and adding seek(IN,0,0). Is there a more efficient way to accomplish this, or is this an unavoidable consequence of the way the script is structured? My thought was that I'd only need to iterate over IN once, as it's sorted: once $F[0] no longer equalled the current key, the inner loop would terminate, and reading of IN would resume where it left off for the next sequence.

        Since you can access a sequence using its key, there is no need to loop through them until you need to print.

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %sequences = (
            I  => 'CATCAGTATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA',
            II => 'TACCACAGATACGATACAACACATCAGTATAAAATGACTAGTAGCAGAC',
        );

        while (<DATA>) {
            my @f = split(/\s+/, $_);
            substr ($sequences{$f[0]},$f[1]-1,1) = $f[3];
        }

        print "CGTTGGCATAAAATGACTAGTAGCTAGATACCACAGATACGATACAACA\n";
        for my $key (keys %sequences){
            print $sequences{$key}."\n";
        }

        __DATA__
        I 2 A G
        I 4 C T
        I 5 A G
        I 7 T C
        II 1 T C
        II 2 A G
        II 3 C T
        II 5 A C
        II 8 G T
        II 10 T G
        poj

        I did not realize that you were trying to exploit the ordering of the edits. One way that your original code is wrong is that it ignores an edit which does not belong to the current sequence, rather than applying it to the next sequence. This would not be simple to fix.

        I strongly recommend you change to BrowserUK's algorithm. It would be very easy to add a test to verify that the character at the position to be edited is what the edit expects. The additional effort would be paid back the first time it finds an example of inconsistent data.
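One way such a check might look, as a rough sketch only. This is not BrowserUK's code, which is not reproduced here; it applies each edit through the hash key in a single pass and assumes the third column of an edit is the character expected at that position. The data and the @rejected list are made up for illustration, and the last edit is deliberately inconsistent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical, shortened sequences.
my %sequences = ( I => 'CATCAG', II => 'TACCAC' );

# Hypothetical edit list: "name position old new".
# The last row expects a G where the sequence actually has a C.
my @edits = ( 'I 2 A G', 'II 1 T C', 'II 3 G T' );

my @rejected;
for my $edit (@edits) {
    my ($name, $pos, $old, $new) = split ' ', $edit;

    # Look at the character currently at the (1-based) position.
    my $found = substr $sequences{$name}, $pos - 1, 1;

    # Refuse the edit if the data disagrees with the sequence.
    if ($found ne $old) {
        push @rejected, $edit;
        warn "Skipping '$edit': expected '$old' at position $pos, found '$found'\n";
        next;
    }

    substr($sequences{$name}, $pos - 1, 1) = $new;
}

print "$_ $sequences{$_}\n" for sort keys %sequences;
```

Whether a mismatch should be skipped with a warning, as here, or should stop the run with die depends on how much the edit list is trusted.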

        Bill