ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks
I have written a program to take a character in File 1 to be used to replace a character at that place in File2. Meanwhile
I am not getting the right output.
My Code is :-
#!/usr/bin/perl use strict; use warnings; $location=""; my $qfn1 = "File1.txt"; my $qfn2 = "File2.txt"; my %positions; { open(my $fh, '<', $qfn1) or die("Cannot open file \"$qfn1\": $!\n"); while (<$fh>) { my ($key, $pos) = split /\s+/; $positions{$key} = $pos; $location = $3; } } my %sequences; { open(my $fh, '<', $qfn2) or die("Cannot open file \"$qfn2\": $!\n"); my $key; while (<$fh>) { if ( s/^>// ) { $key = ( split /\|/ )[1]; } else { chomp; $sequences{$key} .= $_; } } if ( ! exists( $sequences{$key} )) { warn "KEY $key in File1 not found in File2\n"; next; } my $dna = $sequences{$key}; my $index = rindex($dna,$positions{$key}); my $position = $index; my $current_base = substr($dna, $position, 1); my $newbase = $location; substr($dna,$position,1,$newbase); my $Location = $position + 1; print "$sequences{$key}"; }
My Input Files look like this:-
File 1:- 155369268 5 C 169212695 7 T File 2:- >gi|155369268|ref|NM_001100917.1| After all AAACAATGTCGATTCTATGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTTACT ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT >gi|169212695|ref|XM_001716884.1| Its the code AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT
I am looking for an output in the same format as File2 but with a replaced character.
Like Suppose in case 155369268 my 5th position changes from A to C. My 169212695 7th position changes from G to T.
So the output would be like this:-
>gi|155369268|ref|NM_001100917.1| After all AAACCATGTCGATTCTATGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTTACT ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT >gi|169212695|ref|XM_001716884.1| Its the code AGACAATCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT
good bye

20081224 Janitored by Corion: Restored content

Replies are listed 'Best First'.
Re: Character replacement
by johngg (Canon) on Dec 22, 2008 at 12:43 UTC

    When reading "File1.txt" you are not retaining the location information for each record. You could use a hash of hashes structure to do that. In the code below you can see this and I have added Data::Dumper output so that you can visualise the structure.

    There seems to be some confusion in your code as to how to find the position for replacement. It would appear that you read the postition and new location from "File1.txt" but you then try to find things using rindex and then record the $current_base even though you don't seem to use it and are replacing it anyway. My code below just does a replacement based on the information in "File1.txt".

    You process the records one at a time so I'm not sure why you use the %sequences hash. Why not just use a simple scalar to hold the record? I am assuming that the format you have posted for "File2.txt" is accurate with the data being broken into lines of 70 characters. Finding the correct position for the replacement needs to allow for the line breaks.

    The code below inlines the input files using heredocs and the output file by opening a filehandle on a scalar reference. This is just to keep everything in the script but you can simply relace those constructs with actual files. I have changed the locations to asterisks just so they stand out in the output file so you can see the replacement happening. I have added a record where the replacement is not on the first line of data and another record where no change takes place.

    use strict; use warnings; use Data::Dumper; open my $in1FH, q{<}, \ <<END_File1 or die qq{open: < "File1.txt": $!\ +n}; 155369268 5 * 169212695 7 * 175387629 127 * END_File1 my %positions = (); while( <$in1FH> ) { my( $key, $posn, $locn ) = split; $positions{ $key } = { position => $posn, location => $locn, }; } close $in1FH or die qq{close: < "File1.txt": $!\n}; print Data::Dumper->Dumpxs( [ \ %positions ], [ qw{ *positions } ] ); open my $in2FH, q{<}, \ <<END_File2 or die qq{open: < "File2.txt": $!\ +n}; >gi|155369268|ref|NM_001100917.1| After all AAACAATGTCGATTCTATGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTTACT ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT >gi|169212695|ref|XM_001716884.1| Its the code AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT >gi|175387629|ref|PM_001716884.1| Its the code again AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT AAGCTTTCTAAATGGCCCCGGGGCTTTGTCATTCAAGGGTGTGCCCAAGCCATTGTCTTGAAGAGGGGGT CGGGGCTTTTCAAGGGTCATTGTGCCCAAG >gi|192669295|ref|XM_001716884.1| Its the code AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT END_File2 # 1 2 3 4 5 6 7 #234567890123456789012345678901234567890123456789012345678901234567890 my $rsFile2out = \ do{ my $dummy }; open my $outFH, q{>}, $rsFile2out or die qq{open: > "File2out.txt": $!\n}; my $record = <$in2FH>; while( <$in2FH> ) { if( m{^>gi} ) { processRecord( $record ); $record = $_; } else { $record .= $_; } } processRecord( $record ); close $in2FH or die qq{close: < "File2.txt": $!\n}; close $outFH or die qq{close: > "File2out.txt": $!\n}; print ${ $rsFile2out }; sub processRecord { my $record = shift; my $key = ( split m{\|}, $record )[ 1 ]; if( exists $positions{ $key } ) { my( $header, @data ) = split m{\n}, $record; my $dataStr = join q{}, @data; substr $dataStr, $positions{ $key }->{ position } - 1, 1, $positions{ $key }->{ location }; print $outFH join qq{\n}, $header, unpack( q{(a70)*}, $dataStr ), q{}; } else { print $outFH $record; } return; }

    The output.

    %positions = ( '175387629' => { 'location' => '*', 'position' => '127' }, '169212695' => { 'location' => '*', 'position' => '7' }, '155369268' => { 'location' => '*', 'position' => '5' } ); >gi|155369268|ref|NM_001100917.1| After all AAAC*ATGTCGATTCTATGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTTACT ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT >gi|169212695|ref|XM_001716884.1| Its the code AGACAA*CTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT >gi|175387629|ref|PM_001716884.1| Its the code again AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAA*CTGAAGAGGGGGT AAGCTTTCTAAATGGCCCCGGGGCTTTGTCATTCAAGGGTGTGCCCAAGCCATTGTCTTGAAGAGGGGGT CGGGGCTTTTCAAGGGTCATTGTGCCCAAG >gi|192669295|ref|XM_001716884.1| Its the code AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT

    I hope this is helpful.

    Cheers,

    JohnGG

    Update: Fixed issues with TABs in posted code mucking up indentation.

Re: Character replacement
by kutsu (Priest) on Dec 22, 2008 at 08:33 UTC

    Have you ever considered using a Database or/and a Template system for this? I'd look at the DBI module and Template::Toolkit for this. After that it's a simple regex based on the hash (which I'm sure at least 3 regex solutions will be posted by the time I'm finished writing this).

Re: Character replacement
by Anonymous Monk on Dec 22, 2008 at 08:03 UTC