jemo has asked for the wisdom of the Perl Monks concerning the following question:

Hi there

I am trying to write a script that replaces the ID names in FILE2.txt (which contains a series of DNA sequences formatted as below) with ID names from FILE1.txt (which contains a single column of names, one per row).

Here's what FILE2.txt looks like:

>BAR12149;size=134;
ATTGGCCAAATTG.....
>BAR1524;size=1535;
TTAAGGCCTTAAT.....
...etc/

Here's what FILE1.txt looks like:

GOM_202
GOM_23
.....etc/

The ideal final output would be in this format:

>GOM_202
ATTGGCCAAATTG.....
>GOM_23
TTAAGGCCTTAAT.....
.....etc/

I've written a script, but I am a beginner with Perl. Would anyone be able to help me with this? I would really appreciate any constructive comments!

#!/usr/bin/perl
use strict;
use warnings;

my @arr;
while (<>) {
    chomp;
    push @arr, $_ if length;
    last if eof;
}
while (<>) {
    print /^>/ ? shift(@arr) . "\n" : $_;
}

Thanks in advance!!

Re: Replace ID names from FILE1 to FILE2
by NetWallah (Canon) on Jun 17, 2014 at 17:26 UTC
    From your description, the only "sync" criterion between the files seems to be the sequence.

    Based on that, this should work (untested):

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $dnaseq, '<', "FILE2.TXT"
        or die "Can't open FILE2(Sequences): $!\n";
    open my $namesfile, '<', "FILE1.TXT"
        or die "Can't open NAMES: FILE1: $!\n";

    while (my $name = <$namesfile>) {
        chomp $name;
        <$dnaseq>;                      # Throw away "BARmmm;size=xxx" record
        chomp(my $seq = <$dnaseq>);
        # Should check if we actually have a $seq here....
        print ">$name\n$seq\n";         # ">" matches the requested output format
    }
    close $dnaseq;
    close $namesfile;
    # Should check if there are left over records in $dnaseq....
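The two "Should check" comments above could be filled in roughly as follows. This is only a sketch: `rename_records` is a hypothetical helper name, it assumes every record is exactly one header line followed by one sequence line, and the demo uses in-memory filehandles with made-up sample data standing in for FILE1.txt and FILE2.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pair each name with the next
# two-line FASTA record and die on a mismatch instead of ignoring it.
sub rename_records {
    my ($names_fh, $seq_fh, $out_fh) = @_;
    while (my $name = <$names_fh>) {
        chomp $name;
        next unless length $name;      # skip blank lines in the names file
        my $header = <$seq_fh>;        # old ">BAR...;size=...;" line, discarded
        my $seq    = <$seq_fh>;
        die "No sequence left for name '$name'\n"
            unless defined $header && defined $seq;
        chomp $seq;
        print {$out_fh} ">$name\n$seq\n";
    }
    # Any remaining non-blank line means there were more sequences than names.
    while (my $extra = <$seq_fh>) {
        die "Sequence records left over after the last name\n"
            if $extra =~ /\S/;
    }
    return;
}

# Demo on in-memory filehandles (sample data is illustrative only).
my $names = "GOM_202\nGOM_23\n";
my $fasta = ">BAR12149;size=134;\nATTGGCCAAATTG\n"
          . ">BAR1524;size=1535;\nTTAAGGCCTTAAT\n";
open my $names_fh, '<', \$names or die "Can't open names: $!\n";
open my $seq_fh,   '<', \$fasta or die "Can't open sequences: $!\n";
rename_records($names_fh, $seq_fh, \*STDOUT);
```

Dying on a mismatch is one possible choice; warning and continuing would also be reasonable if partial output is acceptable.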

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

Re: Replace ID names from FILE1 to FILE2
by roboticus (Chancellor) on Jun 17, 2014 at 17:15 UTC

    jemo:

    It looks like you just need to open the files and read from them instead of the standard input:

    open my $FILE1, '<', "FILE1.TXT" or die "Can't open FILE1: $!\n";
    while (<$FILE1>) {
        ...
    }

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Replace ID names from FILE1 to FILE2
by Laurent_R (Canon) on Jun 17, 2014 at 17:49 UTC
    Cross-posted on the DevShed forum (http://forums.devshed.com/perl-programming-6/replace-push-perl-962230.html) where I provided an answer as to what to fix (although not a complete solution). It is considered polite to mention it when you cross-post in various forums, so as to avoid duplicate work on various parts of the Internet.

    If I were to give a solution, it would be very similar to NetWallah's proposal, which has the advantage of not loading a file (which may be very large) into an array in memory.
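To illustrate the memory point, here is a minimal sketch of a variant that holds only one line of each file at a time and also tolerates sequences wrapped over several lines. `replace_headers` is a hypothetical name, the sample data is made up, and the sketch assumes the names file has no blank lines and that every header line starts with ">".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: stream the FASTA input and swap each header line
# for the next name, without storing either file in an array.
sub replace_headers {
    my ($names_fh, $fasta_fh, $out_fh) = @_;
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>/) {
            my $name = <$names_fh>;    # next replacement ID
            die "Ran out of names before the end of the sequences\n"
                unless defined $name;
            chomp $name;
            print {$out_fh} ">$name\n";
        }
        else {
            print {$out_fh} $line;     # sequence lines pass through unchanged
        }
    }
    return;
}

# Demo on in-memory filehandles; the first sequence spans two lines.
my $names = "GOM_202\nGOM_23\n";
my $fasta = ">BAR12149;size=134;\nATTGG\nCCAAATTG\n"
          . ">BAR1524;size=1535;\nTTAAGGCCTTAAT\n";
open my $names_fh, '<', \$names or die "Can't open names: $!\n";
open my $fasta_fh, '<', \$fasta or die "Can't open sequences: $!\n";
replace_headers($names_fh, $fasta_fh, \*STDOUT);
```

For real data, the two in-memory handles would be replaced by handles opened on the actual files, in the way roboticus shows.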