PerlMonks  

mapping data between files

by Angharad (Pilgrim)
on May 05, 2009 at 21:14 UTC ( #762080=perlquestion )

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

I have two kinds of file. The first is a 'sequence' file, which looks like this:
    >ONE
    IRLA
    >TWO
    REFT
    >THREE
    HTED
and a set of corresponding 'co-ordinate' files. The name of the co-ordinate file that corresponds to a particular sequence is given after the '>' on the line above that sequence. Each co-ordinate file is made up of numbered entries. Some co-ordinate files start with entry number 1, but not all. The one below (the co-ordinate file named 'ONE'), for example, starts with an entry numbered 12.
    12 14.620 35.834 -16.759 1.00 11.04
    13 15.922 36.983 -19.044 1.00 11.22
    14 14.326 37.148 -21.240 1.00 11.40
    15 11.528 38.248 -23.343 1.00 12.44
This corresponds to the first sequence in the sequence file
IRLA
I need to assign each letter in that sequence to the corresponding entry in the corresponding co-ordinate file. The sequence is read from left to right, with each letter mapped to a numbered entry in the co-ordinate file. So the first letter in the sequence, 'I', has to be mapped to the first entry in the co-ordinate file, which in this case is entry number 12; the second letter, 'R', corresponds to the second entry, which in this case is entry 13. The resulting text file should look like this:
    I 12
    R 13
    L 14
    A 15
I'm not sure what the best way of going about writing a Perl script for this is. Any advice/hints much appreciated. The real examples are obviously far larger than the test case presented here.

Replies are listed 'Best First'.
Re: mapping data between files
by MidLifeXis (Monsignor) on May 05, 2009 at 21:33 UTC

    It looks like you have a few steps to do to solve this.

    • Identify the line from the sequence file and split it into its component pieces
    • Read that number of lines from the co-ordinate file and assign one to each of the component pieces

    The problem seems to be ill-defined. For example, you refer to "entry number 1 but not all". Do you mean "entry number ONE but not all"? What happens in the case of "not all"? Is there a problem domain into which this problem fits? If so, there may be a module that reads these types of files already.

    Update: This could also be afternoon fog. Is the information on the lines like >ONE indicating the name of a file to read?

    --MidLifeXis

    The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post

      Thanks for your quick response! >ONE etc does indeed indicate the name of a file to read

      For example - the 'ONE' co-ordinate file corresponds to the sequence IRLA in the sequence file

      So ... in the above test case you have one sequence file but 3 corresponding co-ordinate files
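
With the file layout confirmed, the two steps listed by MidLifeXis can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `parse_sequences` and `entry_numbers` are made up for this example, the file contents are inlined as strings instead of being read from disk, and the entry number is assumed to be the first whitespace-separated field on each co-ordinate line.

```perl
use strict;
use warnings;

# Hypothetical helper: split sequence-file text into [name, sequence] pairs.
# Assumes each record is a '>NAME' header line followed by one or more
# sequence lines, as in the example in the question.
sub parse_sequences {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;              # drop trailing whitespace
        next unless length $line;        # skip blank lines
        if ($line =~ /^>\s*(\S+)/) {
            push @records, [ $1, '' ];   # a header starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;    # a sequence may span several lines
        }
    }
    return @records;
}

# Hypothetical helper: collect the first field of every non-blank line
# of a co-ordinate file's text (assumed to be the entry number).
sub entry_numbers {
    my ($text) = @_;
    my @numbers;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;
        my @fields = split ' ', $line;   # ' ' also ignores leading whitespace
        push @numbers, $fields[0];
    }
    return @numbers;
}

my @records = parse_sequences(">ONE\nIRLA\n>TWO\nREFT\n>THREE\nHTED\n");
my @entries = entry_numbers(<<'END');
12 14.620 35.834 -16.759 1.00 11.04
13 15.922 36.983 -19.044 1.00 11.22
14 14.326 37.148 -21.240 1.00 11.40
15 11.528 38.248 -23.343 1.00 12.44
END

# Pair the letters of the first sequence ('ONE') with the entry numbers.
my ($name, $seq) = @{ $records[0] };
my @letters = split //, $seq;
print "$letters[$_] $entries[$_]\n" for 0 .. $#letters;
```

In a real script the text passed to `entry_numbers` would come from the file named in the header (here 'ONE'), so the record name doubles as the path to open.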

Re: mapping data between files
by citromatik (Curate) on May 06, 2009 at 08:04 UTC

    If I understood correctly, you want something like:

    • Read and parse the sequences file
    • For each sequence, open a file that is named after the sequence header
    • From that file, get the first number of each line
    • Associate each letter in the sequence with those numbers
    use strict;
    use warnings;

    my ($seqfile) = @ARGV;   # seqfile is the file with the sequences.

    open my $seqfh, "<", $seqfile or die $!;

    {
      local $/ = "\n>";      # read one '>NAME' record at a time
      while (my $nextseq = <$seqfh>){
        chop $nextseq unless eof $seqfh;       # drop the next record's '>'
        substr ($nextseq,0,1,"") if $. == 1;   # drop the first record's '>'
        my ($name,@seq) = split /\n/,$nextseq;
        my @aas = split //,join "",@seq;       # one element per letter
        my @coords = getCoords ($name);
        print "$aas[$_] $coords[$_]\n" for(0..$#aas);
      }
    }

    # Return the first field (the entry number) of each line of the file.
    sub getCoords {
      my ($fname) = @_;
      local $/="\n";         # restore line-at-a-time reading
      my @ns;
      open my $fh, "<", $fname or die $!;
      while (my $line = <$fh>){
        chomp $line;
        # split ' ' ignores any leading whitespace on the line
        push @ns, (split ' ',$line)[0];
      }
      close $fh;
      return @ns;
    }

    citromatik
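
One thing the script above does not cover: it pairs letters and entries purely by position, so if a co-ordinate file has fewer entries than the sequence has letters, the trailing letters are printed with an undefined value (plus a warning). A minimal sketch of a guard is shown below. The helper name `pair_letters` is hypothetical, and whether to die, warn, or skip the record on a mismatch is a choice the thread does not settle; dying is simply what this sketch assumes.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each letter with its entry number, refusing to
# continue when the two counts differ instead of printing undefined values.
sub pair_letters {
    my ($name, $letters, $numbers) = @_;
    my ($n_letters, $n_entries) = (scalar @$letters, scalar @$numbers);
    die "$name: $n_letters letters but $n_entries entries\n"
        unless $n_letters == $n_entries;

    my @pairs;
    for my $i (0 .. $#$letters) {
        push @pairs, "$letters->[$i] $numbers->[$i]";
    }
    return @pairs;
}

# Matching counts: one "letter number" line per letter.
print "$_\n" for pair_letters('ONE', [qw(I R L A)], [12 .. 15]);

# Mismatched counts: the problem is reported rather than mis-mapped.
my @bad = eval { pair_letters('TWO', [qw(R E F T)], [1 .. 3]) };
print "error: $@" if $@;
```

In citromatik's script this check would sit between the call to `getCoords` and the `print` loop, with `\@aas` and `\@coords` as the two array references.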

Node Type: perlquestion [id://762080]
Approved by graff