FarTech has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys

I have the following file, xyz, which contains:

GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.1 (217 aa)
WFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKS-SRYIA
TPIFSKMAHWSNK

GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.18) ( (217 aa)
WFAGDKVTYVDFLAYDILDQYHIFEPKCLDAFPNLKDFLARFEGLKKISAYMKSSRYLST
PIFSKLAQWSNK

GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.18) (217 aa)
FAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPNLKDFLARFEGLKKISAYMKTSRFLRRP
IFSKMAQWSNK

GTMU_RABIT GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18 (217 aa)
PMTLGYWDVRGLALPIRMLLEY--TDTSYEEKKYTMGDAPNYDQSK
WLSEKFTLGL----DFPN-LPYLID-GTHKLTQSNAILRYLARKHGLCGETEEERIRVDI
LENQLMDNRFQLVNVCYSPDFEKLKPEYLKGLPEKLQLYSQFLGSLPWFAGDKITFADFL

GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18 (217 aa)
LPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEG
LEKISAYMKSSRFLPRPVFSKMAVWGNK

GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT (473 aa)
SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV

GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18 (218 aa)
LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG
LEKISAYMKSSRFLPKPLYTRVAVWGNK

I have to extract one sequence at a time from the xyz file (each entry can be either 2 or 3 lines) and store it in a file, abc, which can be used as input to the fasta34.exe application.

How should I proceed?

Should I store the xyz file in an array and then extract lines until I come across a blank line, or is there some other way to do this?

Replies are listed 'Best First'.
Re: Arrays and Files
by borisz (Canon) on Jan 31, 2005 at 23:53 UTC
    local $/ = "\n\n";
    open my $fh, "<", "xyz" or die $!;
    while (<$fh>) {
        # $_ contains one entry here
    }
    Boris
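A minimal sketch of how the record-at-a-time idea above could be carried through to the abc file. Several things here are assumptions and not part of the reply: the helper name `read_records`, the in-memory sample (so the code runs without a real xyz file), and the temporary directory used to hold abc. It also sets `$/` to the empty string (paragraph mode) rather than `"\n\n"`; paragraph mode treats any run of blank lines as a single separator, which is a little more forgiving if the file has extra blank lines.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the blank-line-separated records read from $fh.
sub read_records {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode: blank line(s) end a record
    my @records;
    while (my $entry = <$fh>) {
        $entry =~ s/\n+\z//;    # drop the trailing newlines left by the separator
        push @records, $entry;
    }
    return @records;
}

# Two entries from the question, held in a string so the sketch is self-contained.
my $sample = join "\n",
    "GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT (473 aa)",
    "SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV",
    "",
    "GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18 (218 aa)",
    "LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG",
    "LEKISAYMKSSRFLPKPLYTRVAVWGNK",
    "";

open my $in, "<", \$sample or die "Cannot open in-memory sample: $!";
my @records = read_records($in);
close $in;

# Assumption: abc lives in a scratch directory; a real script would pick its own path.
my $dir = tempdir(CLEANUP => 1);
my $abc = "$dir/abc";

for my $i (0 .. $#records) {
    # Overwrite abc with the current record only.
    open my $out, ">", $abc or die "Cannot write $abc: $!";
    print {$out} $records[$i], "\n";
    close $out;

    # fasta34.exe would be run on abc at this point; its arguments are not
    # given in the thread, so the call is left out.
    my $line_count = ($records[$i] =~ tr/\n//) + 1;
    printf "record %d: %d line(s)\n", $i + 1, $line_count;
}
```

With a real file, the in-memory handle would be replaced by `open my $in, "<", "xyz"`; the rest stays the same.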
Re: Arrays and Files
by gube (Parson) on Feb 01, 2005 at 04:23 UTC

    Hi, try this. It gets the first sequence, which you can then write to the abc file.

    undef $/;    # slurp mode: read the whole file in one go
    open(IN, "d:\\xyz.txt") || die "Cannot open file\n";
    $str = <IN>;
    if ($str =~ m#(.*?)\n\n#s) {
        $a = $1;    # everything up to the first blank line
    }
    print $a;

    Output:

    GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.1 (217 aa)
    WFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKS-SRYIA
    TPIFSKMAHWSNK


    Regards,
    Gubendran.L

Re: Arrays and Files
by sh1tn (Priest) on Jan 31, 2005 at 23:55 UTC
    Could this just be a hash structure?
    That way you can read and write in pairs (keys are the source files, values are the destination files):

    use IO::File;

    my $files = { 'x', 'a', 'y', 'b', 'z', 'c' };
    for my $src ( keys %{$files} ) {
        my $dest = $files->{$src};
        # report the file name on failure (the handle is undefined at that point)
        my $read = IO::File->new("< $src");
        die "can't open_r $src: $!" unless defined($read);
        my $write = IO::File->new("> $dest");
        die "can't open_w $dest: $!" unless defined($write);
        print $write $_ while <$read>;
        $read->close;
        $write->close;
    }
Re: Arrays and Files
by Grundle (Scribe) on Feb 01, 2005 at 00:23 UTC
    What exactly counts as a sequence?? Is it the lines underneath

    GTM1_MOUSE etc.. etc..

    or is it the whole entry? I am going to assume it is the whole entry.

    $file = "xyz";
    if (!open(READ_FILE, "$file")) {
        die "Cannot open file [$file]\n";
    }
    my $sequence = "";
    while (my $line = <READ_FILE>) {
        if ($line =~ m/^G.*\(EC/) {
            if ($sequence) {
                # write sequence here.
                print "seq: $sequence\n";
                $sequence = "";
            }
            $sequence .= $line;
        } else {
            $sequence .= $line;
        }
    }
    # write last sequence here
    print "last: $sequence\n";

    That regex is somewhat hardcoded; if you are nervous about it, you can always change it to look for the empty line instead. I sort of assumed that every sequence would start with

    G<something>

    and would have

    (EC <more-stuff>
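For the "look for the empty line" variant mentioned above, something along these lines might do. It is only a sketch: `split_on_blank_lines` is a made-up helper name, and the input is a hardcoded list of lines taken from the question rather than a real read of xyz.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into entries, using blank lines as the
# separator instead of matching on the header text.
sub split_on_blank_lines {
    my (@lines) = @_;
    my @sequences;
    my $sequence = "";
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            # blank line: the current entry is complete
            push @sequences, $sequence if length $sequence;
            $sequence = "";
        } else {
            $sequence .= $line;
        }
    }
    # the last entry may not be followed by a blank line
    push @sequences, $sequence if length $sequence;
    return @sequences;
}

# Stand-in for the lines read from xyz (two entries from the question).
my @lines = (
    "GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.1 (217 aa)\n",
    "WFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKS-SRYIA\n",
    "TPIFSKMAHWSNK\n",
    "\n",
    "GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT (473 aa)\n",
    "SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV\n",
);

my @sequences = split_on_blank_lines(@lines);
print "seq: $_\n" for @sequences;
```

This no longer depends on every header starting with G or containing "(EC", but it does depend on entries always being separated by at least one blank line.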
Re: Arrays and Files
by Mago (Parson) on Feb 01, 2005 at 13:16 UTC

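    #!/usr/local/bin/perl -w
    use strict;

    my (@seq, $x);
    $x = 0;    # start at the first entry (avoids an undefined array index under -w)
    open (F, "< xyz") || die "Cannot open file\n";
    while (<F>) {
        chomp($_);
        if (length($_) > 0) {
            $seq[$x] .= "$_\n";    # keep the line break between header and sequence lines
        } else {
            $x++;    # blank line: move on to the next entry
        }
    }
    close(F);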


    Mago
    mago@rio.pm.org