Greetings Esteemed Monks,
I am relatively new to Perl so this may be an easy fix.
My script is meant to read a CSV file, parse each line, push a FASTA-formatted string built from the fields onto an array, and write that array to a file.
The problem is that a blank space (\s) is inserted before each new entry. I assume the problem lies in the way I write the array to the file, but I cannot find a method that avoids it.
Any help? (script shown below)
use Text::CSV;
use Data::Dumper qw(Dumper);

print "Enter file name: \n";
my $file = <STDIN>;
chomp $file;
print "Enter output file name: \n";
my $ofile = <STDIN>;

my $csv = Text::CSV->new({ sep_char => ',' });
my @fasta;

open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
    chomp $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        #print Dumper \@fields;
        $fields[4]=~s/\s//gs; #removes spaces within the sequence
        push @fasta,"\>$fields[0]\_$fields[1]\_$fields[2]\_$fields[3]\n$fields[4]\n"; #outputs the correct format
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
#print Dumper \@fasta;
open (FH,">$ofile"), print FH"@fasta", close;
end;
Sample input: (it is in TSV format but is read into the script anyway without a problem)
XP_014917420.1 CYP26A1 Acinonyx jubatus Cheetah MGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLV SVHWPASVRPILGSGCLSNLHDSSHKQRKKVIMRAFSREALQYYVPVIAEEVGTCLEQWL SCGERGLLVYPQVKRLMFRIAMRILLGCEPRLANGGDAEQQLVEAFEEMTRNLFSLPIDV PFSGLYRGMKARNLIHARIEENIRAKICGLRAAEAEAEAGGGCKDALQLLVDHSWERGER LDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNK LDMEILGQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDV ADIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGAKILLKIFTVELARHCDWRLLNGPPT MKTSPTVYPVDDLPARFTRFQGET
XP_002916147.1 CYP26A1 Ailuropoda melanoleuca Giant Panda MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQM VLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTIL GSGCLSNLHDSSHKQRKKVIMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQ VKRLMFRIAMRILLGCDPRLASGGDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGMKAR NLIHARIEENIRAKICGLRTAEAASGCKDALQLLIEHSWERGERLDMQALKQSSTELLFG GHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVI KETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSFTNKDEFNPDRFL QPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKTSPTV YPVDGLPARFTHFQGEI
XP_006276679.1 CYP26A1 Alligator mississippiensis American Alligator MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQMV LQRRKFLQVKRRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILG SGCLSNLHDARHKQRKKVIMRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEV KRLMFRIAMRLLLGFEPHQADSGSERQLVEAFEEMSRNLFSLPIDVPFSGLYRGLRARNI IHARIEANIRNRMARAEPGGGPKDALQLLLEQAQRHGQPLNMQELKESATELLFGGHETT ASAATSLITFLGLHPEVLQKVRKELQGNGLLCSPNQDSKTLDMEVLEQLKYTGCVIKETL RLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAELFTNKDKFNPDRFMSPSP EDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLNGPPTMKTGPIVYPVD NLPAKFVGFSGQI
XP_021123924.1 CYP26A1 Anas platyrhynchos Mallard MGFSALLASALCTFLLPLLLFLAAVKLWDLYCVSSRDPSCPLPLPPGTMGLPFFGETLQM VLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGAENVRHILLGEHRLVSVQWPGSPPPPP LPRPPGQVIMRAFSRDALQHYVPVIQEEVSACLARWLGAAGPCLLVYPEVKRLMFRIAMR ILLGFQPRQAGPDGEQQLVEAFEEMIRNLFSLPIDVPFSGLYRGLRARNIIHAKIEENIR AKMARKEPEGGYKDALQLLMEHTQGNGEQLNMQELKESATELLFGGHETTASAATSLIAF LGLHHDVLQKVRKELQVKGLLCSPNQEKQLDMEVLEQLKYTGCVIKETLRLSPPVPGGFR IALKTLELNGYQIPKGWNVIYSICDTHDVADLFTNKDEFNPDRFMSPSPEDSSRFSFIPF GGGLRSCVGKEFAKVLLKIFIVELARSCDWQLLNGPPTMKTGPIVYPVDNLPTKFIGFSG QI
Sample output: (note the \s added to every entry excluding the first)
>XP_002916147.1_CYP26A1_Ailuropoda melanoleuca_Giant Panda
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQVKRLMFRIAMRILLGCDPRLASGGDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGMKARNLIHARIEENIRAKICGLRTAEAASGCKDALQLLIEHSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSFTNKDEFNPDRFLQPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKTSPTVYPVDGLPARFTHFQGEI
 >XP_006276679.1_CYP26A1_Alligator mississippiensis_American Alligator
MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQMVLQRRKFLQVKRRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILGSGCLSNLHDARHKQRKKVIMRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEVKRLMFRIAMRLLLGFEPHQADSGSERQLVEAFEEMSRNLFSLPIDVPFSGLYRGLRARNIIHARIEANIRNRMARAEPGGGPKDALQLLLEQAQRHGQPLNMQELKESATELLFGGHETTASAATSLITFLGLHPEVLQKVRKELQGNGLLCSPNQDSKTLDMEVLEQLKYTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAELFTNKDKFNPDRFMSPSPEDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLNGPPTMKTGPIVYPVDNLPAKFVGFSGQI
 >ARO89874.1_CYP26A1_Andrias davidianus_Chinese Giant Salamander
MSLYTLFASALCTLVLPLLLFLAAVKLWELYCISTRDRSCRCPLPPGTMGLPFFGETLQMVLQRRKFLQMKRRKYGCIYKTHLFGRPTVRVMGAENVKQILLGEHRLVSVHWPASVRTILGSGCLSNLHDSQHKNRKKVIMQAFSREALQHYIPVIEEEVRGALAQWLGGGGASVLVYPEVKRLMFRIAMRILLGFEPHQTDREMEQQLVEAFEEMIRNLFSLPIDVPFSGLYRGLKARNVIHAKIEENIRAKMAKESDTQYKDALQLLIEHTQKNGEQLNMQELKESATELLFGGHETTASAATSLMTFLALHSDVLHKVRKELQIKDLLCDNKPLNIEALEQLKYTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFPNKEEFNPDRFMSSHPEDNSRFNFIPFGGGLRSCVGKEFAKILLKIFTVELARTCDWQLLNGAPTMKTGPIVYPVDNLPTKFIGFNGII
 >XP_012310130.1_CYP26A1_Aotus nancymaae_Nancy Ma's Night Monkey
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALKCYVPVIIEEVGSSLEQWLSCGERGLLVYPEVKRLMFRIAMRILLGCEPQLAGDRDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGVKARNLIHARIEQNIRAKICGLRASEASRGCKDALQLLIEHSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGLRSCVGKEFAKILLKIFTVELARHCDWQLLNGPPTMKTSPTVYPVDNLPARFTHFHGEI
Any help would be greatly appreciated!
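A likely cause, sketched here on the assumption that the stray space comes from the final print: interpolating an array inside double quotes (print FH "@fasta") joins its elements with the list separator $", which defaults to a single space. Every element of @fasta already ends in "\n", so that separator lands at the start of each entry after the first. The records and the write_records helper below are hypothetical stand-ins, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the @fasta array the script builds.
my @fasta = (">id_1\nMGFP\n", ">id_2\nMGLP\n", ">id_3\nMGFA\n");

# Double-quote interpolation joins the elements with $" (a space by default),
# which reproduces the leading space on every entry but the first.
my $interpolated = "@fasta";

# Joining on the empty string adds nothing between elements.
my $joined = join '', @fasta;

# Alternatively, localise $" to the empty string just for the interpolation.
my $localised = do { local $" = ''; "@fasta" };

# Hypothetical helper: print the array as a list rather than interpolating it.
# print separates list items with $, which is undefined (empty) by default.
sub write_records {
    my ($path, @records) = @_;
    open my $out, '>', $path or die "Could not open '$path': $!\n";
    print {$out} @records;
    close $out or die "Could not close '$path': $!\n";
    return;
}
```

In the script above, that would amount to printing @fasta without the surrounding quotes. Separately, $ofile is read from STDIN without a chomp, so the output file name keeps its trailing newline; chomping it the same way as $file would avoid that.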
In reply to Space inserted into output file by He77e