He77e has asked for the wisdom of the Perl Monks concerning the following question:
Greetings Esteemed Monks,
I am relatively new to Perl, so this may be an easy fix.
My script is meant to read a CSV file, parse each line, push a FASTA-formatted string built from the fields onto an array, and write that array to a file.
The problem is that a blank space (\s) is inserted before each new entry. I assume the cause is the way I am writing the array to the file, but I cannot find a method that deals with it.
Any help? (script shown below)
use Text::CSV;
use Data::Dumper qw(Dumper);

print "Enter file name: \n";
my $file = <STDIN>;
chomp $file;
print "Enter output file name: \n";
my $ofile = <STDIN>;

my $csv = Text::CSV->new({ sep_char => ',' });
my @fasta;

open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
    chomp $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        #print Dumper \@fields;
        $fields[4]=~s/\s//gs; #removes spaces within the sequence
        push @fasta,"\>$fields[0]\_$fields[1]\_$fields[2]\_$fields[3]\n$fields[4]\n"; #outputs the correct format
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
#print Dumper \@fasta;
open (FH,">$ofile"),
print FH"@fasta",
close;
end;
Sample input: (it is in TSV format but is read into the script anyway without a problem)
XP_014917420.1 CYP26A1 Acinonyx jubatus Cheetah MGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLV SVHWPASVRPILGSGCLSNLHDSSHKQRKKVIMRAFSREALQYYVPVIAEEVGTCLEQWL SCGERGLLVYPQVKRLMFRIAMRILLGCEPRLANGGDAEQQLVEAFEEMTRNLFSLPIDV PFSGLYRGMKARNLIHARIEENIRAKICGLRAAEAEAEAGGGCKDALQLLVDHSWERGER LDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNK LDMEILGQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDV ADIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGAKILLKIFTVELARHCDWRLLNGPPT MKTSPTVYPVDDLPARFTRFQGET
XP_002916147.1 CYP26A1 Ailuropoda melanoleuca Giant Panda MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQM VLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTIL GSGCLSNLHDSSHKQRKKVIMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQ VKRLMFRIAMRILLGCDPRLASGGDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGMKAR NLIHARIEENIRAKICGLRTAEAASGCKDALQLLIEHSWERGERLDMQALKQSSTELLFG GHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVI KETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSFTNKDEFNPDRFL QPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKTSPTV YPVDGLPARFTHFQGEI
XP_006276679.1 CYP26A1 Alligator mississippiensis American Alligator MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQMV LQRRKFLQVKRRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILG SGCLSNLHDARHKQRKKVIMRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEV KRLMFRIAMRLLLGFEPHQADSGSERQLVEAFEEMSRNLFSLPIDVPFSGLYRGLRARNI IHARIEANIRNRMARAEPGGGPKDALQLLLEQAQRHGQPLNMQELKESATELLFGGHETT ASAATSLITFLGLHPEVLQKVRKELQGNGLLCSPNQDSKTLDMEVLEQLKYTGCVIKETL RLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAELFTNKDKFNPDRFMSPSP EDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLNGPPTMKTGPIVYPVD NLPAKFVGFSGQI
XP_021123924.1 CYP26A1 Anas platyrhynchos Mallard MGFSALLASALCTFLLPLLLFLAAVKLWDLYCVSSRDPSCPLPLPPGTMGLPFFGETLQM VLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGAENVRHILLGEHRLVSVQWPGSPPPPP LPRPPGQVIMRAFSRDALQHYVPVIQEEVSACLARWLGAAGPCLLVYPEVKRLMFRIAMR ILLGFQPRQAGPDGEQQLVEAFEEMIRNLFSLPIDVPFSGLYRGLRARNIIHAKIEENIR AKMARKEPEGGYKDALQLLMEHTQGNGEQLNMQELKESATELLFGGHETTASAATSLIAF LGLHHDVLQKVRKELQVKGLLCSPNQEKQLDMEVLEQLKYTGCVIKETLRLSPPVPGGFR IALKTLELNGYQIPKGWNVIYSICDTHDVADLFTNKDEFNPDRFMSPSPEDSSRFSFIPF GGGLRSCVGKEFAKVLLKIFIVELARSCDWQLLNGPPTMKTGPIVYPVDNLPTKFIGFSG QI
Sample output: (note the \s added to every entry excluding the first)
>XP_002916147.1_CYP26A1_Ailuropoda melanoleuca_Giant Panda
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQVKRLMFRIAMRILLGCDPRLASGGDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGMKARNLIHARIEENIRAKICGLRTAEAASGCKDALQLLIEHSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSFTNKDEFNPDRFLQPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKTSPTVYPVDGLPARFTHFQGEI
 >XP_006276679.1_CYP26A1_Alligator mississippiensis_American Alligator
MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQMVLQRRKFLQVKRRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILGSGCLSNLHDARHKQRKKVIMRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEVKRLMFRIAMRLLLGFEPHQADSGSERQLVEAFEEMSRNLFSLPIDVPFSGLYRGLRARNIIHARIEANIRNRMARAEPGGGPKDALQLLLEQAQRHGQPLNMQELKESATELLFGGHETTASAATSLITFLGLHPEVLQKVRKELQGNGLLCSPNQDSKTLDMEVLEQLKYTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAELFTNKDKFNPDRFMSPSPEDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLNGPPTMKTGPIVYPVDNLPAKFVGFSGQI
 >ARO89874.1_CYP26A1_Andrias davidianus_Chinese Giant Salamander
MSLYTLFASALCTLVLPLLLFLAAVKLWELYCISTRDRSCRCPLPPGTMGLPFFGETLQMVLQRRKFLQMKRRKYGCIYKTHLFGRPTVRVMGAENVKQILLGEHRLVSVHWPASVRTILGSGCLSNLHDSQHKNRKKVIMQAFSREALQHYIPVIEEEVRGALAQWLGGGGASVLVYPEVKRLMFRIAMRILLGFEPHQTDREMEQQLVEAFEEMIRNLFSLPIDVPFSGLYRGLKARNVIHAKIEENIRAKMAKESDTQYKDALQLLIEHTQKNGEQLNMQELKESATELLFGGHETTASAATSLMTFLALHSDVLHKVRKELQIKDLLCDNKPLNIEALEQLKYTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFPNKEEFNPDRFMSSHPEDNSRFNFIPFGGGLRSCVGKEFAKILLKIFTVELARTCDWQLLNGAPTMKTGPIVYPVDNLPTKFIGFNGII
 >XP_012310130.1_CYP26A1_Aotus nancymaae_Nancy Ma's Night Monkey
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALKCYVPVIIEEVGSSLEQWLSCGERGLLVYPEVKRLMFRIAMRILLGCEPQLAGDRDAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGVKARNLIHARIEQNIRAKICGLRASEASRGCKDALQLLIEHSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGLRSCVGKEFAKILLKIFTVELARHCDWQLLNGPPTMKTSPTVYPVDNLPARFTHFHGEI

Any help would be greatly appreciated!
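
A note on the likely cause, offered as a sketch rather than a tested patch to the script above: the output line `print FH"@fasta"` interpolates the array inside double quotes, and Perl joins interpolated array elements with the list separator `$"`, which defaults to a single space. That would put a space before every entry except the first, which matches the sample output. The two records below are hypothetical stand-ins for the real `@fasta` contents.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the entries pushed onto @fasta by the script.
my @fasta = (">id1_geneA\nMGF\n", ">id2_geneA\nMGL\n");

# Double-quote interpolation joins the elements with $" (default: one space),
# so every element after the first gains a leading space.
my $interpolated = "@fasta";

# Joining with an empty string (or passing the array to print as a plain
# list) concatenates the elements with nothing in between.
my $joined = join '', @fasta;

print "Interpolated:\n$interpolated";
print "Joined:\n$joined";
```

In the script itself, writing the list directly (`print FH @fasta;`) or localizing `$"` to an empty string before the print should avoid the extra space. Separately, `$ofile` is read from STDIN without a `chomp`, so the output file name probably ends in a newline; a `chomp $ofile;` looks like what was intended.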
Re: Space inserted into output file
by poj (Abbot) on Jul 22, 2017 at 16:29 UTC
by He77e (Initiate) on Jul 23, 2017 at 17:44 UTC

Re: Space inserted into output file
by stevieb (Canon) on Jul 22, 2017 at 15:43 UTC
by He77e (Initiate) on Jul 22, 2017 at 17:31 UTC
by He77e (Initiate) on Jul 23, 2017 at 17:45 UTC

Re: Space inserted into output file
by Marshall (Canon) on Jul 23, 2017 at 21:41 UTC