He77e has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Esteemed Monks,

I am relatively new to Perl so this may be an easy fix.

My script is meant to import a CSV file, parse each line, push a string containing the fields in FASTA format onto an array, and then write the array to a file.

The problem is that a blank space (\s) is inserted before each new entry. I assume it comes from the way I am writing the array to the file, but I cannot find a method that deals with it.

Any help? (script shown below)

    use Text::CSV;
    use Data::Dumper qw(Dumper);

    print "Enter file name: \n";
    my $file = <STDIN>;
    chomp $file;
    print "Enter output file name: \n";
    my $ofile = <STDIN>;

    my $csv = Text::CSV->new({ sep_char => ',' });
    my @fasta;

    open(my $data, '<', $file) or die "Could not open '$file' $!\n";
    while (my $line = <$data>) {
        chomp $line;

        if ($csv->parse($line)) {
            my @fields = $csv->fields();
            #print Dumper \@fields;
            $fields[4]=~s/\s//gs; #removes spaces within the sequence
            push @fasta,"\>$fields[0]\_$fields[1]\_$fields[2]\_$fields[3]\n$fields[4]\n"; #outputs the correct format
        } else {
            warn "Line could not be parsed: $line\n";
        }
    }
    #print Dumper \@fasta;
    open (FH,">$ofile"), print FH"@fasta", close;
    end;

Sample input: (it is in TSV format but is read into the script anyway without a problem)

XP_014917420.1 CYP26A1 Acinonyx jubatus Cheetah  MGFPFFGE +TLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLV SVHWPASVRPILGSGC +LSNLHDSSHKQRKKVIMRAFSREALQYYVPVIAEEVGTCLEQWL SCGERGLLVYPQVKRLMFRIAMRI +LLGCEPRLANGGDAEQQLVEAFEEMTRNLFSLPIDV PFSGLYRGMKARNLIHARIEENIRAKICGLRA +AEAEAEAGGGCKDALQLLVDHSWERGER LDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQ +KVREELKSKGLLCKSNQDNK LDMEILGQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGW +NVIYSICDTHDV ADIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGAKILLKIFTVELARHCDWRLLN +GPPT MKTSPTVYPVDDLPARFTRFQGET XP_002916147.1 CYP26A1 Ailuropoda melanoleuca Giant Panda +MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQM VLQRRKFL +QMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTIL GSGCLSNLHDSSHKQR +KKVIMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQ VKRLMFRIAMRILLGCDPRLASGG +DAEQQLVEAFEEMTRNLFSLPIDVPFSGLYRGMKAR NLIHARIEENIRAKICGLRTAEAASGCKDALQ +LLIEHSWERGERLDMQALKQSSTELLFG GHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSN +QDNKLDMEILEQLKYIGCVI KETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSF +TNKDEFNPDRFL QPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKT +SPTV YPVDGLPARFTHFQGEI XP_006276679.1 CYP26A1 Alligator mississippiensis American Al +ligator MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQ +MV LQRRKFLQVKRRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILG SGCLS +NLHDARHKQRKKVIMRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEV KRLMFRIAMRLLL +GFEPHQADSGSERQLVEAFEEMSRNLFSLPIDVPFSGLYRGLRARNI IHARIEANIRNRMARAEPGGG +PKDALQLLLEQAQRHGQPLNMQELKESATELLFGGHETT ASAATSLITFLGLHPEVLQKVRKELQGNG +LLCSPNQDSKTLDMEVLEQLKYTGCVIKETL RLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDT +HDVAELFTNKDKFNPDRFMSPSP EDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLN +GPPTMKTGPIVYPVD NLPAKFVGFSGQI XP_021123924.1 CYP26A1 Anas platyrhynchos Mallard  MGFSALL +ASALCTFLLPLLLFLAAVKLWDLYCVSSRDPSCPLPLPPGTMGLPFFGETLQM VLQRRKFLQMKRRKY +GFIYKTHLFGRPTVRVMGAENVRHILLGEHRLVSVQWPGSPPPPP LPRPPGQVIMRAFSRDALQHYVP +VIQEEVSACLARWLGAAGPCLLVYPEVKRLMFRIAMR ILLGFQPRQAGPDGEQQLVEAFEEMIRNLFS +LPIDVPFSGLYRGLRARNIIHAKIEENIR 
AKMARKEPEGGYKDALQLLMEHTQGNGEQLNMQELKESA +TELLFGGHETTASAATSLIAF LGLHHDVLQKVRKELQVKGLLCSPNQEKQLDMEVLEQLKYTGCVIKE +TLRLSPPVPGGFR IALKTLELNGYQIPKGWNVIYSICDTHDVADLFTNKDEFNPDRFMSPSPEDSSRF +SFIPF GGGLRSCVGKEFAKVLLKIFIVELARSCDWQLLNGPPTMKTGPIVYPVDNLPTKFIGFSG QI

Sample output: (note the \s added to every entry excluding the first)

>XP_002916147.1_CYP26A1_Ailuropoda melanoleuca_Giant Panda MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQM +KRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKV +IMRAFSREALQCYVPVIAEEVGTCLEQWLSCGERGLLVYPQVKRLMFRIAMRILLGCDPRLASGGDAEQ +QLVEAFEEMTRNLFSLPIDVPFSGLYRGMKARNLIHARIEENIRAKICGLRTAEAASGCKDALQLLIEH +SWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLD +MEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWHVIYSICDTHDVADSFTNKDEFN +PDRFLQPHPEDASRFSFIPFGGGLRSCVGKEFAKMLLKIFTVELARHCDWRLLNGPPTMKTSPTVYPVD +GLPARFTHFQGEI >XP_006276679.1_CYP26A1_Alligator mississippiensis_American Alligator MGFALLASALCTLLLPLLLFLAAVKLWGLYCESGRDPGCPLPLPPGTMGLPFFGETLQMVLQRRKFLQVK +RRKYGCIYKTHLFGRPTVRVLGADNVRRILLGEHRLVAVQWPASVRTILGSGCLSNLHDARHKQRKKVI +MRAFSRDALRHYAPVMQEEVSGCLARWLGRGGACLLVYPEVKRLMFRIAMRLLLGFEPHQADSGSERQL +VEAFEEMSRNLFSLPIDVPFSGLYRGLRARNIIHARIEANIRNRMARAEPGGGPKDALQLLLEQAQRHG +QPLNMQELKESATELLFGGHETTASAATSLITFLGLHPEVLQKVRKELQGNGLLCSPNQDSKTLDMEVL +EQLKYTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAELFTNKDKFNPDRF +MSPSPEDSSRFSFIPFGGGVRSCVGKEFAKILLKIFTVELARNCDWQLLNGPPTMKTGPIVYPVDNLPA +KFVGFSGQI >ARO89874.1_CYP26A1_Andrias davidianus_Chinese Giant Salamander MSLYTLFASALCTLVLPLLLFLAAVKLWELYCISTRDRSCRCPLPPGTMGLPFFGETLQMVLQRRKFLQM +KRRKYGCIYKTHLFGRPTVRVMGAENVKQILLGEHRLVSVHWPASVRTILGSGCLSNLHDSQHKNRKKV +IMQAFSREALQHYIPVIEEEVRGALAQWLGGGGASVLVYPEVKRLMFRIAMRILLGFEPHQTDREMEQQ +LVEAFEEMIRNLFSLPIDVPFSGLYRGLKARNVIHAKIEENIRAKMAKESDTQYKDALQLLIEHTQKNG +EQLNMQELKESATELLFGGHETTASAATSLMTFLALHSDVLHKVRKELQIKDLLCDNKPLNIEALEQLK +YTGCVIKETLRLSPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFPNKEEFNPDRFMSSH +PEDNSRFNFIPFGGGLRSCVGKEFAKILLKIFTVELARTCDWQLLNGAPTMKTGPIVYPVDNLPTKFIG +FNGII >XP_012310130.1_CYP26A1_Aotus nancymaae_Nancy Ma's Night Monkey MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPLPPGTMGFPFFGETLQMVLQRRKFLQM +KRRKYGFIYKTHLFGRPTVRVMGADNVRRILLGEHRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKV +IMRAFSREALKCYVPVIIEEVGSSLEQWLSCGERGLLVYPEVKRLMFRIAMRILLGCEPQLAGDRDAEQ 
+QLVEAFEEMTRNLFSLPIDVPFSGLYRGVKARNLIHARIEQNIRAKICGLRASEASRGCKDALQLLIEH +SWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKGLLCKSNQDNKLD +MEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGWNVIYSICDTHDVAEIFTNKEEFN +PDRFMLPHPEDASRFSFIPFGGGLRSCVGKEFAKILLKIFTVELARHCDWQLLNGPPTMKTSPTVYPVD +NLPARFTHFHGEI
Any help would be greatly appreciated!

Re: Space inserted into output file
by poj (Abbot) on Jul 22, 2017 at 16:29 UTC

    see perlvar

    $LIST_SEPARATOR
    $"
    When an array or an array slice is interpolated into a double-quoted string or a similar context such as /.../ , its elements are separated by this value. Default is a space. For example, this:
        print "The array is: @array\n";
    is equivalent to this:
        print "The array is: " . join($", @array) . "\n";
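
A minimal sketch of how this produces the stray space in the FASTA output, assuming two made-up entries built the same way as in the original script (the IDs and sequences are hypothetical). Each element ends in a newline, so the separator lands at the start of the next header line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical FASTA entries, each ending in a newline,
# shaped like the strings the original script pushes onto @fasta.
my @fasta = (">id1_geneA\nMGFPFF\n", ">id2_geneB\nMGLPAL\n");

# Interpolating the array joins its elements with $"
# (a single space unless it has been changed).
my $interpolated = "@fasta";
my $joined       = join($", @fasta);

# The second header line now starts with a space.
print $interpolated;
print "same result as an explicit join with the list separator\n"
    if $interpolated eq $joined;
```

The first entry is unaffected because the separator only goes between elements, which matches the symptom described in the question.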
    

    Either use
    { local $" = ''; print FH "@fasta"; }

    or

    print FH join '',@fasta;

    or just print out each line as it is read

    #!/usr/local/bin/perl
    use strict;
    use Text::CSV;
    use Data::Dumper qw(Dumper);

    print "Enter input file name : ";
    my $file = <STDIN>;
    chomp $file;
    open my $in, '<', $file or die "Could not open '$file' $!\n";

    print "Enter output file name: ";
    my $ofile = <STDIN>;
    chomp $ofile;
    open my $out, '>', $ofile or die "Could not open '$ofile' $!\n";

    my $csv = Text::CSV->new({ sep_char => ',' });
    my $count = 0;
    while (my $line = <$in>) {
        if ($csv->parse($line)) {
            my @fields = $csv->fields();
            $fields[4]=~s/\s//g; #removes spaces within the sequence
            print $out '>'.(join '_',@fields[0..3])."\n$fields[4]\n";
        } else {
            warn "Line could not be parsed: $line\n";
        }
        ++$count;
    }
    close $in;
    close $out;
    print "$count lines read from $file to $ofile\n";
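
Since the question mentions that the data is actually tab-separated, here is a rough sketch of the same print-as-you-read idea for that case. It swaps Text::CSV for a plain split on tabs, which is only reasonable if the file has exactly five columns and no quoted or embedded-tab fields (an assumption, not something the thread confirms). The helper name tsv_to_fasta and the shortened records are hypothetical, and in-memory filehandles stand in for the real files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one tab-separated record into a FASTA entry.
# Assumes five columns (accession, gene, species, common name, sequence).
sub tsv_to_fasta {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, 5;
    return unless @fields == 5;          # undef signals an unparsable line
    $fields[4] =~ s/\s//g;               # remove whitespace inside the sequence
    return '>' . join('_', @fields[0 .. 3]) . "\n$fields[4]\n";
}

# Made-up, shortened records; in-memory strings stand in for the files.
my $input  = "XP_1\tCYP26A1\tAcinonyx jubatus\tCheetah\tMGFP FFGE\n"
           . "XP_2\tCYP26A1\tAnas platyrhynchos\tMallard\tMGFS ALLA\n";
my $output = '';
open my $in,  '<', \$input  or die "Could not open input: $!\n";
open my $out, '>', \$output or die "Could not open output: $!\n";

while (my $line = <$in>) {
    my $entry = tsv_to_fasta($line);
    if (defined $entry) {
        print {$out} $entry;             # written as read, no separator involved
    } else {
        warn "Line could not be parsed: $line";
    }
}
close $in;
close $out;

print $output;
```

For real-world files with quoting, Text::CSV with a tab as its separator is likely the safer choice.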
    poj

      Cheers mate, works perfectly!

      Rob

      2017-07-25 Athanasius restored original content

Re: Space inserted into output file
by stevieb (Canon) on Jul 22, 2017 at 15:43 UTC

    Welcome to the Monastery, He77e!

    I applaud you for formatting your post properly and asking a good question. However, it is difficult for us to test this without any sample input data, a sample of output you're currently getting, and expected output.

    Please provide a small block of each of these within your question (using the edit link). Each section should go in a separate <code></code> block like you've done with your code.

    Cheers,

    -stevieb

      Thanks for the feedback, much appreciated!

      I will have a go with the altered code tomorrow and get back to you.

      He77e

      Cheers mate, works perfectly!

      He77e

Re: Space inserted into output file
by Marshall (Canon) on Jul 23, 2017 at 21:41 UTC
    There is indeed a default "list separator" ($") of a single space for arrays interpolated into double-quoted strings. This is usually what you want.

    However, when you don't want the default space separator, just don't put the array in quotes! There is no need for the extra overhead of joining with an empty string. There is also no need to redefine the list separator.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @array = (qw/A B C D E F/);
    print "array has ".@array," elements\n"; ## array has 6 elements
    # Note the above '.' puts @array into a scalar context

    print "@array\n";     ## 'A B C D E F'
    print "",@array,"\n"; ## 'ABCDEF'
    # Note: the empty string "" was added so that print is not
    # confused about the missing optional filehandle. Note that a comma ','
    # is used instead of '.' to separate the arguments.
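
Applied to the original question, the same idea might look like the sketch below. The entries are made up and an in-memory filehandle stands in for the output file. It also shows the one caveat I am aware of: a list passed to print is separated by $, (the output field separator), which is a different variable from $" and is undefined by default.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fasta = (">id1\nMGFP\n", ">id2\nMGLP\n");   # hypothetical entries

my $buffer = '';
open my $fh, '>', \$buffer or die "Could not open in-memory file: $!\n";

# A list passed to print is separated by $, which is undef by default,
# so nothing is inserted between the elements.
print {$fh} @fasta;

{
    # Setting $, changes list printing; $" only affects interpolation.
    local $, = '|';
    print {$fh} 'A', 'B', "\n";
}
close $fh;

print $buffer;
```

Using the block form print {$fh} LIST makes it explicit which argument is the filehandle, so no leading empty string is needed.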