in reply to Perl script help to convert .txt file to .csv

use strict;
use warnings;

print "\n Running script for Jiggs \n";

my $infile = "foot.txt";           # <foot.txt> is not legal quotation in perl5
open my $in,  "<", $infile     or die $!;   # Don't quote a variable again
open my $out, ">", "foot1.txt" or die $!;

my $line = <$in>;                  # Is $line the header? If so, don't call it "line"
$line =~ s/\s+\Z/\n/;              # When you chomp, you don't get a "new" newline
$line =~ s/ +/,/g;
print $out $line;                  # Don't quote variable. It now still has a newline
print $out scalar <$in>;           # What is this second line? Something special?

while (<$in>) {
    s/\s+\Z/\n/;
    s/ +/,/g;
    s/,length=/,/g;
    print $out $_;
}

close $in;
close $out;

print "\n Done!\n";

Note that this might end up being invalid CSV. Use Text::CSV or Text::CSV_XS to ensure valid CSV output.
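As a rough illustration of that advice, here is a minimal sketch of building one CSV record with Text::CSV instead of joining fields by hand. The helper name csv_record is hypothetical, and the sketch assumes Text::CSV has been installed from CPAN (it is not a core module).

```perl
use strict;
use warnings;

# Build one CSV record from a list of fields.
# csv_record() is a hypothetical helper name, not part of Text::CSV.
sub csv_record {
    my @fields = @_;

    # Loaded at run time, so a missing module fails here with a clear error.
    require Text::CSV;

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    $csv->combine(@fields)
        or die "Cannot combine fields: " . $csv->error_diag;

    # No eol attribute is set, so the string has no line terminator.
    return $csv->string;
}
```

The module quotes a field only when it has to, for example when the field itself contains a comma or a double quote. That is exactly the case a plain s/ +/,/g cannot handle.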


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^2: Perl script help to convert .txt file to .csv
by Seabass (Novice) on Dec 20, 2011 at 17:11 UTC
    Cool, thanks for the reply!

    Using your example, I found out there is no need for the header part at all. I removed that part of the code, and then used more s///g substitutions to get rid of the rest of the field names.

    Here is the code:
    use strict;
    use warnings;

    print "\n Running script for Jiggs \n";

    my $infile = "foot.txt";
    open my $in,  "<", $infile     or die $!;
    open my $out, ">", "foot1.txt" or die $!;

    while (<$in>) {
        s/\s+\Z/\n/;
        s/ +/,/g;
        s/,length=/,/g;
        s/,xy=/,/g;
        s/,region=/,/g;
        s/,run=/,/g;
        print $out $_;
    }

    close $in;
    close $out;

    print "\n Done!\n";
    Here is the output:
    >G9JVYGV01AJE8V,135,0104_0349,1,R_2011_09_20_15_00_06_
    GGTGGTAGTGAAGAAGAGGAGATGAAAGTGGAAGAGGTTGAGGATGAGAAGGTTGAATTG
    GAAGAAGAAGATGAGAAGGTTGAAGTGGAAGATGAGAAGGTTGAAGTGGAAGAAGATGAA
    GTGGAAGAGAGGAGC
    >G9JVYGV01A4910,90,0353_0150,1,R_2011_09_20_15_00_06_
    GGTGCATGGCATTGTAGATGGTTGCTTGATAGTTGCCCATACGTGTACTACACTTGCAGA
    GTGAAGCAACCATCTACAATGCCATGCACC
    >G9JVYGV01A0SVP,70,0302_0163,1,R_2011_09_20_15_00_06_
    GCACCATTCAGCACAGATATAGTAGCCACATCAACACAAGTTACCTAACTATATCTGTGC
    TGAATGGTGC
    >G9JVYGV01A221U,89,0328_0160,1,R_2011_09_20_15_00_06_
    CTGGACATTTACATCCATAAGTAGGAGTTAGGACTCTGCACCAGCCTCTTGAGCTTGTGA
    CGTCTCTTCTCCTCCTCCGGACTGGGACA
    >G9JVYGV01BVCPK,46,0650_0134,1,R_2011_09_20_15_00_06_
    GCAAGATCGCAAGCCAAGCAACGTTTCACGAACTGGCCAGAATGAG
    >G9JVYGV01AOU3I,81,0166_0220,1,R_2011_09_20_15_00_06_
    TCATTGACATCTGTGCAGCTGCAGGAGCGGATATGAGGAGATGGTTCTATCTGCACAGAT
    GTCAATGAGTGTGACAGTGAT
    >G9JVYGV01A0JEL,61,0299_0171,1,R_2011_09_20_15_00_06_
    CGAGTGAAGGCATTGGTGATGCTGGTGTGAAGAGTGAGGGCATCGCCAATGCCTTCACTC
    G
    >G9JVYGV01AUKIG,119,0231_0198,1,R_2011_09_20_15_00_06_
    GGCCACCAGGGCTTAACTTCCTGTGCCTCACCATCACGCAGTTGTCAGAGGATCCACATT
    GAACAAAGTAGCAATTCTTTCCACTCTGTGACACACCAACATTCTTATACAGCACCAGG
    >G9JVYGV01AJ8F7,29,0113_1333,1,R_2011_09_20_15_00_06_
    CTGCTTCCAAGCCTCCAACCTCTAACCAG
    >G9JVYGV01AMQ87,79,0142_0233,1,R_2011_09_20_15_00_06_
    AGAGTCTCCTCATTGTTCTTTCCAAGTCCTCTATTGCTGAGCCTGGTTTCGTACCTTCTC
    AGCTAGGCCCTCTTTCTCT
    >G9JVYGV01A4W45,85,0348_3895,1,R_2011_09_20_15_00_06_
    GCTTCACATCTCAGAAATATAACCGCTAATGATCTGAAACAAGTTACAATCTGACATTCT
    GAAACCAAATGAAAGCAGCATAAAC
    >G9JVYGV01A7TPA,66,0382_0140,1,R_2011_09_20_15_00_06_
    ATGGCTTACCTCACTGTCGATGGAGATCGAATGCAAGCGATGTCCATCGACAGTGAGGTA
    AGCCAT

    Almost there, but I need to put a comma after the last field before the sequence. Then remove the returns within the sequence.

    If each entry starts with a caret (>), then I need 6 fields separated by commas, followed by a newline: the opening entry, then the length=, xy=, region= and run= values, then the sequence (e.g. AAGGTTGGCC\n).

    Apologies for not being clearer, but thanks for the help so far.
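
One way the six-field layout described above might be produced is sketched below, using core Perl only. The helper name records_to_csv is hypothetical. The sketch assumes each header line looks like ">ID length=N xy=... region=N run=..." with space-separated fields (inferred from the substitutions and output in this thread) and that every following line up to the next ">" is part of the sequence.

```perl
use strict;
use warnings;

# Turn FASTA-style lines into one CSV line per record:
#   >id,length,xy,region,run,sequence
# records_to_csv() is a hypothetical helper; it takes the input lines as a list.
sub records_to_csv {
    my @lines = @_;            # copy, so the caller's data is not modified
    my (@out, @fields);
    my $seq = "";

    # Emit the record collected so far, if there is one.
    my $flush = sub {
        push @out, join(",", @fields, $seq) . "\n" if @fields;
    };

    for my $line (@lines) {
        $line =~ s/\s+\z//;    # strip trailing whitespace, including the newline
        next unless length $line;

        if ($line =~ /^>/) {   # a header line starts a new record
            $flush->();
            @fields = split ' ', $line;
            s/^(?:length|xy|region|run)=// for @fields;   # drop the field names
            $seq = "";
        }
        else {
            $seq .= $line;     # join the wrapped sequence lines
        }
    }
    $flush->();                # the last record has no following header
    return @out;
}
```

With the filehandles from the script above, this could be called as print $out records_to_csv(<$in>); which reads the whole file into memory. For very large inputs, a line-by-line loop with the same flush logic would be the safer choice.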