MiamiGenome has asked for the wisdom of the Perl Monks concerning the following question:

Hello Computer Gurus!

I have a little scripting task on which I kindly request your advice.

I have a series of files: a1.txt, a2.txt, ... a10000.txt

Each file is straight text, no carriage returns, no spaces, etc.

I would like to create a script to add

>filename [newline]
to the beginning of each file, and a newline to the end of each file.

For example,

filename: a36.txt
initial file contents:

AATGACGTACGTAGTCGTAGCGT

file contents after the script:

>a36.txt
AATGACGTACGTAGTCGTAGCGT [newline]

I do not yet have enough experience with scripting (awk, sed, perl) to do this easily, but I'm sure it should be simple.

Thank you very much !

update (broquaint): added formatting

Re: very basic, just beginning
by TomDLux (Vicar) on Aug 27, 2003 at 16:33 UTC

    Perfect for a shell script: I used bash.

    for i in *.txt
    do
        ( echo ">$i"; cat "$i"; echo ) > tmp
        mv tmp "$i"
    done

    I'm presuming "newline" was supposed to indicate an empty line.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: very basic, just beginning
by fletcher_the_dog (Friar) on Aug 27, 2003 at 17:31 UTC
    If each file really is "straight text, no carriage returns, no spaces, etc." then all you have to do is run the following from the command line:
    perl -pi -e '$_=">$ARGV\n$_\n"' *.txt
    This assumes that all ".txt" files in your directory fit your criteria. $ARGV holds the name of the current file when you are reading with the "<>" operator. When you use -p, perl assumes that there is a
    while(<>){}
    loop around your code, and it prints $_ at the end of each iteration. The -i switch puts perl in "in-place edit" mode, so each print statement without a filehandle writes to a new copy of the file named by $ARGV, which replaces the original.
    CHANGE: As jmanning2k pointed out, I forgot to put ">" in front of the file name.
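To make the switches less magical, here is a minimal sketch of roughly the same edit written out as a script. It follows the loop that perlrun documents for -p and sets $^I, the variable behind -i. The subroutine name add_headers_in_place is hypothetical, and using an empty $^I (no backup file) is an assumption that works on Unix but may not be accepted on Windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what `perl -pi -e '$_=">$ARGV\n$_\n"' FILES` does, written out.
# add_headers_in_place is a hypothetical name used for illustration.
sub add_headers_in_place {
    my @files = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN instead

    local $^I   = '';        # what -i sets: in-place edit, no backup file
    local @ARGV = @files;    # <> walks through these files in order

    while (<>) {             # the loop that -p wraps around the -e code
        $_ = ">$ARGV\n$_\n"; # $ARGV is the name of the file being read
    }
    continue {
        print;               # -p prints $_; under -i it goes into the edited file
    }
}

add_headers_in_place(@ARGV) unless caller;
```

Run it from the directory holding the files, e.g. with a*.txt as the arguments. Because $ARGV is the name exactly as given, passing paths such as dir/a36.txt would put the path in the header.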
      Update: I read the question better & turned down the flame-o-meter on this comment...

      This code might get you in trouble if there are newlines in the file. If that is the case then...

      It puts "filename.txt" on every other line of the file. The while loop is what gets you in trouble here. Perhaps you could undef $/ somewhere in there.
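Following the undef $/ suggestion: with $/ undefined, <> hands back each file as a single record, so even a file with embedded newlines gets only one header. A minimal sketch, assuming the header is the file name as passed in and that every file should end with exactly one newline; add_headers_slurp is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Slurp each file whole (undef $/) so the header is added once per file,
# even if the file contains newlines. add_headers_slurp is hypothetical.
sub add_headers_slurp {
    my @files = @_;
    return unless @files;    # avoid falling back to STDIN

    local $^I   = '';        # in-place edit, no backup file
    local @ARGV = @files;
    local $/;                # undef $/: each <> returns a whole file

    while (my $body = <>) {
        $body .= "\n" unless $body =~ /\n\z/;   # make sure the file ends in a newline
        print ">$ARGV\n$body";                   # written into the file being edited
    }
}

add_headers_slurp(@ARGV) unless caller;
```

On the command line, the -0777 switch slurps files the same way, so perl -0777 -pi -e '$_ = ">$ARGV\n$_"; $_ .= "\n" unless /\n\z/' *.txt should behave similarly.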

      Or, even better, start with your version and test for the first line.

      perl -pi -e 'if ($. == 1) { (my $name = $ARGV) =~ s/\.txt$//; $_ = ">$name\n$_\n" } close ARGV if eof' *.txt
      This adds three things: first, it removes the .txt from the file name; second, it adds the initial '>' character; finally, it only rewrites the first line of each file ($. is the line number, and close ARGV if eof resets it at the end of each file).

      Not the most elegant, but it works.
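A note on the $. test: when files are read through <>, $. keeps counting across files, so $. == 1 is only true on the first line of the first file. Explicitly closing ARGV at the end of each file resets it; this is the close ARGV if eof idiom from the eof entry in perlfunc. The sketch below uses a hypothetical helper, first_line_numbers, to record the value of $. on the first line of each file, with and without that reset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the value of $. seen on the first line of each file.
# first_line_numbers is a hypothetical helper for illustration.
sub first_line_numbers {
    my ($reset, @files) = @_;
    return () unless @files;          # avoid falling back to STDIN

    local @ARGV = @files;
    my $current = '';
    my @firsts;
    while (<>) {
        if ($ARGV ne $current) {      # first line of a new file
            push @firsts, $.;
            $current = $ARGV;
        }
        close ARGV if $reset && eof;  # explicit close resets $. (eof, not eof())
    }
    return @firsts;
}

print join(',', first_line_numbers(1, @ARGV)), "\n" if !caller && @ARGV;
```

With two two-line files, the reset gives 1 for both files, while without it the second file starts at 3.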

        Umm, I said "If each file really is 'straight text, no carriage returns, no spaces, etc.'" If there are no carriage returns, there is only one line. If you are getting the file name on every other line, then your file has more than one line and doesn't meet the requirement "straight text, no carriage returns, no spaces, etc." Also, nowhere is it specified to remove the ".txt". In fact, the example specifically shows the ".txt" being there:
        filename: a36.txt
        initial file contents:
        AATGACGTACGTAGTCGTAGCGT
        after script file contents:
        >a36.txt
        AATGACGTACGTAGTCGTAGCGT [newline]
Re: very basic, just beginning
by CombatSquirrel (Hermit) on Aug 27, 2003 at 16:24 UTC
    Use Tie::File:
    #!perl
    use strict;
    use warnings;
    use Tie::File;

    my $dir = q[path/to/dir];
    opendir(DIR, $dir) or die "Could not open '$dir': $!\n";
    # -T needs the full path, since readdir returns bare names
    my @files = grep { /^a\d+\.txt$/ and -T "$dir/$_" } readdir(DIR);
    closedir DIR;

    for my $file (@files) {
        my @lines;
        tie @lines, 'Tie::File', "$dir/$file"
            or die "Failed tieing '$file': $!\n";
        unshift @lines, ">$file";    # new first record: the header line
        untie @lines;
    }

    Cheers,
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
Re: very basic, just beginning
by Anonymous Monk on Aug 27, 2003 at 16:37 UTC
      I'll second that. If you're doing any sort of biological sequence manipulation, you must have BioPerl.

      Here's some incentive to get it installed. Run this with *.txt as its arguments.

      use strict;
      use warnings;
      use Bio::Seq;
      use Bio::SeqIO;

      while (my $fname = shift) {
          open(my $inseq, '<', $fname) or die "Can't open $fname: $!";

          ## Grab all the sequence in a single string
          my $seq = '';
          while (<$inseq>) {
              chomp;
              $seq .= $_;
          }
          close $inseq;

          $fname =~ s{\.txt$}{};    # strip .txt (no "my": keep the same variable)

          ## Create a new Bio::Seq object with your sequence
          my $seqobj = Bio::Seq->new(-display_id => $fname,
                                     -seq        => $seq);

          ## Write it out to filename.fa
          my $outfile = $fname . ".fa";
          my $seqout  = Bio::SeqIO->new(-format => 'fasta',
                                        -file   => "> $outfile");
          $seqout->write_seq($seqobj);
      }
      It'll do exactly what you asked. Plus, just change 'fasta' to some other format, and it will convert that too.
        Well, I didn't realize your input files were all newline-free. Should have read the question more closely.

        In that case, Bioperl is even better...

        use strict;
        use warnings;
        use Bio::SeqIO;

        while (my $fname = shift) {
            ## raw is one seq per line (no newlines)
            my $in = Bio::SeqIO->new(-file => $fname, -format => 'raw');

            $fname =~ s{\.txt$}{};    # strip .txt
            my $outfile = $fname . ".fa";
            my $out = Bio::SeqIO->new(-file => "> $outfile", -format => 'fasta');

            ## Do the conversion
            while (my $seq = $in->next_seq()) {
                $seq->display_id($fname);    # Add a name
                $out->write_seq($seq);
            }
        }
        Run it in the same way, with *.txt as the arguments.
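For readers who cannot install BioPerl, here is a dependency-free sketch of the same .txt to .fa conversion. It is not BioPerl's implementation: the helper names to_fasta and convert_file are hypothetical, and wrapping the sequence at 60 characters per line is an assumed choice (FASTA allows a sequence to span several lines; pass a large width to keep it on one).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one sequence as a FASTA record, wrapping it at $width
# characters per line (60 is an assumed default, not a requirement).
sub to_fasta {
    my ($id, $seq, $width) = @_;
    $width ||= 60;
    $seq =~ s/\s+//g;                        # drop stray newlines/whitespace
    my @rows = $seq =~ /(.{1,$width})/g;     # consecutive chunks of <= $width chars
    return ">$id\n" . join('', map { "$_\n" } @rows);
}

# Read NAME.txt and write NAME.fa next to it, like the BioPerl replies.
sub convert_file {
    my ($txt) = @_;
    open my $in, '<', $txt or die "can't read $txt: $!";
    my $seq = do { local $/; <$in> };
    close $in;
    $seq = '' unless defined $seq;

    (my $id = $txt) =~ s/\.txt\z//;          # strip .txt, as the BioPerl version does
    my $out_path = "$id.fa";
    open my $out, '>', $out_path or die "can't write $out_path: $!";
    print {$out} to_fasta($id, $seq);
    close $out or die "can't close $out_path: $!";
    return $out_path;
}

unless (caller) { convert_file($_) for @ARGV }
```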
Re: very basic, just beginning
by kesterkester (Hermit) on Aug 27, 2003 at 16:32 UTC
    This isn't especially elegant, but it'll do the job, as long as your files are all in one directory.
    #!/usr/bin/perl
    use warnings;
    use strict;

    # open current dir, and read in all files
    # matching /^a\d+\.txt$/ into the array @files:
    opendir my $dir_handle, "." or die "can't open dir: $!";
    my @files = grep { /^a\d+\.txt$/ } readdir $dir_handle;
    closedir $dir_handle;

    # For each of the files:
    foreach my $filename (@files) {

        # open the file, and read its contents into the
        # array @file_contents, then close the file:
        open my $fh, '<', $filename or die "can't open $filename to read: $!";
        my @file_contents = <$fh>;
        close $fh;

        # now reopen the file (for writing), and write your
        # desired output to it (a list, so no spaces are inserted):
        open $fh, '>', $filename or die "can't open $filename to write: $!";
        print $fh ">$filename\n", @file_contents, "\n";
        close $fh;
    }
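One risk with reopening the file with '>' is that it is truncated before the new contents are written, so an interrupted run can lose data. A common alternative is to write to a temporary file in the same directory and rename it over the original. A minimal sketch, assuming a filesystem where rename within one directory replaces the target (as on Unix); add_header_safely is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Temp qw(tempfile);

# Prepend ">NAME\n" and ensure a trailing newline, writing to a temporary
# file first and renaming it over the original. Hypothetical helper name.
sub add_header_safely {
    my ($path) = @_;

    open my $in, '<', $path or die "can't read $path: $!";
    my $body = do { local $/; <$in> };
    close $in;
    $body = '' unless defined $body;
    $body .= "\n" unless $body =~ /\n\z/;

    # temp file in the same directory, so rename does not cross filesystems
    my ($tmp_fh, $tmp_path) = tempfile(DIR => dirname($path), UNLINK => 0);
    print {$tmp_fh} '>', basename($path), "\n", $body;
    close $tmp_fh or die "can't write $tmp_path: $!";

    my $mode = (stat $path)[2] & 07777;      # tempfile creates 0600; keep original mode
    chmod $mode, $tmp_path or die "can't chmod $tmp_path: $!";
    rename $tmp_path, $path or die "can't replace $path: $!";
}

unless (caller) { add_header_safely($_) for @ARGV }
```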
Re: very basic, just beginning
by pzbagel (Chaplain) on Aug 27, 2003 at 17:21 UTC

    No one has recommended the append mode of open. Here is the perl one-liner:

    perl -e 'for (@ARGV) { open FILE, ">>$_"; print FILE "\n"; close FILE; }' <filenames>

    Later

      Your code doesn't do what was asked. That one-liner only appends a newline to the end of each file; it does not add the >filename line at the beginning.