Re^2: Writing to file...
by Marshall (Canon) on Jul 22, 2009 at 09:05 UTC
Very nicely indented and done. Just some minor points:
1. glob is not very portable (BSD, DOS, Windows, SysV); it is a mess. I prefer readdir and grep over glob.
2. I prefer lining up the { and } braces vertically over the old-style 'C' way.
3. What I liked best about your code was that it was nicely spaced out. Whitespace is one of the most important things to put into code! Yes, blank lines are important!
4. There may be some mistakes below, but I think it continues in the same fine style as yours.
use strict;
use warnings;
my $dir = '.';
# for my $file (glob("$dir/*.seq"))
# "glob" is not portable even amongst *nix systems
# and certainly not amongst Windows systems.
opendir(DIR, $dir)
    or die "Unable to open directory $dir: $!";
foreach my $file (grep { /\.seq$/ && -f "$dir/$_" } readdir DIR)   # quoted: a bare $dir/$_ is a division
{
    my $output_path = "$dir/$file.out";
    my $input_path  = "$dir/$file";

    open(my $in, '<', $input_path)
        or die "Unable to open $input_path: $!";
    open(my $out, '>', $output_path)
        or die "Unable to open $output_path: $!";

    while (<$in>)
    {
        s/\^\^/\n^^/g;    # put a newline in front of every ^^
        print $out $_;    # oops, the ; was missing...
    }

    close($in);
    close($out);

    unlink($input_path)
        or die "Unable to unlink $input_path: $!";
    rename($output_path, $input_path)
        or die "Unable to rename $output_path to $input_path: $!";
}
Update: the unlink before rename is not necessary.
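On the update: perldoc -f rename says an existing file with the new name is clobbered, so rename can replace the original in one step. A minimal sketch, assuming made-up file names in a scratch directory (File::Temp) standing in for $input_path and $output_path:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files standing in for $input_path and $output_path.
my $dir         = tempdir(CLEANUP => 1);
my $input_path  = "$dir/AJ877264.seq";
my $output_path = "$input_path.out";

for ([$input_path, "old\n"], [$output_path, "new\n"])
{
    my ($path, $text) = @$_;
    open(my $fh, '>', $path) or die "Unable to open $path: $!";
    print $fh $text;
    close($fh) or die "Unable to close $path: $!";
}

# No unlink first: rename() replaces the existing target.
rename($output_path, $input_path)
    or die "Unable to rename $output_path to $input_path: $!";

open(my $check, '<', $input_path) or die "Unable to open $input_path: $!";
my $content = <$check>;
close($check);
print "input now contains: $content";
```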
1. I've never had problems with glob except when spaces appear in the pattern, and in that case I add use File::Glob 'glob'; to the script. I didn't do that here because I thought it an unnecessary complication. KISS. If it is that easy to produce a portable version using opendir/readdir/closedir, one wonders why glob has not been fixed in the core.
2. I used to use BSD-style braces until I discovered TheDamian's Perl Best Practices.
3. Whitespace? I thought everyone coded like that.
4. It is a good idea to do a closedir (I'm sure you knew that).
Update: improved format.
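For the spaces-in-the-pattern case in point 1, here is a sketch using File::Glob's bsd_glob, which does not split its pattern on whitespace. Note that it uses the :bsd_glob import tag (Perl 5.16 and later) rather than the 'glob' import mentioned above, and the directory and file names are made up:

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';    # exports bsd_glob() and overrides glob(); Perl 5.16+
use File::Temp qw(tempdir);

# Hypothetical scratch directory whose name contains a space.
my $dir = tempdir(CLEANUP => 1) . '/my seqs';
mkdir $dir or die "Unable to create $dir: $!";
for my $name ('first.seq', 'second file.seq', 'notes.txt')
{
    open(my $fh, '>', "$dir/$name") or die "Unable to create $dir/$name: $!";
    close($fh);
}

# bsd_glob treats the space in $dir literally; the default glob()
# splits its argument on whitespace into separate patterns.
my @files = bsd_glob("$dir/*.seq");
print "$_\n" for @files;
```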
This glob thing is just a mess. Part of the reason is the spaces that some Windows file systems allow in names, as you noted. Youch! I beat the doo-doo out of an ActiveState app trying to get glob to work: some of the modules worked with DOS glob and some with BSD glob. use File::Glob 'glob'; is not a magic cure-all, and the BSD vs. SysV differences are even weirder to understand. Anyway, I've spent many hours in "glob hell" and I just don't do it anymore.
Yes, closedir is the "right thing to do"! Every file handle comes from a limited pool of system resources. For a 20-line program I don't worry about it much, but as they say, "your mileage may vary". If this 20-line thing is part of a client-server app, then an open file handle here and there adds up, and the application bombs!
I tutor some students at a local college, and one fundamental mistake is a lack of whitespace to separate "thought units", but you see that too! I have amended my evil 'C' ways in favor of lining up braces. I am quite sure that this is the best way, and I don't understand why you don't think so.
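On the closedir point, a sketch that uses a lexical directory handle ($dh, in place of the bareword DIR) inside a sub. The sub name seq_files is hypothetical. A lexical handle is released when it goes out of scope; the explicit closedir releases it right away and lets you check for an error:

```perl
use strict;
use warnings;

# Return the *.seq regular files in $dir, sorted by name.
sub seq_files
{
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Unable to open directory $dir: $!";
    my @files = grep { /\.seq$/ && -f "$dir/$_" } readdir $dh;
    closedir($dh) or die "Unable to close directory $dir: $!";
    return sort @files;
}

print "$_\n" for seq_files('.');
```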
Anyway, I say hooray! The OP has some good code to work from, and I think he/she will do a great job!
glob is not that portable (BSD,DOS,Windows,SYS V), it is a mess. I prefer readdir and grep over glob.
In what respect? As long as you use forward slashes in your paths (particularly important if you are on Windows and dealing with UNC paths), the nice thing about glob is that it works the same on all platforms, since it doesn't rely on platform-specific mechanisms. At least this is true for any "reasonably recent" Perl; I vaguely remember that in very early versions of Perl, glob used the shell on some platforms, which led to compatibility problems... but that was a long time ago.
--
Ronald Fischer <ynnor@mm.st>
Ronald, I beg to differ. glob can be a real mess, whereas grep and readdir work on all platforms. Forward vs. backslash is not the issue. Perl'ers take note: always use "/" instead of some convoluted "\\" scheme in a Windows path. Perl handles the "/" correctly on Windows, because Windows itself accepts "/" as a path separator.
Update:
There are plenty of "old" Perl 5.6 programs out there! I skipped 5.8 and went to 5.10, so I don't know what 5.8 does. The glob mess that exists in 5.6 is gonna be around for a long time.
Line 9
foreach my $file ( grep{/\.seq$/ && -f $dir/$_}readdir DIR)
is giving me the error that the argument <filename> isn't numeric in division (/). Why is it saying this?
Try adding quotes as follows:
foreach my $file ( grep{/\.seq$/ && -f "$dir/$_"}readdir DIR)
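Why the quotes fix it: without them, / is the division operator, so Perl converts both strings to numbers (warning that they "isn't numeric"), and since both become 0 here, the division itself then dies. Inside double quotes, the same characters are string interpolation that builds the path. A small sketch with a hypothetical file name:

```perl
use strict;
use warnings;

my $dir  = '.';
my $name = 'AJ877264.seq';    # hypothetical file name

# Unquoted: numeric division. Both operands numify to 0 with a warning,
# and 0 / 0 then dies, so the attempt is wrapped in eval.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };
my $quotient = eval { $dir / $name };
print "warning: $_" for @warnings;
print "error:   $@" if $@;

# Quoted: string interpolation builds the path.
my $path = "$dir/$name";
print "path:    $path\n";
```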
Thanks for all the help so far guys!
One last question: I have this code trying to capture the locus name and place it after the ">" but nothing I do lets me capture it.
while (<$in>)
{
    my $locus =~ /LOCUS\s+(\w+)/;
    s/\^\^/>$locus\n^^\n/g;
    print $out $_;
}
Within the file, the LOCUS line is formatted like this:
Created: Tuesday, July 12, 2005 4:14 PM
LOCUS AJ877264 704 bp DNA linear INV 15-APR-2005
I am unsure as to what you want. But using the single line that you have given as input, here are several ways to extract the second field, the name after "LOCUS":
#!/usr/bin/perl -w
use strict;
my $in = "LOCUS AJ877264 704 bp DNA linear INV 15-APR-2005";
my $other_in = " LOCUS AJ877264 ";
#===================
# this is a bit "tricky"...$locus is in a list context
# if you don't do that then you get True/False value of
# the match
my ($locus) = ($in=~ m/LOCUS\s+(\w+)/);
print "LOCUS=$locus\n";
#PRINTS:
#LOCUS=AJ877264
#==============
# this does the same thing with spaces in front of LOCUS
# in other words, no change is needed...
my ($other_locus) = ($other_in=~ m/LOCUS\s+(\w+)/);
print "LOCUS=$other_locus\n";
#PRINTS:
#LOCUS=AJ877264
#=================
# if you want to guarantee that the LOCUS you match is
# the first one on the line, preceded by some possible blanks,
# my ($other_locus) = ($other_in=~ m/^\s*LOCUS\s+(\w+)/);
# will do it: ^ start at beginning of line, then zero or
# more blanks, then LOCUS, then one or more blanks, then a
# sequence of "word" characters [0-9A-Za-z_]+
#===================
# This is a "list slice".
# split(/\s+/,$in) is put into a list context with ()
# and the second thing, index [1] is moved to lvalue
my $locus_also = (split(/\s+/,$in))[1];
#note "my ($locus_also) =" is also completely fine.
print "Another way: LOCUS=$locus_also\n";
#PRINTS:
#Another way: LOCUS=AJ877264
#===============
# the split version won't work with spaces in front
# of LOCUS as a "" list element will be created:
($locus_also) = (split(/\s+/,$other_in))[1];
print "Another way 2: LOCUS=$locus_also\n";
#PRINTS:
#Another way 2: LOCUS=LOCUS
($locus_also) = (split(/\s+/,$other_in))[2];
print "Another way 2: LOCUS=$locus_also\n";
#PRINTS:
#Another way 2: LOCUS=AJ877264
#=================
#or remove leading spaces, for the split case
$other_in =~ s/\s*//;
($locus_also) = (split(/\s+/,$other_in))[1];
print "Another way 2: LOCUS=$locus_also\n";
#PRINTS:
#Another way 2: LOCUS=AJ877264
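As an alternative to stripping the leading blanks with s/// first: split with the single-space string ' ' is a documented special case (like awk) that skips leading whitespace, so index [1] is the locus name even with blanks in front. A sketch using the same $other_in sample:

```perl
use strict;
use warnings;

my $other_in = " LOCUS AJ877264 ";

# split /\s+/ keeps an empty leading field when the string starts with blanks
my @fields_regex = split(/\s+/, $other_in);    # ('', 'LOCUS', 'AJ877264')

# split ' ' skips leading whitespace
my @fields_awk = split(' ', $other_in);        # ('LOCUS', 'AJ877264')

my $locus = (split(' ', $other_in))[1];
print "split /\\s+/ fields: ", join('|', @fields_regex), "\n";
print "split ' ':  LOCUS=$locus\n";
```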
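Applying the list-context match to the loop in the question: my $locus =~ /.../ matches against a brand-new, empty $locus instead of the current line in $_, and a my inside the loop is reset on every line anyway. A sketch that declares $locus outside the loop and remembers it when the LOCUS line goes by. It assumes the LOCUS line comes before the ^^ marker (otherwise the header gets an empty name) and uses hypothetical in-memory sample text instead of real files:

```perl
use strict;
use warnings;

# Hypothetical sample input; assumes the LOCUS line precedes the ^^ marker.
my $text = "Created: Tuesday, July 12, 2005 4:14 PM\n"
         . "LOCUS       AJ877264   704 bp    DNA     linear   INV 15-APR-2005\n"
         . "^^acgtacgt\n";

open(my $in, '<', \$text) or die "Unable to open input: $!";
my $result = '';
open(my $out, '>', \$result) or die "Unable to open output: $!";

my $locus = '';    # declared outside the loop so it survives between lines
while (<$in>)
{
    # list-context match against $_: remember the name from the LOCUS line
    if (my ($name) = /^\s*LOCUS\s+(\w+)/)
    {
        $locus = $name;
    }
    s/\^\^/>$locus\n^^\n/g;
    print $out $_;
}
close($in);
close($out);
print $result;
```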
Also, if ^^ is not present I need to add it in with a new line character before the sequence.
Something like...
while (<$in>)
{
if $in =~ /^^/ {
s/\^\^/\n^^/g;
print $out $_;
else {
s/[a]|[c]|[t]|[g]{6}/^^\n
}
How would I correctly write the else?
I was thinking of just finding a series of a, c, t, or g's and adding the ^^ before it, but I don't know how to get the same series of bases in the substitution. Sorry, I'm really bad at this lol
[a]|[c]|[t]|[g]{6}   # alternation: "a", or "c", or "t", or six g's in a row. You want a char set (character class):
[actg]{6}
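Putting that together with the if/else from the question, a sketch on hypothetical sample lines. Three changes from the question's version: test the line ($_ here, copied to $line) rather than the filehandle $in; escape the carets, because an unescaped /^^/ is just two start-of-line anchors and matches every line; and wrap the bases in parentheses so $1 puts the same bases back in the replacement. It assumes it is only applied to lines where the sequence starts:

```perl
use strict;
use warnings;

# Hypothetical sample lines: the first already has the ^^ marker, the second does not.
my @lines = ("^^acgtacgtac\n", "ggcattacgatt\n");
my @fixed;

foreach (@lines)
{
    my $line = $_;                 # work on a copy of the current line
    if ($line =~ /\^\^/)           # escaped carets: a literal ^^
    {
        $line =~ s/\^\^/\n^^/g;
    }
    else
    {
        # (...) captures the six bases; $1 puts them back after the marker
        $line =~ s/([actg]{6})/^^\n$1/;
    }
    push @fixed, $line;
}
print @fixed;
```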