Andrew_Levenson has asked for the wisdom of the Perl Monks concerning the following question:
Okay, i'm easing myself back into programming in general; it has been a while. This little program is supposed to help me recreate a larger program I had written: I want it to read the program file (as a .txt doc) and re-print it in another file, sans the comments that are on each line. However, nothing is happening and I can't figure out why. Any ideas? use strict;
use warnings;
my @array; my $i; my @new; my $nfile;
my $file=<>;
open FILE, "<$file";
while(<FILE>){
push @array, $_;}
close FILE;
for(@array){
($i)= grep { m/^#/ } $_;
push @new, $i;
}
$nfile=<>;
open FILE, ">$nfile";
while(<FILE>){
for(@new){
print FILE " \n\n $_ \n\n ";
}
}
Thanks in advance! Andrew Levenson
Re: File read and strip
by ikegami (Patriarch) on Nov 14, 2006 at 19:58 UTC
|
perl -ne "print unless /^#/" infile > outfile
Fix: (program -- memory efficient)
use strict;
use warnings;
die("usage: $0 {infile} {outfile}\n";
if @ARGV != 2;
my ($file_in, $file_out) = @ARGV;
open(my $fh_in, '<', $file_in)
or die("Unable to open input file \"$file_in\": $!\n");
open(my $fh_out, '>', $file_out)
or die("Unable to create output file \"$file_out\": $!\n");
while (<$fh_in>) {
if (!/^#/) {
print $fh_out $_;
}
}
Fix: (program -- (more) compact)
use strict;
use warnings;
die("usage: $0 {filein} {fileout}\n";
if @ARGV != 2;
my ($file_in, $file_out) = @ARGV;
open(my $fh_in, '<', $file_in)
or die("Unable to open input file \"$file_in\": $!\n");
open(my $fh_out, '>', $file_out)
or die("Unable to create output file \"$file_out\": $!\n");
print $fh_out (grep !/^#/, <$fh_in>);
| [reply] [d/l] [select] |
|
| [reply] [d/l] |
Re: File read and strip
by Joost (Canon) on Nov 14, 2006 at 19:50 UTC
|
my $file=<>;
Will set $file to the first line (including the newline) read from either STDIN or the file specified on the command line. This is almost certainly not what you want.
You probably want
my $file = shift @ARGV; # get first argument
my $nfile = shift @ARGV; # get next argument
Which, outside of subroutines can be abbreviated to
my $file = shift;
my $nfile = shift;
or you can use
my ($file,$nfile) = @ARGV;
By the way, it's always a good idea to check open() for errors:
open FILE,">$file" or die "Can't open $file: $!"
update: there are other errors in your code, too. Note that grep() returns a LIST, for one.
| [reply] [d/l] [select] |
|
Is returning a list bad? What does that mean in relation to what i'm trying to do?
(I don't mean to be obnoxious, i'm trying to make it easier by not just saying "I don't get it.")
| [reply] |
|
grep is normally used to return a list, like this:
my @new_list = grep { /regular expression/ } @list;
now, @new_list contains every value in @list that matches /regular expression/.
Your code:
($i)= grep { m/^#/ } $_;
push @new, $i;
Will set $i to undefined if $_ doesn't start with a # character, and then push it onto @new.
In other words you'll have undefined entries in @new where you probably wanted to skip those lines.
You might want to do some reading in the excellent perl documentation
updated: fixed link
| [reply] [d/l] [select] |
|
I'm sorry, but... what? The files are user-input, so the $file=<> has worked fine for me in the past.
Sorry, but I have no clue what else you said besides that. :( What does shift @ARGV do? What is it calling from? what does the $! in the die function do?
Sorry if these are stupid questions, and thanks.
| [reply] [d/l] [select] |
|
my_program inputfile outputfile
anyway,
my $whatever = <>;
will read a single line from the file(s) as specified in @ARGV. in other words, $whatever will not be the filename but the next (or first) line of the file you passed as an argument. if you don't pass any arguments <> will read from STDIN, which means it will usually wait for you to enter a line. See perlop (quoted below:) Note that lines read usually end in a newline character ("\n") while filenames usually do not.
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, ’-’) unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
$! will contain the last filesystem error, which will make it a lot easier to see what's going wrong if you pass the wrong filename to open(), or don't have permissions etc. See also perlvar.
| [reply] [d/l] [select] |
|
The files are user-input, so the $file=<> has worked fine for me in the past.
I think he was assuming that the file name was coming from the command line. @ARGV is the array of command line arguments, so "shift @ARGV" removes and returns the first argument. If you want the user to enter a file name, then it would be a good idea to prompt for it, e.g.:
print "Enter filename: ";
chomp(my $file = <STDIN>);
what does the $! in the die function do?
See perlvar. It is the error message for system/library calls. You want to check to see if open fails, and if it does, see why it failed.
| [reply] [d/l] |
Re: File read and strip
by rhesa (Vicar) on Nov 14, 2006 at 20:05 UTC
|
Your biggest problem is in the last while(<FILE>) loop: you're trying to read from a file that you just opened for writing.
The other problem is with the grep: you're only collecting lines that start with a #, although you say you want to do the opposite.
You seem to be doing a lot of unnecessary looping too. I suggest the following as a more streamlined alternative:
use strict;
use warnings;
print "Input file name? ";
my $name_in = <>;
chomp $name_in;
print "output file name? ";
my $name_out = <>;
chomp $name_out;
open my $in, '<', $name_in or die "error opening $name_in: $!";
open my $out , '>', $name_out or die "error opening $name_out: $!";
print $out grep { !/^#/ } <$in>;
| [reply] [d/l] [select] |
Re: File read and strip
by GrandFather (Saint) on Nov 14, 2006 at 20:04 UTC
|
First thing I notice is I didn't notice $i being declared. Stacking up lines like that hides stuff. In this case it's even worse because $i should be declared in a much small scope - the for loop.
Next thing is that generally three prarameter open is prefered and open should always be checked:
open FILE, '<', $file or die "Failed to open $file: $!";
A side issue is:
($i)= grep { m/^#/ } $_;
The best you can hope for from that is $i will be undef if $_ starts with a #. That undef gets pushed into @new however. Probably what you really wanted was something like:
! m/^#/ and push @new, $_ for @array;
However your real issue is the while loop. You've just created FILE. It's empty. How many times it while (<FILE>) going to iterate? That is, how many of the 0 lines available in the newly created file is it going to read?
The while loop is not needed at all. The for loop inside it could be replaced by:
print FILE " \n\n $_ \n\n " for @new;
or remain as it is.
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
Re: File read and strip
by markh (Scribe) on Nov 14, 2006 at 19:56 UTC
|
I think the problem is in your section of:
for(@array){
($i)= grep { m/^#/ } $_;
push @new, $i;
}
What about replacing that with:
foreach my $line (@array) {
next if ($line =~ /^#/);
push @new, $line;
}
Mine might be a bit less compact, but it is much more readable and easier to see what it happening there.
Also it appears to me that when you are writing back to your file, you are putting in 4 extra linebreaks. Is that intentional? | [reply] [d/l] [select] |
|
The line breaks are intentional to make the final file more readable when I format it. What does using the {parameter?} "my $line" before the (@array) do? And, let's see if I have any idea what's going on, the second line skips a line if it starts with a #?
Thanks!
| [reply] |
|
foreach my $line (@array) {
assigns the value from the next item in @array to the variable $line. I find this makes it eaiser to see what you are doing, rather than using $_, which I find non-intuitive.
Yes, you are correct, that next line of code skips anything that starts with a #, so that only lines from your file that are not comments are inserted into the array that you are intending to send to the output file.
| [reply] [d/l] |
Re: File read and strip
by johngg (Canon) on Nov 14, 2006 at 20:48 UTC
|
A minor point to bear in mind is that the comments you are removing might be indented to align with indented code. The regular expression m{^#} will only match comments that are right at the beginning of the line because the caret ^ is the metacharacter used in regexen to anchor the match to the beginning of a string. (To anchor a match to the end of a string use the dollar $). This modified pattern copes with optional spaces before the hash
m{^\s*#}
The \s means a white-space character and the * iterator means zero or more of whatever preceded it. So the above pattern means at the beginning of the string (your line in this context) match zero or more white-space characters followed by a hash.
I hope this is of use. Cheers, JohnGG | [reply] [d/l] [select] |
Re: File read and strip
by jonadab (Parson) on Nov 15, 2006 at 01:00 UTC
|
Don't mind me, I'm just in a bit of a weird mood, so pardon me while I have a little twisted fun at the expense of your code...
Ah. Hopefully that's out of my system now.
So, then, on to fixing it... First off, the print
statement is never happening because <FILE>
does not return anything very useful when the
file is opened for output. Second, the test in
grep is probably backwards from what you want.
use strict;
use warnings;
my $file=<>;
open FILE, '<', $file;
my @new = map { grep { not m/^#/ } } <FILE>;
close FILE;
my $nfile=<>;
open FILE, '>', $nfile;
print FILE map{" \n\n $_ \n\n "}@new for 1, <FILE>;
This is still not optimal. For one thing, the first map (corresponding to your first while loop in the original code) is obviously superfluous. (Your while loop in the original wasn't necessary either, but it's more obvious to me now.) Second, there's still the weird double-indirection surrounding the filenames, which others have commented about. Third, I prepended a true value to force the print statement to occur once, but the file read there is still nonsense. Finally, there are still more variables than are strictly necessary; some of them are only used once -- well, twice: once in an assignment, and once again on the very next line to use the value; this amounts basically to using them once to turn one line into two, and that isn't necessary...
use strict;
use warnings;
open FILE, '<', scalar <>;
my @new = grep { not m/^#/ } <FILE>;
close FILE;
open FILE, '>', scalar <>;
print FILE map{" \n\n $_ \n\n "}@new;
If you're willing to add a second filehandle you can get rid of the array, and the whole thing could be reduced to more-or-less a one-liner...
use strict;
use warnings;
open INFILE, '<', scalar <>;
open OUTFILE, '>', scalar <>;
print OUTFILE map{" \n\n $_ \n\n "} grep { not m/^#/ } <INFILE>;
Look, ma, no variables to declare. If you're willing to use standard input and output and let the user specify the filenames with redirection operators, it becomes a veritable one-liner:
print map{" \n\n $_ \n\n "}grep{not m/^#/}<STDIN>;
strict and warnings seem unnecessary on anything that short. Someone will probably come along in a moment and explain how to use command-line flags to shorten it even further.
Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. You can just call me "Mister Sanity". Why, I've got so much sanity it's driving me crazy.
| [reply] [d/l] [select] |
Re: File read and strip
by Jasper (Chaplain) on Nov 15, 2006 at 10:44 UTC
|
I think the main problem with the OPs code was that he was doing the exact opposite of what he wanted, but no one really seems to have pointed that out explicitly, it seems to me.
for(@array){
($i)= grep { m/^#/ } $_;
push @new, $i;
}
dumps all the comments in @new, not the lines without comments. The corrected solution has been shown, of course.
Also, the OP is reading the file line by line in one loop, then extracting from that file into an array in another loop, then looping over that array to print to the outfile. This can all be done in one loop (like the one liner)
while (<INFILE>) {
next if /^#/;
print OUTFILE $_;
}
I'd also add in a whitespace element to the regex, because I rarely start comments at the beginning of the line - I usually indent them with the rest of the code, and I doubt I'm the only one. so /^\s+#/
Jasper
also:
perl -ne '/^#/ or print'
perl -pe '$_ = "" if /^#/'
perl -pe '$_ x= !/^#/'
| [reply] [d/l] [select] |
|
|