File read and strip

Andrew_Levenson has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: File read and strip by ikegami (Patriarch) on Nov 14, 2006 at 19:58 UTC
You are attempting to read (`while(<FILE>){`) from your output file. You'll never reach the `print`. `push @new, $i;` adds undef for the lines `grep` matched. `($i)= grep { m/^#/ } $_;` keeps the comment lines instead of the lines without comments. `for(@array){ ($i)= grep { m/^#/ } $_; push @new, $i; }` should be `@new = grep { !m/^#/ } @array;` You don't check if your `open` succeeded `open(FILE, "<$file") or die("Unable to open input file \"$file\": $!\n");` `open(FILE, ">$nfile") or die("Unable to create output file \"$nfile\": $!\n");` You should use the 3-arg open whenever possible. `open(FILE, '<', $file)` `open(FILE, '>', $nfile)` You should use lexicals whenever possible. `open(my $fh_in, '<', $file)` (`while (<$fh_in>)`) `open(my $fh_out, '>', $nfile)` (`print $fh_out ...`) Shouldn't `$file=<>;` be `$file=@ARGV[0];`? Shouldn't `$nfile=<>;` be `$nfile=@ARGV[1];`? Parameters are usually passed as parameters instead of via STDIN. If you do mean to read from STDIN, give the user a prompt, and use `<STDIN>` instead of `<>`. No checking to see if `$file` and `$nfile` were specified. (They could be undef.) The trailing newline is not removed from `$file` and `$nfile`. Why read the whole file into memory? Fix: (one-liner) ( Added missing `>`. Thanks Hofmator. ) `perl -ne "print unless /^#/" infile > outfile` [download] Fix: (program -- memory efficient) `use strict; use warnings; die("usage: $0 {infile} {outfile}\n"; if @ARGV != 2; my ($file_in, $file_out) = @ARGV; open(my $fh_in, '<', $file_in) or die("Unable to open input file \"$file_in\": $!\n"); open(my $fh_out, '>', $file_out) or die("Unable to create output file \"$file_out\": $!\n"); while (<$fh_in>) { if (!/^#/) { print $fh_out $_; } }` [download] Fix: (program -- (more) compact) `use strict; use warnings; die("usage: $0 {filein} {fileout}\n"; if @ARGV != 2; my ($file_in, $file_out) = @ARGV; open(my $fh_in, '<', $file_in) or die("Unable to open input file \"$file_in\": $!\n"); open(my $fh_out, '>', $file_out) or die("Unable to create output file \"$file_out\": $!\n"); print $fh_out (grep !/^#/, <$fh_in>);` [download]	[reply] [d/l] [select]
Re^2: File read and strip by Hofmator (Curate) on Nov 15, 2006 at 07:18 UTC
Fix: (one-liner) Should be `perl -ne "print unless /^#/" infile > outfile`. Apart from that all very sound advice, ikegami++! -- Hofmator Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.	[reply] [d/l]
Re: File read and strip by Joost (Canon) on Nov 14, 2006 at 19:50 UTC
`my $file=<>;` [download] Will set $file to the first line (including the newline) read from either STDIN or the file specified on the command line. This is almost certainly not what you want. You probably want `my $file = shift @ARGV; # get first argument my $nfile = shift @ARGV; # get next argument` [download] Which, outside of subroutines can be abbreviated to `my $file = shift; my $nfile = shift;` [download] or you can use `my ($file,$nfile) = @ARGV;` [download] By the way, it's always a good idea to check open() for errors: `open FILE,">$file" or die "Can't open $file: $!"` [download] update: there are other errors in your code, too. Note that grep() returns a LIST, for one. "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply] [d/l] [select]
Re^2: File read and strip by Andrew_Levenson (Hermit) on Nov 14, 2006 at 19:56 UTC
Is returning a list bad? What does that mean in relation to what i'm trying to do? (I don't mean to be obnoxious, i'm trying to make it easier by not just saying "I don't get it.")	[reply]
Re^3: File read and strip by Joost (Canon) on Nov 14, 2006 at 20:11 UTC
grep is normally used to return a list, like this: `my @new_list = grep { /regular expression/ } @list;` [download] now, @new_list contains every value in @list that matches /regular expression/. Your code: `($i)= grep { m/^#/ } $_; push @new, $i;` [download] Will set $i to `undefined` if $_ doesn't start with a # character, and then push it onto @new. In other words you'll have undefined entries in @new where you probably wanted to skip those lines. You might want to do some reading in the excellent perl documentation updated: fixed link "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply] [d/l] [select]
Re^2: File read and strip by Andrew_Levenson (Hermit) on Nov 14, 2006 at 19:54 UTC
I'm sorry, but... what? The files are user-input, so the `$file=<>` has worked fine for me in the past. Sorry, but I have no clue what else you said besides that. :( What does `shift @ARGV` do? What is it calling from? what does the $! in the die function do? Sorry if these are stupid questions, and thanks.	[reply] [d/l] [select]
Re^3: File read and strip by Joost (Canon) on Nov 14, 2006 at 20:03 UTC
shift @ARGV removes the first element from @ARGV and returns it. @ARGV contains all the arguments provided to your program. I was asssuming you wanted to call your program like this: `my_program inputfile outputfile` [download] anyway, `my $whatever = <>;` [download] will read a single line from the file(s) as specified in @ARGV. in other words, $whatever will not be the filename but the next (or first) line of the file you passed as an argument. if you don't pass any arguments `<>` will read from STDIN, which means it will usually wait for you to enter a line. See perlop (quoted below:) Note that lines read usually end in a newline character ("\n") while filenames usually do not. `while (<>) { ... # code for each line } is equivalent to the following Perl-like pseudo code: unshift(@ARGV, �-�) unless @ARGV; while ($ARGV = shift) { open(ARGV, $ARGV); while (<ARGV>) { ... # code for each line } }` [download] $! will contain the last filesystem error, which will make it a lot easier to see what's going wrong if you pass the wrong filename to open(), or don't have permissions etc. See also perlvar. "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply] [d/l] [select]
Re^3: File read and strip by runrig (Abbot) on Nov 14, 2006 at 20:12 UTC
The files are user-input, so the $file=<> has worked fine for me in the past. I think he was assuming that the file name was coming from the command line. @ARGV is the array of command line arguments, so "shift @ARGV" removes and returns the first argument. If you want the user to enter a file name, then it would be a good idea to prompt for it, e.g.: `print "Enter filename: "; chomp(my $file = <STDIN>);` [download] what does the $! in the die function do? See perlvar. It is the error message for system/library calls. You want to check to see if open fails, and if it does, see why it failed.	[reply] [d/l]
Re: File read and strip by rhesa (Vicar) on Nov 14, 2006 at 20:05 UTC
Your biggest problem is in the last `while(<FILE>)` loop: you're trying to read from a file that you just opened for writing. The other problem is with the grep: you're only collecting lines that start with a #, although you say you want to do the opposite. You seem to be doing a lot of unnecessary looping too. I suggest the following as a more streamlined alternative: `use strict; use warnings; print "Input file name? "; my $name_in = <>; chomp $name_in; print "output file name? "; my $name_out = <>; chomp $name_out; open my $in, '<', $name_in or die "error opening $name_in: $!"; open my $out , '>', $name_out or die "error opening $name_out: $!"; print $out grep { !/^#/ } <$in>;` [download]	[reply] [d/l] [select]
Re: File read and strip by GrandFather (Saint) on Nov 14, 2006 at 20:04 UTC
First thing I notice is I didn't notice $i being declared. Stacking up lines like that hides stuff. In this case it's even worse because $i should be declared in a much small scope - the for loop. Next thing is that generally three prarameter open is prefered and open should always be checked: `open FILE, '<', $file or die "Failed to open $file: $!";` [download] A side issue is: `($i)= grep { m/^#/ } $_;` [download] The best you can hope for from that is $i will be undef if $_ starts with a #. That undef gets pushed into @new however. Probably what you really wanted was something like: `! m/^#/ and push @new, $_ for @array;` [download] However your real issue is the while loop. You've just created FILE. It's empty. How many times it `while (<FILE>)` going to iterate? That is, how many of the 0 lines available in the newly created file is it going to read? The while loop is not needed at all. The for loop inside it could be replaced by: `print FILE " \n\n $_ \n\n " for @new;` [download] or remain as it is. DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re: File read and strip by markh (Scribe) on Nov 14, 2006 at 19:56 UTC
I think the problem is in your section of: `for(@array){ ($i)= grep { m/^#/ } $_; push @new, $i; }` [download] What about replacing that with: `foreach my $line (@array) { next if ($line =~ /^#/); push @new, $line; }` [download] Mine might be a bit less compact, but it is much more readable and easier to see what it happening there. Also it appears to me that when you are writing back to your file, you are putting in 4 extra linebreaks. Is that intentional?	[reply] [d/l] [select]
Re^2: File read and strip by Andrew_Levenson (Hermit) on Nov 14, 2006 at 20:00 UTC
The line breaks are intentional to make the final file more readable when I format it. What does using the {parameter?} "my $line" before the (@array) do? And, let's see if I have any idea what's going on, the second line skips a line if it starts with a #? Thanks!	[reply]
Re^3: File read and strip by markh (Scribe) on Nov 14, 2006 at 20:30 UTC
The line: `foreach my $line (@array) {` [download] assigns the value from the next item in @array to the variable $line. I find this makes it eaiser to see what you are doing, rather than using $_, which I find non-intuitive. Yes, you are correct, that next line of code skips anything that starts with a #, so that only lines from your file that are not comments are inserted into the array that you are intending to send to the output file.	[reply] [d/l]
Re: File read and strip by johngg (Canon) on Nov 14, 2006 at 20:48 UTC
A minor point to bear in mind is that the comments you are removing might be indented to align with indented code. The regular expression `m{^#}` will only match comments that are right at the beginning of the line because the caret `^` is the metacharacter used in regexen to anchor the match to the beginning of a string. (To anchor a match to the end of a string use the dollar `$`). This modified pattern copes with optional spaces before the hash `m{^\s#}` [download] The `\s` means a white-space character and the `` iterator means zero or more of whatever preceded it. So the above pattern means at the beginning of the string (your line in this context) match zero or more white-space characters followed by a hash. I hope this is of use. Cheers, JohnGG	[reply] [d/l] [select]
Re: File read and strip by jonadab (Parson) on Nov 15, 2006 at 01:00 UTC
Don't mind me, I'm just in a bit of a weird mood, so pardon me while I have a little twisted fun at the expense of your code... Read more... (465 Bytes) Ah. Hopefully that's out of my system now. So, then, on to fixing it... First off, the print statement is never happening because <FILE> does not return anything very useful when the file is opened for output. Second, the test in grep is probably backwards from what you want. `use strict; use warnings; my $file=<>; open FILE, '<', $file; my @new = map { grep { not m/^#/ } } <FILE>; close FILE; my $nfile=<>; open FILE, '>', $nfile; print FILE map{" \n\n $_ \n\n "}@new for 1, <FILE>;` [download] This is still not optimal. For one thing, the first map (corresponding to your first while loop in the original code) is obviously superfluous. (Your while loop in the original wasn't necessary either, but it's more obvious to me now.) Second, there's still the weird double-indirection surrounding the filenames, which others have commented about. Third, I prepended a true value to force the print statement to occur once, but the file read there is still nonsense. Finally, there are still more variables than are strictly necessary; some of them are only used once -- well, twice: once in an assignment, and once again on the very next line to use the value; this amounts basically to using them once to turn one line into two, and that isn't necessary... `use strict; use warnings; open FILE, '<', scalar <>; my @new = grep { not m/^#/ } <FILE>; close FILE; open FILE, '>', scalar <>; print FILE map{" \n\n $_ \n\n "}@new;` [download] If you're willing to add a second filehandle you can get rid of the array, and the whole thing could be reduced to more-or-less a one-liner... `use strict; use warnings; open INFILE, '<', scalar <>; open OUTFILE, '>', scalar <>; print OUTFILE map{" \n\n $_ \n\n "} grep { not m/^#/ } <INFILE>;` [download] Look, ma, no variables to declare. If you're willing to use standard input and output and let the user specify the filenames with redirection operators, it becomes a veritable one-liner: `print map{" \n\n $_ \n\n "}grep{not m/^#/}<STDIN>;` strict and warnings seem unnecessary on anything that short. Someone will probably come along in a moment and explain how to use command-line flags to shorten it even further. Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. You can just call me "Mister Sanity". Why, I've got so much sanity it's driving me crazy.	[reply] [d/l] [select]
Re: File read and strip by Jasper (Chaplain) on Nov 15, 2006 at 10:44 UTC
I think the main problem with the OPs code was that he was doing the exact opposite of what he wanted, but no one really seems to have pointed that out explicitly, it seems to me. `for(@array){ ($i)= grep { m/^#/ } $_; push @new, $i; }` [download] dumps all the comments in @new, not the lines without comments. The corrected solution has been shown, of course. Also, the OP is reading the file line by line in one loop, then extracting from that file into an array in another loop, then looping over that array to print to the outfile. This can all be done in one loop (like the one liner) `while (<INFILE>) { next if /^#/; print OUTFILE $_; }` [download] I'd also add in a whitespace element to the regex, because I rarely start comments at the beginning of the line - I usually indent them with the rest of the code, and I doubt I'm the only one. so `/^\s+#/` Jasper also: `perl -ne '/^#/ or print' perl -pe '$_ = "" if /^#/' perl -pe '$_ x= !/^#/'` [download]	[reply] [d/l] [select]