Re: Cleaner way of looping through a file and stripping only certain lines?
by liverpole (Monsignor) on Dec 08, 2006 at 16:59 UTC
|
Hi texasperl,
If it's not a huge file, you could read it into memory and use map to iterate over all the lines:
#!/usr/bin/perl -w
use strict;
my $ifile = shift || die "No filename provided\n";
open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
open OFILE, '+>', 'new-data' || die "Couldn't open outfile: $!";
chomp(my @lines = <IFILE>);
close IFILE;
my @matches = map { /^@.*mail(\d+).*\z/ ? ($1 <= 8 ? $_ : ()) : $_ } @lines;
map { print OFILE "$_\n" } @matches;
close OFILE;
But I don't see anything wrong with the way you did it, except that I would recommend error-checking for the case where a filename isn't passed to the program.
Update: Fixed to skip saving anything in the case where the captured number isn't <= 8.
Update 2: Cleaned up syntax further.
s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
|
This is exactly the kind of thing I was talking about. Perfect!++
Thanks!
Re: Cleaner way of looping through a file and stripping only certain lines?
by ikegami (Patriarch) on Dec 08, 2006 at 17:18 UTC
|
-
open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
is buggy. Due to operator precedence, it's equivalent to
open IFILE, '<', ("$ifile" || die "Couldn't open file: $!");
You want one of the following instead:
open IFILE, '<', "$ifile" or die "Couldn't open file: $!";
open(IFILE, '<', "$ifile") || die "Couldn't open file: $!";
open(IFILE, '<', "$ifile") or die "Couldn't open file: $!";
(open IFILE, '<', "$ifile") || die "Couldn't open file: $!";
(open IFILE, '<', "$ifile") or die "Couldn't open file: $!";
-
Same goes for the second open.
-
Why "$ifile" instead of just $ifile?
-
Why '+>' instead of just '>'?
-
/^@.*mail(\d+).*\z/xms
can be simplified to
/^@.*mail(\d+)/xms
All three modifiers (xms) could be removed from the match operator, but they cause no harm here.
-
The user doesn't need to see the program line number when he specifies a bad file name. If the error message is not good enough to identify a user error without resorting to a line number, it needs to be improved.
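The regex simplification mentioned in the list above can be checked with a short script. This is only a sketch: the sample line is a hypothetical MX-style record (the real file layout isn't shown in the thread), and captured() is a helper name made up for the example. It compares the original pattern, the simplified one, and the simplified one without modifiers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, newline included as it would be when read
# from a file without chomp.
my $line = '@ IN MX 10 mail3.example.com.' . "\n";

# Illustrative helper: return the captured number, or undef on no match.
sub captured {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my @variants = (
    [ 'original, with xms'       => qr/^@.*mail(\d+).*\z/xms ],
    [ 'simplified, with xms'     => qr/^@.*mail(\d+)/xms ],
    [ 'simplified, no modifiers' => qr/^@.*mail(\d+)/ ],
);

for my $variant (@variants) {
    my ($label, $re) = @$variant;
    my $n = captured($re, $line);
    printf "%-26s => %s\n", $label, defined $n ? $n : 'no match';
}
```

On this sample all three variants capture the same number. Note that the trailing .*\z only matches an unchomped line when /s is present, since . does not match a newline otherwise; dropping it removes that dependency.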
With changes applied:
#!/usr/bin/perl -w
use strict;
my $ifile = shift;
my $ofile = 'new_data';
open my $ifh, '<', $ifile
    or die "Couldn't open DNS file \"$ifile\": $!\n";
open my $ofh, '>', $ofile
    or die "Couldn't create output file \"$ofile\": $!\n";
while (my $line = <$ifh>) {
    # is the line an MX record?
    if ($line =~ /^@.*mail(\d+)/xms) {
        # is it less than or equal to 8?
        if ($1 <= 8) {
            print $ofh $line;
        }
    }
    # print everything else to the new file
    else {
        print $ofh $line;
    }
}
You could make your program simpler and more flexible by using STDIN and STDOUT.
#!/usr/bin/perl -w
use strict;
#
# Usage:
# fixdns infile > outfile
#
# Usage for in place editing:
# perl -i fixdns dnsfile
#
while (<>) {
    # is the line an MX record?
    if (/^@.*mail(\d+)/xms) {
        # is it less than or equal to 8?
        if ($1 <= 8) {
            print;
        }
    }
    # print everything else to the new file
    else {
        print;
    }
}
|
This is some great insight. I especially gleaned a lot of wisdom from these bits:
- not using "" around the variable in open
- operator precedence in the open statement
- using STDIN and STDOUT instead of named filehandles
Thank you very much for your excellent insight. Hopefully someone else can also learn from this post (that's why I post things like this.)
|
operator precedence in the open statement
Not trying to kick you while you're down (honestly! =]), and it could just be a case of me misreading your sentence, but the operator precedence ikegami explained is not limited to the open statement -- it's part of the perl parser in general.
Update: Ah, misread the original sentence indeed. Sorry 'bout that. No harm intended.
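To illustrate that the precedence issue applies to any call and not just open, here is a minimal sketch. The sub always_fails is a hypothetical stand-in for anything that signals failure by returning false; with_pipes and with_or are names made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for any builtin or sub that reports failure
# by returning false. It always "fails" so we can see who notices.
sub always_fails { return 0 }

# '||' binds tighter than the comma, so this parses as
#   always_fails('arg', ('x' || die ...))
# 'x' is true, so die is never reached and the failure is ignored.
sub with_pipes {
    my $ok = eval { always_fails 'arg', 'x' || die "never reached\n"; 1 };
    return $ok ? 'failure ignored' : 'died';
}

# 'or' has lower precedence than the list operator, so it tests
# the return value of the whole call.
sub with_or {
    my $ok = eval { always_fails 'arg', 'x' or die "call failed\n"; 1 };
    return $ok ? 'failure ignored' : 'died';
}

print "||: ", with_pipes(), "\n";   # prints "||: failure ignored"
print "or: ", with_or(), "\n";      # prints "or: died"
```

Putting parentheses around the argument list makes either operator behave the same way, which is why all five corrected forms of the open call work.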
|
I would add that you can avoid the use of a bareword filehandle by calling open with an undef scalar instead:
open my $input_fh, '<', "$ifile" or die "Couldn't open file: $!";
Now use $input_fh just like you would have used IFILE.
Re: Cleaner way of looping through a file and stripping only certain lines?
by grep (Monsignor) on Dec 08, 2006 at 17:03 UTC
|
I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.
Open the file. Read the file into an array. Close the file. Open the same file for overwrite and write what you want back.
open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
my @data = <IN>;
close IN;
open(OUT, '>', $ifile) or die "Couldn't open outfile: $!\n";
foreach my $line (@data) {
    ### Do what you want
    print OUT "$line";
}
close OUT or die "Couldn't close outfile: $!\n";
UPDATE: Fixed some C&P errs
grep
XP matters not. Look at me. Judge me by my XP, do you?
|
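## UNTESTED
use File::Temp 'tempfile';
use File::Copy;
my ($tmp_FH, $tmp_fn) = tempfile();
open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
my @data = <IN>;
close IN;
foreach my $line (@data) {
    ### Do what you want
    print $tmp_FH "$line";
}
# Close the temp file first so buffered output is flushed before copying
close $tmp_FH or die "Couldn't close temp file: $!\n";
copy($tmp_fn, $ifile) or die "Couldn't copy temp file over $ifile: $!\n";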
|
The safer way to do this sort of thing is to write to a new file and then rename over the file you want to replace. This is what you do when handling mbox files for instance.
The rename is an atomic operation. It's guaranteed to succeed completely or not at all (on Unix-like boxes), so you can never lose data as a result of a power failure.
This is a similar trick to perl -i, but I think that does a rename of the original file and then writes back into the original file, which still leaves open the case that you could have bad (partially written) data in the original file on power failure.
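A minimal sketch of the write-then-rename approach described above, assuming a Unix-like system. The sub name rewrite_atomically and the temp file template are made up for the example; the temp file is created in the same directory as the target so that the rename stays on one filesystem.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Rewrite $path, keeping only the lines for which $keep->($line) is true.
# The new contents go to a temp file that is renamed over $path at the end,
# so readers see either the old file or the complete new one.
sub rewrite_atomically {
    my ($path, $keep) = @_;

    open my $in, '<', $path
        or die "Couldn't open \"$path\": $!\n";

    # Same directory as the target, so rename() doesn't cross filesystems.
    my ($out, $tmp) = tempfile('rewriteXXXXXX', DIR => dirname($path));

    while (my $line = <$in>) {
        print {$out} $line if $keep->($line);
    }
    close $in;

    # Flush everything to the temp file before it replaces the original.
    close $out or die "Couldn't close \"$tmp\": $!\n";
    rename $tmp, $path
        or die "Couldn't rename \"$tmp\" to \"$path\": $!\n";
    return;
}

# Example use: drop MX lines whose mail number is above 8.
if (@ARGV) {
    rewrite_atomically($ARGV[0], sub {
        my ($line) = @_;
        return 1 unless $line =~ /^@.*mail(\d+)/;
        return $1 <= 8;
    });
}
```

One caveat of this sketch: File::Temp creates the temp file with restrictive permissions, so the replaced file does not inherit the original's mode or ownership. Copy those over before the rename if they matter.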
Re: Cleaner way of looping through a file and stripping only certain lines?
by hiseldl (Priest) on Dec 08, 2006 at 21:31 UTC
|
I did not see the requirement that you wanted a script, so, here's how I would do it...
You can use the '-i' switch to edit the file in-place. See perlrun for reference.
If you do not want to create a backup file...
perl -ni -e '$match=/::mail(\d+)/;print if!$match||$1>7' data
If you do want to create a backup file with extension '.bak'...
perl -ni.bak -e '$match=/::mail(\d+)/;print if!$match||$1>7' data
This assumes that the file containing the data is named 'data'.
Here's the nuts-and-bolts explanation, skip it if you already know how it works.
Basically, this will check every line to see if it matches the regex. In your case you wanted to match something like /::mail(\d+)/ to capture the digits (please put whatever regexp you need in there; this is untested and for example purposes only). The match operator is used for two purposes: (1) letting us know if it matched, and (2) capturing the digits of interest if it does match. Here, I store the boolean value in $match for later use.
The second statement is the conditional print, which will print the current line if it did not match or if the value of the captured digits is greater than 7.
Please change it according to your needs.
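For readers who prefer to see the one-liner spelled out, here is a rough script equivalent of the -ni.bak version. It is a sketch rather than an exact expansion of what perl generates: keep_line is a name made up for the example, and the rule inside it is the one-liner's own, unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same test as the one-liner: keep the line if the regex did not match,
# or if the captured digits are greater than 7.
sub keep_line {
    my ($line) = @_;
    my $match = $line =~ /::mail(\d+)/;
    return !$match || $1 > 7;
}

# Only touch files when some are named on the command line.
if (@ARGV) {
    $^I = '.bak';                 # same effect as the -i.bak switch
    while (my $line = <>) {       # -n supplies this loop implicitly
        # With $^I set, print writes to the file being edited.
        print $line if keep_line($line);
    }
}
```

Run it as you would the one-liner, passing the data file as the argument; the original is kept with a .bak extension.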
HTH.
Re: Cleaner way of looping through a file and stripping only certain lines?
by mreece (Friar) on Dec 09, 2006 at 17:06 UTC
|
here's a unixy non-perl solution:
egrep -v '@.*mail(9|[1-9][0-9])' < in-file > out-file
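To see which mail numbers that alternation removes, the same pattern can be tried from Perl. This is only a sketch: the sample line layout is hypothetical, and egrep_would_drop is a name invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the egrep pattern matches, i.e. egrep -v would drop the line.
# (9|[1-9][0-9]) matches a 9 or any number with two or more digits.
sub egrep_would_drop {
    my ($line) = @_;
    return $line =~ /@.*mail(9|[1-9][0-9])/ ? 1 : 0;
}

for my $n (1 .. 12) {
    # Hypothetical MX-style line; adjust to the real file layout.
    my $line = '@ IN MX 10 mail' . $n . '.example.com.';
    printf "mail%-2d %s\n", $n, egrep_would_drop($line) ? 'dropped' : 'kept';
}
```

With these sample lines, mail1 through mail8 are kept and mail9 onward are dropped, which matches the "less than or equal to 8" rule used elsewhere in the thread.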