Cleaner way of looping through a file and stripping only certain lines?

texasperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello my esteemed Monks, I have once again come to seek your wisdom in making my Perl more... Perl-ish. I have the following working code that will strip any MX records greater than 8 from a file (tinydns format). However, looking at the code, I feel that there is likely a shorter and more concise way to write it. I suspect some use of map or grep would be more efficient, however, I am not entirely familiar and/or comfortable with those functions. Hopefully through a Monk or two sharing his/her wisdom, I'll be able to write faster, better code (and understand it, too.) Also, I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.
Without further ado, here is the code.

#!/usr/bin/perl -w
use strict;

my $ifile = shift;

open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
open OFILE, '+>', 'new-data' || die "Couldn't open outfile: $!";

while (my $line = <IFILE>) {
        # is the line an MX record?
        if ($line =~ /^@.*mail(\d+).*\z/xms) {
                # is it less than or equal to 8?
                if ($1 <= 8) {
                        print OFILE $line;
                }
        }
        # print everything else to the new file
        else { print OFILE $line; }
}

close IFILE;
close OFILE;

The data is in this format:
@*.somedomain.net::mail7.somedomain.net:10:21600
@somedomain.net::mail7.somedomain.net:10:21600
Thanks again, Monks!


Re: Cleaner way of looping through a file and stripping only certain lines?
by liverpole (Monsignor) on Dec 08, 2006 at 16:59 UTC

If it's not a huge file, you could read it into memory and use map to iterate over all the lines:

#!/usr/bin/perl -w
use strict;

my $ifile = shift || die "No filename provided\n";
open IFILE, '<', "$ifile"    or die "Couldn't open file: $!";
open OFILE, '+>', 'new-data' or die "Couldn't open outfile: $!";
chomp(my @lines = <IFILE>);
close IFILE;

# Keep non-MX lines as they are; keep MX lines only when the number is <= 8
my @matches = map { /^@.*mail(\d+).*\z/ ? ($1 <= 8 ? $_ : ()) : $_ } @lines;

print OFILE "$_\n" for @matches;
close OFILE;

But I don't see anything wrong with the way you did it, except that I would recommend error-checking for the case where a filename isn't passed to the program.

Update: Fixed to skip saving anything in the case where the captured number isn't <= 8.

Update 2: Cleaned up syntax further.
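
For what it's worth, grep may be a closer fit than map here, since each line is either kept whole or dropped. Below is a minimal sketch, assuming the same regex as above. The name keep_low_mx and the sample records are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines that should survive.
# A line survives if it is not an MX record, or if its mailN number is <= 8.
sub keep_low_mx {
    return grep { !/^@.*mail(\d+)/ || $1 <= 8 } @_;
}

# Illustrative sample data in tinydns style (invented for this sketch)
my @sample = (
    '@somedomain.net::mail7.somedomain.net:10:21600',
    '@somedomain.net::mail12.somedomain.net:10:21600',
    '+www.somedomain.net:10.0.0.1:21600',
);

print "$_\n" for keep_low_mx(@sample);
```

The `||` short-circuits, so $1 is only consulted when the match on the current line succeeded.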

s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

Re^2: Cleaner way of looping through a file and stripping only certain lines?

by texasperl (Sexton) on Dec 08, 2006 at 17:08 UTC

This is exactly the kind of thing I was talking about. Perfect!++ Thanks!

Re: Cleaner way of looping through a file and stripping only certain lines?
by ikegami (Patriarch) on Dec 08, 2006 at 17:18 UTC

open IFILE, '<', "$ifile" || die "Couldn't open file: $!";
is buggy. Due to operator precedence, it's equivalent to
open IFILE, '<', ("$ifile" || die "Couldn't open file: $!");
You want one of the following instead
open IFILE, '<', "$ifile" or die "Couldn't open file: $!";
open(IFILE, '<', "$ifile") || die "Couldn't open file: $!";
open(IFILE, '<', "$ifile") or die "Couldn't open file: $!";
(open IFILE, '<', "$ifile") || die "Couldn't open file: $!";
(open IFILE, '<', "$ifile") or die "Couldn't open file: $!";
Same goes for the second open.
Why "$ifile" instead of just $ifile?
Why '+>' instead of just '>'?
/^@.*mail(\d+).*\z/xms
can be simplified to
/^@.*mail(\d+)/xms
All three modifiers (xms) could be removed from the match operator, but they cause no harm here.
The user doesn't need to see the program line number when he specifies a bad file name. If the error message is not good enough to identify a user error without resorting to a line number, it needs to be improved.
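
The precedence point above is easy to check for yourself. This is a small sketch, assuming the path in $missing does not exist on your machine. The helper die_fires is hypothetical and exists only for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref and report whether it died.
sub die_fires {
    my ($code) = @_;
    return eval { $code->(); 1 } ? 0 : 1;
}

# Assumed not to exist; change it if it somehow does on your system.
my $missing = '/no/such/dir/no-such-file';

# '||' binds to the (true) filename string, so die is never evaluated
# and the failed open goes unnoticed.
my $with_pipes = die_fires(sub {
    open my $fh, '<', $missing || die "never reached\n";
});

# 'or' binds more loosely than the argument list, so it tests open's result.
my $with_or = die_fires(sub {
    open my $fh, '<', $missing or die "Couldn't open \"$missing\": $!\n";
});

printf "with ||: die %s\n", $with_pipes ? 'fired' : 'was skipped';
printf "with or: die %s\n", $with_or    ? 'fired' : 'was skipped';
```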

With changes applied:

#!/usr/bin/perl -w
use strict;

my $ifile = shift;
my $ofile = 'new_data';

open my $ifh, '<', $ifile
   or die "Couldn't open DNS file \"$ifile\": $!\n";
open my $ofh, '>', $ofile
   or die "Couldn't create output file \"$ofile\": $!\n";

while (my $line = <$ifh>) {
   # is the line an MX record?
   if ($line =~ /^@.*mail(\d+)/xms) {
      # is it less than or equal to 8?
      if ($1 <= 8) {
         print $ofh $line;
      }
   }
   # print everything else to the new file
   else {
      print $ofh $line;
   }
}

You could make your program simpler and more flexible by using STDIN and STDOUT.

#!/usr/bin/perl -w
use strict;

#
# Usage:
#    fixdns infile > outfile
#
# Usage for in place editing:
#    perl -i fixdns dnsfile
#

while (<>) {
   # is the line an MX record?
   if (/^@.*mail(\d+)/xms) {
      # is it less than or equal to 8?
      if ($1 <= 8) {
         print;
      }
   }
   # print everything else to the new file
   else {
      print;
   }
}

Re^2: Cleaner way of looping through a file and stripping only certain lines?

by texasperl (Sexton) on Dec 08, 2006 at 18:05 UTC

not using "" around the variable in open
operator precedence in the open statement
using STDIN and STDOUT instead of named filehandles

Re^3: Cleaner way of looping through a file and stripping only certain lines?

by revdiablo (Prior) on Dec 08, 2006 at 18:17 UTC

operator precedence in the open statement

Not trying to kick you while you're down (honestly! =]), and it could just be a case of me misreading your sentence, but the operator precedence ikegami explained is not limited to the open statement -- it's part of the perl parser in general.

Update: Ah, misread the original sentence indeed. Sorry 'bout that. No harm intended.

Re^4: Cleaner way of looping through a file and stripping only certain lines?

by texasperl (Sexton) on Dec 08, 2006 at 18:20 UTC

Re^5: Cleaner way of looping through a file and stripping only certain lines?

by ikegami (Patriarch) on Dec 08, 2006 at 19:10 UTC

Re^2: Cleaner way of looping through a file and stripping only certain lines?

by duckyd (Hermit) on Dec 08, 2006 at 20:01 UTC

open my $input_fh, '<', "$ifile" or die "Couldn't open file: $!";

$input_fh

IFILE


Re: Cleaner way of looping through a file and stripping only certain lines?
by grep (Monsignor) on Dec 08, 2006 at 17:03 UTC

I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.

Open the file. Read the file into an array. Close the file. Open the same file for overwrite and write what you want back.

open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
my @data = <IN>;
close IN;

open(OUT, '>', $ifile) or die "Couldn't open outfile: $!\n";
foreach my $line (@data) {
    ### Do what you want
    print OUT "$line";
}

grep

XP matters not. Look at me. Judge me by my XP, do you?

Re^2: Cleaner way of looping through a file and stripping only certain lines?

by throop (Chaplain) on Dec 08, 2006 at 17:14 UTC

>>I'm curious as to how one would go about editing the file in place, i.e., not having to create a new outfile.
>Open the file. Read the file into an array. Close the file. Open the same file for overwrite and write what you want back.

This works, but isn't it inherently risky for cases where there's a power failure or other hiccup during processing — you corrupt your original?

throop

Re^3: Cleaner way of looping through a file and stripping only certain lines?

by grep (Monsignor) on Dec 08, 2006 at 17:34 UTC

A safer answer would be to use File::Temp. You still have a problem with power failure but your window of problems is smaller.

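## UNTESTED
use File::Temp 'tempfile';
use File::Copy;
my ($tmp_FH,$tmp_fn) = tempfile();

open(IN, '<', $ifile) or die "Couldn't open file: $!\n";
my @data = <IN>;
close IN;

foreach my $line (@data) {
    ### Do what you want
    print $tmp_FH "$line";
}
# Flush buffered output to the temp file before copying it
close($tmp_FH) or die "Couldn't close temp file: $!\n";
copy($tmp_fn,$ifile) or die "Couldn't copy temp file over $ifile: $!\n";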

grep

XP matters not. Look at me. Judge me by my XP, do you?

Re^3: Cleaner way of looping through a file and stripping only certain lines?

by jbert (Priest) on Dec 08, 2006 at 18:22 UTC

The rename is an atomic operation. It's guaranteed to succeed completely or not at all (on Unix-like boxes), so you can never lose data as a result of a power failure.

This is a similar trick to perl -i, but I think that does a rename of the original file and then writes back into the original file, which still leaves open the case that you could have bad (partially written) data in the original file on power failure.
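
One way to read the rename approach is: write the filtered data to a temporary file, then rename it over the original. Here is a rough sketch under that assumption. The names filter_file_atomically and keep_low_mx_line are invented for illustration. The temp file is created in the same directory as the target so that the rename stays on one filesystem.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical predicate: keep non-MX lines and MX lines numbered <= 8.
sub keep_low_mx_line {
    my ($line) = @_;
    return $line !~ /^@.*mail(\d+)/ || $1 <= 8;
}

# Hypothetical helper: filter $path through $keep, then swap the result
# into place with a single rename.
sub filter_file_atomically {
    my ($path, $keep) = @_;

    open my $in, '<', $path or die "Couldn't open \"$path\": $!\n";

    # Same directory as the target, so rename never crosses filesystems
    my ($out, $tmp) = tempfile(DIR => dirname($path));

    while (my $line = <$in>) {
        print {$out} $line if $keep->($line);
    }
    close $in;
    close $out or die "Couldn't finish writing \"$tmp\": $!\n";

    rename $tmp, $path or die "Couldn't rename \"$tmp\" to \"$path\": $!\n";
    return;
}

# Usage: perl thisscript dnsfile
filter_file_atomically($ARGV[0], \&keep_low_mx_line) if @ARGV;
```

Note that the file created by tempfile has restrictive permissions (owner read/write only), so the replaced file may end up with a different mode than the original; a real script would probably want to copy the mode across before renaming.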

Re: Cleaner way of looping through a file and stripping only certain lines?
by hiseldl (Priest) on Dec 08, 2006 at 21:31 UTC

I did not see the requirement that you wanted a script, so, here's how I would do it...

You can use the '-i' switch to edit the file in-place. See perlrun for reference.

If you do not want to create a backup file...

$ perl -ni -e '$match=/::mail(\d+)/;print if!$match||$1<=8' data

If you do want to create a backup file with extension '.bak'...

$ perl -ni.bak -e '$match=/::mail(\d+)/;print if!$match||$1<=8' data

This assumes that the file containing the data is named 'data'.

Here's the nuts-and-bolts explanation, skip it if you already know how it works.

Basically, this will check every line to see if it matches the regex, in your case you wanted to match something like /::mail(\d+)/ to capture the digits (please put whatever regexp you need in there, this is untested and for example purposes only). The match operator is used for two purposes, (1) letting us know if it matched, and (2) capturing the digits of interest if it does match. Here, I store the boolean value in $match for later usage. The second statement is the conditional print, which will print the current line if it did not match or if the value of the captured digits is less than or equal to 8, so MX records numbered above 8 are dropped.

Please change it according to your needs.
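
If you would rather have the same in-place behaviour inside a script, the -i switch corresponds to the special variable $^I. The following is a sketch only, assuming the OP's rule of keeping MX records numbered 8 or lower. The function name strip_high_mx_in_place is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: edit $file in place, roughly what -n -i does.
# $backup_ext is the backup suffix; an empty string means no backup,
# as with a bare -i.
sub strip_high_mx_in_place {
    my ($file, $backup_ext) = @_;

    # Localize so the caller's @ARGV and $^I are restored afterwards
    local ($^I, @ARGV) = ($backup_ext, $file);

    while (<>) {
        my $is_mx = /^@.*mail(\d+)/;
        # While $^I is set, print writes to the replacement file
        print if !$is_mx || $1 <= 8;
    }
    return;
}

# Usage: perl thisscript dnsfile   (leaves the original in dnsfile.bak)
strip_high_mx_in_place($ARGV[0], '.bak') if @ARGV;
```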

HTH.

--
hiseldl
What time is it? It's Camel Time!


Re: Cleaner way of looping through a file and stripping only certain lines?
by mreece (Friar) on Dec 09, 2006 at 17:06 UTC

egrep -v '@.*mail(9|[1-9][0-9])' < in-file > out-file
