learningperl01 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am wondering if someone could please help me out; I am having the following problem with the code below. First, here is what the script does. I have about 200,000+ files from which I need to delete/remove the first four lines. These are logs from a server; long story short, for what we are doing we need to delete the first four lines from each one. The problem is that every time I run the script below I get an OUT OF MEMORY error after the fourth or fifth log file (for the most part the files are under 10 MB, but some are as large as 700 MB... but I am only trying to remove the first four lines from each log file). I am pretty new to Perl and programming in general, so I don't know if I am using Perl to its best capabilities or if there is a better way to do what I am doing. Anyway, any help would be appreciated. Thanks for the help in advance.
#!/usr/bin/perl
use File::Find;
use Tie::File;

my $DIRECTORY = $ARGV[0];
find( \&edits, $DIRECTORY );

sub edits() {
    if ( -f and /^33dc01\..*outer.log$/ ) {
        foreach ( $File::Find::name ) {
            print "$File::Find::name\n";
            tie my @file, 'Tie::File', $File::Find::name
                or die "Can't tie $File::Find::name $!";
            splice @file, 0, 4;
            untie @file;
        }
    }
}

Replies are listed 'Best First'.
Re: Removing lines from files
by markkawika (Monk) on May 09, 2008 at 22:33 UTC
    Perl is a poor choice for this operation. I would use a Bourne shell script with standard Unix utilities (assuming you're doing this on a Unix box):
    #!/bin/sh
    DIR=$1
    cd ${DIR}
    find . -name '33dc01.*outer?log' -print | \
    while read fn
    do
        tail -n +5 ${fn} > ${fn}.deleting
        mv ${fn}.deleting ${fn}
    done
Re: Removing lines from files
by GrandFather (Saint) on May 09, 2008 at 22:13 UTC

    Aside from your immediate problem there are a couple of style issues to consider that may help you in the future. First, always use strictures (use strict; use warnings;). Strictures catch a lot of silly typos and similar errors before they get the chance to cost you a few hours of hunting down subtle bugs.

    Don't use prototypes for subroutines. While there are a small number of situations where they are useful, generally they don't do what you think and often do what you don't expect. In the case of your sample code the prototype is ignored in any case, because it hasn't been seen before the sub is used!
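    To illustrate the point, here is a small sketch (not from your script; the sub name is made up) of what an empty prototype does and does not buy you:

    use strict;
    use warnings;

    sub greet() { print "hello\n" }   # the () prototype means "takes no arguments"

    greet();          # fine
    #greet('bob');    # compile-time error: "Too many arguments for main::greet"

    # Prototypes are ignored when the sub is called through a code reference,
    # which is exactly how File::Find calls \&edits:
    my $ref = \&greet;
    $ref->('bob');    # runs without complaint; prints "hello"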

    In the interests of showing you a little more Perl power consider:

    sub edits {
        return unless -f and /^33dc01\..*outer.log$/;

        # Set up for in place edit
        local @ARGV = ($_);
        local $^I   = '.bak';

        print "$File::Find::name\n";

        while (<>) {
            print if $. > 4;
        }
    }

    which uses Perl's in-place edit facility to rewrite the file, skipping the first four lines. Note that this will create backups of the original files with .bak appended to the file name.

    The special variable $. provides the current line number for the file handle most recently accessed. See $^I, @ARGV, $. and $ARGV (which I didn't use, but may be of interest).
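    For comparison, the same in-place edit can be expressed as a one-liner from the shell (the file glob here is just a placeholder for your real log names):

    perl -i.bak -ne 'print if $. > 4; close ARGV if eof' *.outer.log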


    Perl is environmentally friendly - it saves trees
Re: Removing lines from files
by pc88mxer (Vicar) on May 09, 2008 at 19:19 UTC
    Just replace this:
    tie my @file, 'Tie::File', $File::Find::name
        or die "Can't tie $File::Find::name $!";
    splice @file, 0, 4;
    untie @file;
    with:
    open(F, '<', $File::Find::name) or die "...";
    my $tmp = $File::Find::name . " - new";   # see comment below
    open(G, '>', $tmp) or die "...";
    for (1..4) { <F> }          # don't copy the first four lines
    while (<F>) { print G }
    close(G);
    close(F);
    rename($tmp, $File::Find::name)
        or warn "unable to replace $File::Find::name: $!\n";
    The only caveat is that you have to ensure that $tmp can never be the name of an existing log file.

    The tie method is inefficient because it reads the entire file into memory.

    One advantage that this approach has over an in-place re-write is that you won't have to worry about leaving yourself with a corrupted log file if the copy is interrupted.
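    If you want a stronger guarantee that the temporary name can never collide with a real log file (the caveat above), File::Temp can pick the name for you. A rough sketch, reusing the variable names from the code above (the template string is arbitrary):

    use File::Temp qw(tempfile);
    use File::Basename qw(dirname);

    # Create the temp file in the same directory as the original so the final
    # rename() stays on one filesystem; UNLINK => 0 keeps it around for the rename.
    my ( $G, $tmp ) = tempfile(
        'dellines-XXXXXXXX',
        DIR    => dirname($File::Find::name),
        UNLINK => 0,
    );

    The handle $G then takes the place of G above, and $tmp is used in the final rename as before.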

      In a PDF found here (part of the "Lightweight Database Techniques" tutorial materials that Dominus has freed), Dominus says Tie::File is for convenience, not performance, but he also says it's reasonably fast. The job you're doing here is removing the first 4 lines from each of 200,000+ files. Tie::File will have to rewrite every large file from the point of the change to the end. He says that since the module must perform reasonably well for many different types of applications, it's slower than code custom-written for a single application.
Re: Removing lines from files
by NetWallah (Canon) on May 09, 2008 at 19:27 UTC
    Have you tried doing
    shift @file for 1..4;
    instead of the "splice" ? Not sure if that would help, but it might. Update: It does not. See pc88mxer's reply below.

    Why do you use a "foreach" loop over a scalar (foreach ( $File::Find::name ))? It seems that $_ is not even referenced in that loop.

    Although "Tie::File" claims to be efficient, you are not using it for "random, array-style access", so the overhead may be too high for your case. Benchmark can help find more optimal mechanisms.

    ++ on using modules to reduce the amount you are coding! (Although this may appear contrary to what I said above about Tie::File's overhead.)

    Update 1: pc88mxer: The Tie::File docs claim that it does NOT read the file into memory. Also, your method (re-writing the relevant part of the file) is supposed to be LESS efficient than what Tie::File claims for itself. I will attempt to benchmark & post here.

         "How many times do I have to tell you again and again .. not to be repetitive?"

      A check of the source code reveals that the SHIFT method for Tie::File is implemented in terms of the SPLICE method:
      sub SHIFT {
          my $self = shift;
          scalar $self->SPLICE(0, 1);
      }
      Besides, the file is updated after every shift operation, which means you'd be re-writing the file four times(!)

      I don't think Tie::File is the right approach to manipulate 700 MB log files.

      Update: I have a feeling the OP is running into out of memory problems because Tie::File is keeping track of the start of each line even though it doesn't need to. For a multi-megabyte file this clearly would be a problem. However, this is currently just a conjecture.
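      For what it's worth, if Tie::File has to be used at all, it does document a memory option that caps its read cache; a sketch (the 20 MB figure is arbitrary):

      # Limits Tie::File's cache of record *contents* to roughly 20 MB. The table
      # of line offsets it builds is separate bookkeeping, so this may not help
      # if the conjecture above is right.
      tie my @file, 'Tie::File', $File::Find::name, memory => 20_000_000
          or die "Can't tie $File::Find::name: $!";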

Re: Removing lines from files
by jwkrahn (Abbot) on May 09, 2008 at 21:50 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    use File::Find;

    @ARGV == 1 or die "usage: $0 directory_name\n";

    my @files;
    find sub {
        if ( -f and /^33dc01\..*outer\.log$/ ) {
            push @files, $File::Find::name;
            print "$File::Find::name\n";
        }
    }, $ARGV[ 0 ];

    ( $^I, @ARGV ) = ( '', @files );

    while ( <> ) {
        next if $. <= 4;
        print;
        close ARGV if eof;
    }
      Using the code below, how would I modify it so that only one file gets processed at a time? Meaning, make the script wait until file1 is done with the removal/backup/deletion before it starts with file2? Thanks again for the help.
      #!/usr/bin/perl
      use warnings;
      use strict;

      use File::Find;

      @ARGV == 1 or die "usage: $0 directory_name\n";

      my @files;
      find sub {
          if ( -f and /^33dc01\..*outer\.log$/ ) {
              push @files, $File::Find::name;
              print "$File::Find::name\n";
          }
      }, $ARGV[ 0 ];

      ( $^I, @ARGV ) = ( '', @files );

      while ( <> ) {
          next if $. <= 4;
          print;
          close ARGV if eof;
      }

        The way the code works, only one line at a time is ever in memory, and the files are processed one after another, in the order they appear in the @ARGV array.