in reply to Reading files, skipping very long lines...

I am not sure I see what you mean, but just as an idea: would the following do what you want?

perl -lne '$max = 79; print unless length > $max' file

If this does what you want apart from the memory issue, then write the while loop properly with open first...
if that still fails, consider using Tie::File.
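
Untested, but roughly what I mean by writing the loop properly with open, assuming the input file is simply called 'file' and keeping the cap of 79 characters:

#!/usr/bin/perl
use strict;
use warnings;

my $max = 79;
open my $fh, '<', 'file' or die "Cannot open 'file': $!";
while ( my $line = <$fh> ) {
    chomp $line;                                 # drop the newline before measuring
    print "$line\n" unless length $line > $max;
}
close $fh;

Note that this still pulls each complete line into memory before measuring it, which is exactly where very long lines hurt; hence the Tie::File fallback.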

Cheers, Sören


Re^2: Reading files, skipping very long lines...
by Limbic~Region (Chancellor) on Sep 29, 2005 at 17:14 UTC
    Happy-the-monk,
    I do not believe either of these approaches will work (if I understand the problem correctly). Some lines are too long to read into a single variable, so it is not possible to use length to determine whether a line is too long. Using Tie::File would help, since it only indexes where the newlines in the file begin, but you would still need to read the whole line to determine whether it is too long (length $file[42] > 1024 * 1024).

    I can see one way it might work, though. If there is a way to get at the byte offsets of the newlines, you would only have to subtract the two offsets to determine whether the line is too long.

    Cheers - L~R

    Update: The following is an untested proof of concept.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Tie::File;

    my $obj = tie my @file, 'Tie::File', 'file.big'
        or die "Unable to tie 'file.big': $!";

    my $big = 1024 * 1024;    # skip lines longer than 1MB

    for ( 0 .. $#file - 1 ) {
        my $beg = $obj->offset($_);        # byte offset where record $_ begins
        my $end = $obj->offset($_ + 1);    # byte offset where the next record begins
        next if $end - $beg > $big;
        # process $file[$_];
    }

    # Handle the last line as a special case
    my $beg = $obj->offset($#file);
    my $end = -s 'file.big';
    if ( $end - $beg <= $big ) {
        # process $file[-1];
    }

    # Cleanup
    undef $obj;
    untie @file;
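
    Along the same lines, here is an untested sketch that avoids Tie::File entirely; the file name, cap, and block size are assumptions. Setting $/ to a scalar reference makes the readline operator return fixed-size blocks, so the script never holds more than one block plus one short line in memory:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file  = 'file.big';
    my $max   = 1024 * 1024;   # skip lines longer than 1MB
    my $chunk = 64 * 1024;     # read the file 64KB at a time

    open my $fh, '<', $file or die "Unable to open '$file': $!";

    my $buf      = '';         # partial line carried across block reads
    my $skipping = 0;          # true while inside a line already known too long

    $/ = \$chunk;              # a scalar ref makes <$fh> return fixed-size blocks
    while ( my $block = <$fh> ) {
        # split after each newline: every piece but possibly the last is a full line
        for my $piece ( split /(?<=\n)/, $block ) {
            my $complete = $piece =~ s/\n\z//;   # true if the piece ended a line
            if ($skipping) {
                $skipping = 0 if $complete;      # the oversized line finally ended
                next;
            }
            $buf .= $piece;
            if ( length $buf > $max ) {          # line grew past the cap: drop it
                $buf      = '';
                $skipping = !$complete;
            }
            elsif ($complete) {
                # process $buf here; it is a complete line under the cap
                $buf = '';
            }
        }
    }
    # if the file has no final newline, $buf now holds the last (short) line
    close $fh;

    Since a line is discarded the moment it grows past the cap, memory use stays bounded by the block size plus $max, no matter how long the offending lines are.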