in reply to text parsing question

Have you considered fixing the underlying problem, the bug limiting the size of your buffer? If you can't, here's a workaround.

If you first join each record's continuation lines back into a single line, you can then search for very long tokens (here, runs of 40 or more non-space characters) and replace them.

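use strict;
use warnings;

my $file = '...';
open(my $fh, '<', $file)
    or die("Unable to open file \"$file\": $!\n");

my $reader = Reader->new($fh);
while (defined(my $line = $reader->get_line())) {
    # Replace any run of 40 or more non-space characters.
    $line =~ s/\S{40,}/.../g;
    print($line);
}

{
    package Reader;

    # The object is a two-element array: [ file handle, pending record ].
    sub new {
        my ($class, $fh) = @_;
        return bless([$fh, undef], $class);
    }

    # Returns the next timestamped record with its continuation lines
    # joined onto it, or undef once the input is exhausted.
    sub get_line {
        my ($self) = @_;
        our $fh;  local *fh  = \($self->[0]);
        our $buf; local *buf = \($self->[1]);

        if (!defined($fh)) {
            return undef;
        }

        # First call only: skip everything before the first timestamped line.
        if (!defined($buf)) {
            for (;;) {
                my $line = <$fh>;
                if (!defined($line)) {
                    undef $fh;
                    return undef;
                }
                if ($line =~ /^\d\d:\d\d:\d\d /) {
                    $buf = $line;
                    last;
                }
            }
        }

        for (;;) {
            my $line = <$fh>;
            if (!defined($line)) {
                # End of file: flush the last record with exactly one newline.
                undef $fh;
                chomp($buf);
                return "$buf\n";
            }
            if ($line =~ /^\d\d:\d\d:\d\d /) {
                # Next record starts: return the finished one, keep the new one.
                return ((undef, $buf) = ($buf, $line))[0];
            }
            # Continuation line: drop the newline and append.
            chomp($buf);
            $buf .= $line;
        }
    }
}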

Output:

09:59:58 09/28/07 1 192.168.0.7 1.3.6.1.2.1.1.3.0 TimeTick 335604562 1.3.6.1.6.3.1.1.4.1.0 OID 1.3.6.1.4.1.9.9.383.0.1 1.3.6.1.4.1.9.9.383.1.1.1 Counter64 1119125906 1.3.6.1.4.1.9.9.383.1.1.2 String 07d7091c09361400 1.3.6.1.4.1.9.9.383.1.2.14 String ... 1.3.6.1.4.1.9.9 String ... 1.3.6.1.4.1.9.9.383.1.2.16 String 192.168.20.60:3089 1.3.6.1.4.1.9.9.383.1.2.17 String osIdSource="unknown" osRelevance="relevant" osType="unknown" 192.168.0.23:139
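
The replacement step can also be tried on its own. A minimal sketch, assuming a made-up record in which a 45-character token stands in for an oversized value (the sample string is hypothetical, not taken from the log):

```perl
use strict;
use warnings;

# Hypothetical joined record; the run of "a" stands in for a very long value.
my $line = "09:59:58 String " . ("a" x 45) . " String short\n";

# Copy the line, then replace every run of 40 or more non-space characters.
(my $trimmed = $line) =~ s/\S{40,}/.../g;

print $trimmed;
```

Tokens shorter than 40 characters, such as the timestamp, are left alone.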

Re^2: text parsing question
by perlAffen (Sexton) on Oct 01, 2007 at 18:08 UTC
    I can't 'fix' the buffer problem; it is something I can't touch. This works very well. Thanks. Now I need to figure out how it works so I may absorb the wisdom.

      Here are some notes to help you understand get_line:

      • !defined($buf) is only true the first time get_line is called. In that case, get_line discards all the lines before the first timestamped line, if any.

      • The file is processed as follows:

        File               Read by      Returned by
        -----------------  -----------  -----------
        line               1st call(*)  scrapped
        line               1st call(*)  scrapped
        timestamped line   1st call(*)  1st call
        line               1st call     1st call
        line               1st call     1st call
        timestamped line   1st call     2nd call
        line               2nd call     2nd call
        line               2nd call     2nd call
        timestamped line   2nd call     3rd call
        line               3rd call     3rd call
        line               3rd call     3rd call
        timestamped line   3rd call     4th call
        EOF                4th call     5th call

        (*) Read by the body of if (!defined($buf)).

      • Between calls, $buf contains the line that has been read, but not returned.

      • our $var; local *var = \($self->[$idx]);
        creates an alias so that any change to $var is reflected in $self->[$idx].

      • for (;;) can be read as "for ever". The loop will loop until last, return, die, exit or other exceptional means are used to exit it.

      • return ((undef, $buf) = ($buf, $line))[0];
        is short for
        my $temp = $buf;
        $buf = $line;
        return $temp;
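
The aliasing and hand-off notes above can be tried out in a small standalone script. This is only a sketch: the two-element array and the strings in it are made up for illustration and stand in for Reader's [$fh, $buf] pair.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Reader object: [ handle, pending line ].
my $self = [ 'handle', "first\n" ];
my ($seen_through_alias, $returned);

{
    # Alias the package variable $buf to the array element $self->[1].
    our $buf;
    local *buf = \( $self->[1] );

    # Changes made through the alias land in the array element.
    chomp($buf);
    $buf .= " continued\n";
    $seen_through_alias = $self->[1];

    # Hand-off idiom from get_line: yield the old value, store the new one.
    $returned = ( (undef, $buf) = ($buf, "second\n") )[0];
}

print "via alias: $seen_through_alias";
print "returned:  $returned";
print "pending:   $self->[1]";
```

The alias only lasts until the enclosing block exits, but whatever was written through it stays in the array, which is how get_line keeps its pending line between calls.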