sergio has asked for the wisdom of the Perl Monks concerning the following question:

Oh Wise Monks! Can Perl apply a matching regex to an input stream instead of just a scalar? That would truly be a miracle!

For example, one can do something like

while ( $string =~ /(<regex>)/g ) { my $m = $1; <do something with $m> }

and process a sequence of matches one at a time.
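For reference, here is the scalar case as a small runnable sketch (the pattern and data are invented for illustration). The loop relies on scalar context: each m//g call resumes from pos($string) and returns false once no further match exists. In list context, m//g returns all matches at once and resets pos(), so a `while` around a list assignment re-matches from the start and never terminates.

```perl
use strict;
use warnings;

# Collect every match of an illustrative pattern from a scalar.
# In scalar context m//g resumes from pos($string) on each call and
# returns false when no further match exists, so the loop ends.
my $string = "id=17; id=42; id=99;";
my @ids;
while ( $string =~ /id=(\d+)/g ) {
    push @ids, $1;    # $1 holds the capture from the current match
}
print "@ids\n";    # prints "17 42 99"
```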

But can a similar pattern be applied to an input stream, especially when a match can span several lines? That is, the match operator would keep reading the stream until something matches.

The advantage: no need to read the file in large chunks or completely.
The use: parsing large amounts of multiline stuff.

Thank you for any wisdom!

Sergio

Re: Regexing an input stream...
by dws (Chancellor) on Jun 18, 2003 at 18:33 UTC
    Can Perl apply a matching regex to an input stream instead of just a scalar?

    Sort of. See Matching in huge files for one trick that works. It uses the seldom-used /c modifier together with /g, so that a failed match leaves pos() where it was instead of resetting it, while shifting a pair of page buffers (and adjusting pos()) to get a sliding window through a stream.
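A sliding window built on m//gc and pos() might look like the sketch below. It is not the code from "Matching in huge files"; the sub name `scan_stream`, the 4096-byte block size and the `$max_len` bound are assumptions made for this illustration. It assumes no match is longer than `$max_len` characters, that the pattern cannot match the empty string, and that it uses no anchors or lookaround. Under those assumptions, a match ending in the last `$max_len` characters of the buffer could still change when more data arrives, so it is deferred until the next read.

```perl
use strict;
use warnings;

# Sliding-window scan of a filehandle with m//gc and pos().
# Hypothetical helper; assumes matches are at most $max_len characters,
# $re cannot match the empty string, and $re uses no anchors/lookaround.
sub scan_stream {
    my ($fh, $re, $max_len, $on_match) = @_;
    my $buf = '';
    my $eof = 0;
    until ($eof) {
        my $got = read($fh, $buf, 4096, length $buf);   # append one block
        die "read failed: $!" unless defined $got;
        $eof = ($got == 0);
        my $limit    = length($buf) - $max_len;  # later matches may still grow
        my $deferred = 0;
        while (1) {
            my $from = pos($buf) || 0;       # end of the last accepted match
            last unless $buf =~ /$re/gc;     # /c: a failed match keeps pos()
            if (!$eof && $+[0] > $limit) {
                pos($buf) = $from;           # give it back; rescan next round
                $deferred = 1;
                last;
            }
            $on_match->(substr($buf, $-[0], $+[0] - $-[0]));
        }
        # Drop text already matched or too far back to start a new match.
        my $keep = pos($buf) || 0;
        $keep = $limit if !$deferred && $limit > $keep;
        $buf = substr($buf, $keep);          # assigning also resets pos()
    }
}

# Usage: the tag below straddles the first 4096-byte block boundary.
my $text = ('x' x 4094) . "<split\ntag>" . ('y' x 10);
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
scan_stream($in, qr/<[^<>]*>/, 256, sub { print "match: $_[0]\n" });
```

Without /c, the final failing match would reset pos() and the loop would lose track of how much of the buffer has already been consumed.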

Re: Regexing an input stream...
by particle (Vicar) on Jun 18, 2003 at 18:47 UTC

    Perl 5.008 adds a new layered I/O system, PerlIO. You can define your own layers, too; see PerlIO::via:: for some existing modules.
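To show what defining your own layer involves, here is a minimal read-side layer written with PerlIO::via. The package name `PerlIO::via::Upcase` is made up for this sketch. PerlIO::via calls PUSHED when the layer is applied to a handle and FILL whenever the handle needs more input; FILL returns the next piece of text, or undef at end of file. Note that a layer only transforms data as it passes through; on its own it does not let a regex match across reads.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical layer: upper-cases text as it is read.
package PerlIO::via::Upcase;

# Called when the layer is pushed onto a handle; returns the layer object.
sub PUSHED {
    my ($class, $mode, $fh) = @_;
    return bless {}, $class;
}

# Called when the handle needs more data: read a line from the layer
# below and return it transformed, or undef at end of file.
sub FILL {
    my ($self, $fh) = @_;
    my $line = readline($fh);
    return defined $line ? uc $line : undef;
}

package main;

# Write a small scratch file, then read it back through the layer.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "hello\nworld\n";
close $out or die "close: $!";

open my $in, '<:via(PerlIO::via::Upcase)', $path or die "open: $!";
my @lines = <$in>;
close $in;
print @lines;
```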

    ~Particle *accelerates*

      I think you mean 5.8

      My bad. I thought you were referring to perl -v, not $].

      -Lee

      "To be civilized is to deny one's nature."
        perldoc perlvar (look for $])

        MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
        I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
        ** The third rule of perl club is a statement of fact: pod is sexy.
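The 5.008 vs 5.8 exchange above comes down to two variables described in perlvar: $] is a plain decimal of the form 5.XXXYYY (so perl 5.8.0 reports 5.008), while perl -v shows the dotted form, which is also available through $^V. The short sketch below prints both and checks that they describe the same release.

```perl
use strict;
use warnings;

# $] holds the version as a decimal of the form 5.XXXYYY (5.8.0 is 5.008);
# $^V holds it as a version string, shown here in dotted form via %vd.
my $numeric = $];
my $dotted  = sprintf '%vd', $^V;
print "\$]  = $numeric\n";
print "\$^V = $dotted\n";

# Both describe the same release: major + minor/1000 + patch/1_000_000.
my ($major, $minor, $patch) = split /\./, $dotted;
my $rebuilt = $major + $minor / 1000 + $patch / 1_000_000;
print abs($rebuilt - $numeric) < 1e-9 ? "consistent\n" : "mismatch\n";
```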

Re: Regexing an input stream...
by BrowserUk (Patriarch) on Jun 18, 2003 at 19:04 UTC

    See Re: split and sysread() for a slightly different implementation of the sliding buffer technique.
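A block-reading variant of the sliding buffer can be sketched as below; it is not the code from that node. The helper `each_record` is hypothetical, and it assumes the separator is a non-empty literal string. Each new block is appended to the unfinished tail of the previous one before splitting, so a separator that straddles a block boundary is still found. It uses read() so the demo can run on an in-memory filehandle; with a real file, sysread() takes the same arguments (handle, buffer, length, offset).

```perl
use strict;
use warnings;

# Split a stream into records on a literal separator, reading fixed-size
# blocks and carrying the unfinished last record over to the next block.
# Hypothetical helper; assumes $sep is a non-empty literal string.
sub each_record {
    my ($fh, $sep, $block, $on_record) = @_;
    my $buf = '';
    while (1) {
        my $got = read($fh, $buf, $block, length $buf);   # append a block
        die "read failed: $!" unless defined $got;
        last if $got == 0;                                # end of input
        # LIMIT -1 keeps trailing empty fields, so the last element is always
        # the (possibly empty) text after the final separator seen so far.
        my @parts = split /\Q$sep\E/, $buf, -1;
        $buf = pop @parts;                 # may be incomplete: keep it
        $on_record->($_) for @parts;       # these are complete records
    }
    $on_record->($buf) if length $buf;     # last record had no separator
}

# Usage: a tiny block size forces separators to straddle block boundaries.
my $text = "alpha\n\nbeta\n\ngamma";
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
my @records;
each_record($in, "\n\n", 3, sub { push @records, $_[0] });
print join('|', @records), "\n";    # prints "alpha|beta|gamma"
```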


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      Interesting. I'll take a look. Thanks.
Re: Regexing an input stream...
by nekron99_ (Acolyte) on Jun 18, 2003 at 21:48 UTC
    Hmm, I have done the following:
    while ( $line = <Stream> ) { if ( $line =~ m/blah/ ){ print "here\n"; } }
      You have not read the original post very carefully...
        He/she almost answered it.
        my $line = ''; while ( defined( my $next = <Stream> ) ) { $line .= $next; exit if $line =~ m/b\n*l\n*a\n*h/; }
Re: Regexing an input stream...
by Anonymous Monk on Jun 19, 2003 at 15:26 UTC
    Here is a lightly tested OO solution.
    package RegexInput;
    sub new{
        my ($class,$file)=@_;
        my $fh;
        open $fh,$file;
        my $self = { FH=>$fh, leftover=>"" };
        bless $self,$class;
    }
    sub get_line{
        my $self=shift;
        my $regex=shift;
        my $fh=$self->{FH};
        my $string=$self->{leftover};
        while (<$fh>) {
            $string.=$_;
            if ($string=~/$regex/s) {
                $self->{leftover}=$';
                return $string;
            }
        }
        $self->{leftover}="";
        return $string;
    }
    package main;
    my $buf = new RegexInput($0);
    while (my $line=$buf->get_line(qr/\n\}/)) {
        print $line,"\n","="x50,"\n";
    }
    OUTPUT:
    package RegexInput;
    sub new{
        my ($class,$file)=@_;
        my $fh;
        open $fh,$file;
        my $self = { FH=>$fh, leftover=>"" };
        bless $self,$class;
    }

    ==================================================

    sub get_line{
        my $self=shift;
        my $regex=shift;
        my $fh=$self->{FH};
        my $string=$self->{leftover};
        while (<$fh>) {
            #print $_;
            $string.=$_;
            if ($string=~/$regex/s) {
                $self->{leftover}=$';
                return $string;
            }
        }
        $self->{leftover}="";
        return $string;
    }

    ==================================================

    package main;
    my $buf = new RegexInput($0);
    while (my $line=$buf->get_line(qr/\n\}/)) {
        print $line,"\n","="x50,"\n";
    }

    ==================================================


    ==================================================
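A revised sketch of the same RegexInput idea follows; the changes reflect my reading of the intended behaviour rather than anything stated in the post. In the version above, get_line returns the whole buffer, including the text after the match, and also stores that text in leftover, so it is returned twice. The sketch below returns only the text up to the end of the match, searches the leftover before reading more, checks open, and uses @+ instead of $' (in perls of that era, any use of $' slowed down every regex in the program). It assumes the pattern never matches the empty string, and it rescans the buffer after each line, so very long records cost quadratic time.

```perl
package RegexInput;
use strict;
use warnings;

sub new {
    my ($class, $file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    return bless { FH => $fh, leftover => '' }, $class;
}

# Return the text up to and including the next match of $regex, the
# remaining text at end of file, or undef once everything has been returned.
sub get_line {
    my ($self, $regex) = @_;
    my $fh     = $self->{FH};
    my $string = $self->{leftover};
    while (1) {
        if ($string =~ /$regex/s) {
            my $end = $+[0];                      # offset just past the match
            $self->{leftover} = substr($string, $end);
            return substr($string, 0, $end);
        }
        my $line = <$fh>;                         # need more input
        last unless defined $line;
        $string .= $line;
    }
    $self->{leftover} = '';
    return length($string) ? $string : undef;
}

package main;

# Usage: split this script into chunks ending at a "}" that starts a line.
my $buf = RegexInput->new($0);
while (defined(my $chunk = $buf->get_line(qr/\n\}/))) {
    print $chunk, "\n", '=' x 50, "\n";
}
```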
Re: Regexing an input stream... (tye)
by tye (Sage) on Jun 19, 2003 at 22:13 UTC