in reply to Regexes on Streams

Starting with tilly's idea, and attempting to generalise it, I came up with this.

#! perl -slw
use strict;
use re 'eval';

# Wrap the user's regex so that, at the end of the buffer, the extender
# coderef is called before the match is allowed to fail.
sub Re_Stream {
    my( $re_user, $extend ) = @_;
    die "Usage: Re_Stream( regex, coderef )"
        unless defined $re_user and ref $extend eq 'CODE';
    return qr[
        (?: \Z (?(?{ $extend->() })|(?!) }) ) | $re_user
    ]x;
}

my $buf = 'abcdefghijklmnopqrstuvwxyz';
my $c = 'A';

# Test extender: appends 100 copies of the next letter; returns false
# once $c has rolled over from 'Z' to 'AA'.
sub extend{ $buf .= ($c++) x 100; return length $c < 2 }

my $re_stream = Re_Stream( qr[(..)(...)], \&extend );
print $re_stream;

my $i = 0;
print "${ \++$i }: $1|$2" while $buf =~ m[$re_stream]g;

The sub Re_Stream() takes a regex and a coderef. The regex can be any regex (in theory :), and the coderef should be a function that will extend the stream beyond its current limit. This function should return true if it has extended the stream, and false if there is no more to come.
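For illustration, an extender that follows this contract and reads from a filehandle might look like the sketch below. The name make_extender, its parameters, and the default chunk size are assumptions for the example, not part of the code above. It only shows the true/false protocol and is exercised here on its own, without the regex.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an extender closure that appends up to
# $chunk bytes from $fh to the buffer referenced by $bufref.
sub make_extender {
    my( $fh, $bufref, $chunk ) = @_;
    $chunk ||= 4096;                 # assumed default chunk size
    return sub {
        my $got = read( $fh, my $data, $chunk );
        return 0 unless $got;        # undef (error) or 0 (EOF): no more to come
        $$bufref .= $data;           # the stream has been extended
        return 1;
    };
}

# Exercise the extender alone, using an in-memory filehandle.
my $source = 'abcdefghij';
open my $fh, '<', \$source or die "open: $!";

my $buf    = '';
my $extend = make_extender( $fh, \$buf, 4 );
1 while $extend->();                 # keep extending until it reports false
print $buf, "\n";
```

The closure keeps the filehandle and the buffer reference together, so the caller only has to hand over a single coderef.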

As coded, the while loop running the regex will continue to match against the stream until the extender function returns false. I'm not sure if this is progress. The upside is that you no longer have to inspect the user's regex in order to work out where to insert the code block that extends the buffer. In fact, you don't have to modify the user's regex at all. However, there are a few problems with it as it stands.

  1. If the match crosses the boundary of the buffer being extended, a null match is returned.
  2. Any attempt I made to pre-truncate the string, i.e. to discard some part of the front of the string that had already been processed, seemed to "confuse" the regex.
  3. As is, it requires use re 'eval'; which may or may not be a problem.

I've only made a half-hearted attempt at fixing these so far, but thought that I would throw it open to see if anyone else can take it further, or dismiss it as unworkable.
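One possible direction, sketched here as an alternative rather than a fix to Re_Stream() itself, is to drive the match from an ordinary loop: match with /gc until the buffer is exhausted, discard the consumed prefix between matches (not during one), then call the extender and try again. The name match_stream and the callback arguments are assumptions for the example. It assumes two capture groups, as in the test regex above, and it does not handle patterns that would have matched more text had more data been available.

```perl
use strict;
use warnings;

# Hypothetical driver: applies $re repeatedly to the buffer behind $bufref,
# refilling it via $extend and reporting captures through $on_match.
sub match_stream {
    my( $re, $bufref, $extend, $on_match ) = @_;
    while( 1 ) {
        # /g walks the buffer; /c keeps pos() at the end of the last
        # successful match when the next attempt fails.
        while( $$bufref =~ m/$re/gc ) {
            $on_match->( $1, $2 );
        }
        # Drop what has been consumed, keeping the unmatched tail so that
        # a later match can straddle the old buffer boundary.
        my $done = pos( $$bufref ) || 0;
        substr( $$bufref, 0, $done, '' );
        pos( $$bufref ) = 0;
        last unless $extend->();
    }
}

my $buf    = 'abcdefg';
my @chunks = ( 'hijk', 'lmnopq' );   # stand-in for a real stream
my @seen;

match_stream(
    qr[(..)(...)],
    \$buf,
    sub { return 0 unless @chunks; $buf .= shift @chunks; return 1 },
    sub { push @seen, "$_[0]|$_[1]" },
);

print "$_\n" for @seen;
```

With the sample data this reports ab|cde, fg|hij and kl|mno, where the second and third matches each span a chunk boundary, and leaves the unmatched tail 'pq' in the buffer. Because the code block is gone from the regex, this version does not need use re 'eval'.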


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!