I should note that Dominus' solution suffers from exactly the bug that tsee is trying to work around, except worse because he didn't think carefully about what happens if $/ wants to do a greedy match across a block.
I think I'm misunderstanding something here then. I don't see any bug in Dominus' code.

To clarify, let me restate the problem: Given an input stream and a terminator (regex) pattern, break the stream into chunks, each ending in something the terminator matches. Naturally a problem arises when the terminator pattern matches (or contains) something potentially infinite, like qr/.*/. Then you get in memory problems, etc.

But where is a bug in that behaviour? The code is doing what it's supposed to do. How can it complete something that is impossible (provided the stream is infinite)? And I don't understand how you could work around that.

Furthermore Dominus' code should come with a caveat that it won't work entirely as expected if $/ contains a capturing set of parens.
Yes, it should, but his code is a textbook example which should not have to deal with all corner cases because the focus should stay on the problem at hand. In 'real' code I'd do some kind of inspection of the regex in order to catch that. And, btw, what sense is there in putting capturing parens into a terminator pattern?

-- Hofmator


In reply to Re3: Regexes on Streams - Revisited! by Hofmator
in thread Regexes on Streams - Revisited! by tsee

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.