I have no choice but to use a regex for this problem. I'm using someone else's content management tool and I don't have access to the code to make changes in how it works. (Well, I do have access to the code, but I'm not going to change it with a new version due out any time now.)

I'm trying to clip out a section of an HTML stream, and I'm allowed to supply a regex that will be used to select the portion I want to keep. I don't get access to the real match operator (m//), so I cannot supply any options; I have to use a straight-up regex.

To make it worse, I want non-greedy matching at the beginning and greedy matching after that. But I have to supply one intact expression to do the job.

So consider this content stream...

bad stuff bad stuff <!--begin node--> good stuff <!--end node--> <!--begin node--> good stuff <!--end node--> bad stuff bad stuff

My job is to discard the stuff before the first occurrence of "begin node" and claim everything up to the last occurrence of "end node" (where I do not really want the comment lines, but I can accept them if necessary).

The methodology allows me to phrase a regex, and use parentheses to mark the section I want to keep.

The problem is that if I use a ".*" at the beginning as in:

 .*begin node...(.*)....end node

then the greedy ".*" at the start swallows everything up to the last "begin node", so I get only the last node in the sequence. And there is nothing characteristic just prior to the first "begin node" marker that would let me anchor it further.

Just so you know, I tried omitting the initial ".*", and that simply grabbed the entire document.

There may not be a good answer for this, but I knew that if there is an answer at all, it would be mentioned to me here. (At least that's what I've been told :-))
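For what it's worth, here is a minimal sketch of one possible approach, tested only against the sample stream above and not against the actual content management tool. It assumes the tool's engine is Perl-compatible and honours a non-greedy quantifier (".*?") and inline modifiers such as "(?s)"; the variable names and the exact comment markers are taken from the sample or made up for illustration.

```perl
use strict;
use warnings;

# Sample stream from the question.
my $html = 'bad stuff bad stuff <!--begin node--> good stuff <!--end node--> '
         . '<!--begin node--> good stuff <!--end node--> bad stuff bad stuff';

# Greedy leading .* backtracks only as far as the LAST "begin node",
# so just the final node is captured.
my ($greedy) = $html =~ /.*<!--begin node-->(.*)<!--end node-->/;

# Non-greedy leading .*? stops at the FIRST "begin node"; the greedy (.*)
# then runs on to the LAST "end node". (?s) is an inline modifier that lets
# "." match newlines, standing in for the /s flag the tool does not expose
# (assumption: the tool passes the pattern to the regex engine unchanged).
my $pattern = '(?s)\A.*?<!--begin node-->(.*)<!--end node-->';
my ($kept) = $html =~ /$pattern/;

print "greedy: [$greedy]\n";
print "kept:   [$kept]\n";
```

On the sample stream, the first capture holds only the last " good stuff ", while the second holds everything between the first "begin node" comment and the last "end node" comment, with the outer comment markers themselves left out of the capture.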


In reply to Must use regex, how to clip... by chantstophacking
