chantstophacking has asked for the wisdom of the Perl Monks concerning the following question:
I have no choice but to use a regex for this problem. I'm using someone else's content management tool and I don't have access to the code to make changes in how it works. (Well, I do have access to the code, but I'm not going to change it with a new version due out any time now.)
I'm trying to clip out a section of an HTML stream, and I'm allowed to supply a regex that will be used to select the portion I want to keep. I don't get access to the real match operator (m//), so I cannot supply any options; I have to use a straight-up regex.
To make it worse, I want non-greedy matching at the beginning and greedy matching after that. But I have to supply one intact expression to do the job.
So consider this content stream...
bad stuff bad stuff <!--begin node--> good stuff <!--end node--> <!--begin node--> good stuff <!--end node--> bad stuff bad stuff
My job is to discard the stuff before the first occurrence of "begin node" and claim everything up to the last occurrence of "end node" (where I do not really want the comment lines, but I can accept them if necessary).
The methodology allows me to phrase a regex, and use parentheses to mark the section I want to keep.
The problem is that if I use a ".*" at the beginning as in:
.*begin node...(.*)....end node
then the greedy ".*" swallows everything up to the last "begin node", so I get only the last node in the sequence. And there is nothing characteristic just prior to the first "begin node" marker that I could use as an anchor.
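The behaviour described above can be reproduced with a short Perl sketch. The labels "good A" and "good B" are not in the original stream; they are added here only so the two nodes can be told apart. The sketch also assumes the tool behaves like a plain Perl match with one capture group, which may not be exactly what the content management tool does.

```perl
use strict;
use warnings;

# Sample stream modelled on the one in the question. The A/B labels are
# an assumption added so that the two nodes are distinguishable.
my $content = 'bad stuff <!--begin node--> good A <!--end node--> '
            . '<!--begin node--> good B <!--end node--> bad stuff';

# The pattern from the question: "..." matches "-->" and "...." matches "<!--".
# The leading greedy .* runs to the end of the string and backtracks only
# as far as the LAST "begin node".
my ($greedy) = $content =~ /.*begin node...(.*)....end node/;

print "captured: [$greedy]\n";   # prints: captured: [ good B ]
```

Only the second node is captured, because the first "begin node" has already been consumed by the leading ".*".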
Just so you know, I tried omitting the initial ".*" and that simply resulted in grabbing the entire document for me.
There may not be a good answer for this, but I knew that if there is an answer at all, it would be mentioned to me here. (At least that's what I've been told :-))
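One possible single-expression approach, sketched here under assumptions: if the tool's engine is Perl-compatible, a non-greedy ".*?" at the start followed by a greedy "(.*)" gives the lazy-then-greedy behaviour asked for, and the inline modifier "(?s)" stands in for the /s option that cannot be supplied from outside. The variable names $pattern and $keep, the A/B labels, and the way the pattern is applied are all hypothetical; how the real tool uses the expression is not known from the question.

```perl
use strict;
use warnings;

# Multi-line sample stream; the A/B labels are an assumption for illustration.
my $content = "bad stuff\n"
            . "<!--begin node--> good A <!--end node-->\n"
            . "<!--begin node--> good B <!--end node-->\n"
            . "bad stuff\n";

# One intact expression, supplied as a plain string as the tool would receive it.
#   (?s)  inline modifier: lets "." match newlines without an external /s
#   .*?   non-greedy: stops at the FIRST begin marker
#   (.*)  greedy: runs to the LAST end marker
my $pattern = '(?s).*?<!--begin node-->(.*)<!--end node-->';

# Hypothetical stand-in for whatever the tool does with the pattern.
my ($keep) = $content =~ /$pattern/;

print "[$keep]\n";
```

With this sample the capture starts after the first "begin node" comment and ends before the last "end node" comment, so the inner markers between the two nodes are kept. Whether the tool honours "(?s)" and ".*?" would need to be checked against the tool itself.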
Re: Must use regex, how to clip...
by bart (Canon) on Jan 08, 2003 at 10:45 UTC
by chantstophacking (Acolyte) on Jan 11, 2003 at 00:41 UTC

Re: Must use regex, how to clip...
by seattlejohn (Deacon) on Jan 08, 2003 at 05:59 UTC

Re: Must use regex, how to clip...
by graff (Chancellor) on Jan 08, 2003 at 06:27 UTC
by chantstophacking (Acolyte) on Jan 11, 2003 at 00:39 UTC