Re: regex to select text rather than remove HTML tags

Not being much of a “golfer,” I tend to solve such problems in two steps: first, I find the enclosing string structure, then I look for “hello...” within that string.

One issue that you should consider is that, right now, you have no clearly defined beginning/ending delimiter: where does the string begin, and where does it end? In such a case, the less-than/greater-than characters are the only reliable anchor points that you have, so split() and pos() become your friends (along with the /i and /g modifiers of a regex). You might be able to construct the argument (and therefore, a program) which says that what you really have here is a string that is “split by” either of these two characters. You iterate through the string, looking for these characters and noting their positions. You decide whether a string-of-interest could be “beginning” or “ending,” and you extract the pieces for a closer look with substr().
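To make that concrete, here is a rough sketch of the scan-and-extract idea. It is only one way to do it, and it assumes well-behaved input: every “<” is eventually closed by a “>”, and neither character appears inside the text or inside an attribute value. The name text_pieces() and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the runs of text that lie outside <...>.
# Assumes '<' and '>' only ever appear as tag delimiters.
sub text_pieces {
    my ($input) = @_;
    my @pieces;
    my $start = 0;    # offset where the current text run begins

    # /g in scalar context walks the string one delimiter at a time;
    # pos() reports the offset just past the most recent match.
    while ($input =~ /([<>])/g) {
        my $at = pos($input) - 1;    # offset of the delimiter itself
        if ($1 eq '<') {
            # A tag begins here, so the text run (if any) ends here.
            push @pieces, substr($input, $start, $at - $start)
                if $at > $start;
        }
        else {
            # A tag ends here, so the next text run starts after it.
            $start = $at + 1;
        }
    }

    # Anything left after the last '>' is also text.
    push @pieces, substr($input, $start) if $start < length($input);
    return @pieces;
}

# Second step: look for "hello" (case-insensitively) within each piece.
my @hits = grep { /hello/i } text_pieces('<p>Hello, world</p><b>bye</b>');
print "$_\n" for @hits;
```

Keeping the “find the pieces” step separate from the “is it interesting?” step means each one can be tested and changed on its own.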

Really, the true challenge of this kind of algorithm is “ruggedly and completely defining it.” It probably will be a two-part solution. (“First, find the strings, then, see if they’re interesting.”) After you have used perldoc and then maybe a few experimental programs to confirm in your own mind how these various Perl tools work, spend some serious thought-time defining your algorithm. It might not be entirely trivial. I would go so far as to recommend constructing a series of test cases with test strings, and building a Test::More test suite to actually and completely test it. You could easily construct a subtly flawed algorithm, bang on it a few times, say, “yep, it seems to work,” and find that you are totally wrong when your code goes into production. It happens. (A lot.) And, it’s not pretty or fun. The “extra” time needed to “prove it!!” will be worthwhile.
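A test suite along those lines might start out something like this. It is a sketch, not a finished suite: find_hello() is a hypothetical stand-in for whatever algorithm you settle on (here, a simple split() on anything that looks like a tag), and the test strings are invented, so substitute cases drawn from your real data.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: the text runs outside tags
# that contain "hello", matched case-insensitively.
sub find_hello {
    my ($input) = @_;
    return grep { /hello/i } split /<[^>]*>/, $input;
}

# Each case: description, input string, expected list of hits.
my @cases = (
    [ 'plain match',        '<p>hello there</p>',     ['hello there'] ],
    [ 'case-insensitive',   '<b>HELLO</b>',           ['HELLO'] ],
    [ 'word only in a tag', '<a title="hello">x</a>', [] ],
    [ 'no tags at all',     'hello',                  ['hello'] ],
    [ 'empty input',        '',                       [] ],
    [ 'two hits',           '<i>hello</i> and <i>Hello again</i>',
                            [ 'hello', 'Hello again' ] ],
);

for my $case (@cases) {
    my ($name, $input, $expected) = @$case;
    is_deeply([ find_hello($input) ], $expected, $name);
}

done_testing();
```

The table-of-cases layout makes it cheap to add every odd string you run into later, which is where the subtle flaws tend to show up.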