... I will gladly delegate every RegEx to other workers ...

That which does not make your brain explode makes you smarter. I understand that your recent experience with the wily regex has been harrowing, but now is not the time to retreat. Rather, consolidate what you have gained and secure some new footholds.

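One point you seem not to have grasped is the critical difference between greedy and "lazy" (as I like to call it) matching; e.g., between .+ and .+? quantification. A greedy quantifier takes as much as it can and gives back only what it must; a lazy one takes as little as it can and extends only when forced. Please re-read the earlier replies in this thread bearing on this.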
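
For what it's worth, here is a minimal sketch of that difference. The sample string and variable names are made up for illustration and are not the data from the original question; the point is only how the two quantifiers behave under /g.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for this illustration.
my $html = '<b>one</b> and <b>two</b>';

# Greedy: .+ grabs as much as it can, so the single match runs from the
# first <b> all the way to the last </b>.
my @greedy = $html =~ m{<b>(.+)</b>}g;

# Lazy: .+? grabs as little as it can, so each match stops at the
# nearest </b> and /g goes on to find the next one.
my @lazy = $html =~ m{<b>(.+?)</b>}g;

print "greedy: @greedy\n";
print "lazy:   @lazy\n";
```

The greedy version captures one long string that swallows everything between the first opening tag and the last closing tag, while the lazy version captures each tag's content separately. If your pattern seems to return only one (wrong) match when you expected several, a greedy quantifier is a likely suspect.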

Perhaps the most important insight to take away is that regexes, much as I love them, are not always the best solution to a problem. Maybe re-consider the advice initially offered by Corion about using a real HTML parser for HTML parsing.

And yes, please do register as a user.

Until next time...


Give a man a fish:  <%-(-(-(-<

Re^3: Regex keep matching the last possible match (but should get all)
by aaron_baugher (Curate) on May 18, 2015 at 20:28 UTC
    Maybe re-consider the advice initially offered by Corion about using a real HTML parser for HTML parsing.

    Definitely. In 20 years of writing Perl, I've written a lot of long, ugly regexes to pull data out of HTML files as a one-time, quick-and-dirty solution. But I wouldn't count on any of them to be reliable enough to use repeatedly or for automated purposes. For anything reliable, use a module that won't break the day someone changes <TD> to <td> or rearranges a couple of tags.

    If the assignment says to do it with a regex, then that's what you do. But in a real-life parsing task, there's usually a better way.

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.