in reply to Regex keep matching the last possible match (but should get all)

Dear Perl-Monks

Thanks a lot for all the help you provided. I finally got it working. For the next few weeks I will gladly delegate every RegEx to other workers here, that's for sure.

In case you wonder how it looks in the end (still without the fine art of using other delimiters):

/'extinfo\.cgi[^>]+>([^<]+)<\/A>.{150,190}'status[^>]+>([^<]+)<\/TD><[^>]+>([^<]+)<\/TD><[^>]+>([^<]+s)<\/TD><[^>]+>(\d\/\d)<\/TD><[^>]+>([^<]+)<\/TD>/g
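
For anyone curious what other delimiters would buy here, a minimal sketch follows. It runs on a made-up two-cell row, since the real page is not shown in this post; the sample string and the variable names are assumptions for illustration only. With m{...} the slash in a closing tag no longer needs a backslash, and the /x flag allows whitespace and comments inside the pattern.

```perl
use strict;
use warnings;

# Hypothetical sample row; the real page layout is not shown in this thread.
my $html = q{<TD class='status'>OK</TD><TD class='x'>2015-05-18</TD>};

# With / as the delimiter, the slash in every closing tag must be escaped.
my @slashes = $html =~ /<TD[^>]*>([^<]+)<\/TD>/g;

# With m{...} the slash is an ordinary character; /x ignores the
# whitespace and allows comments inside the pattern.
my @braces = $html =~ m{
    <TD [^>]* >     # opening tag with any attributes
    ( [^<]+ )       # capture the cell text
    </TD>           # closing tag, no backslash needed
}gx;

print join('|', @braces), "\n";
```

Both patterns capture the same two cells; only the readability differs.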

I will smooth this out tomorrow, but at least it yields what I was hoping for: about 50 arrays containing the pieces of information I was looking for.

Thanks again, and until next time :)

Re^2: Regex keep matching the last possible match (but should get all)
by AnomalousMonk (Archbishop) on May 18, 2015 at 16:55 UTC
    ... I will gladly delegate every RegEx to other workers ...

    That which does not make your brain explode makes you smarter. I understand that your recent experience with the wily regex has been harrowing, but now is not the time to retreat. Rather, consolidate what you have gained and secure some new footholds.

    One point you seem not to have grasped is the critical difference between greedy and "lazy" (as I like to call it) matching; e.g., between .+ and .+? quantification. Please re-read the earlier replies in this thread that bear on this.
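
    A minimal sketch of that difference, using a made-up table row (the string and variable names are assumptions, not taken from the original page). The greedy form shows the symptom in the thread title: one match that runs all the way to the last possible closing tag.

```perl
use strict;
use warnings;

# Hypothetical row with three cells.
my $row = '<TD>up</TD><TD>0.012s</TD><TD>1/3</TD>';

# Greedy: .+ grabs as much as it can, then backs off only as far as needed,
# so the single match runs from the first <TD> to the LAST </TD>.
my @greedy = $row =~ m{<TD>(.+)</TD>}g;

# Lazy: .+? grabs as little as it can, so each match stops at the NEXT </TD>.
my @lazy = $row =~ m{<TD>(.+?)</TD>}g;

print scalar(@greedy), " greedy match: $greedy[0]\n";
print scalar(@lazy), " lazy matches: @lazy\n";
```

    The greedy pattern yields one capture that swallows the tags in between; the lazy pattern yields the three cell values separately.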

    Perhaps the most important insight to take away is that regexes, much as I love them, are not always the best solution to a problem. Maybe re-consider the advice initially offered by Corion about using a real HTML parser for HTML parsing.

    And yes, please do register as a user.

    Until next time...


    Give a man a fish:  <%-(-(-(-<

      Maybe re-consider the advice initially offered by Corion about using a real HTML parser for HTML parsing.

      Definitely. In 20 years of writing Perl, I've written a lot of long, ugly regexes to pull data out of HTML files as a one-time, quick-and-dirty solution. But I wouldn't count on any of them to be reliable enough to use repeatedly or for automated purposes. For anything reliable, use a module that won't break the day someone changes <TD> to <td> or rearranges a couple of tags.
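
      A small sketch of how quietly that breakage happens, using made-up markup in which only the tag case differs (both strings and all variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical before/after markup; only the tag case differs.
my $old = '<TD>host1</TD><TD>OK</TD>';
my $new = '<td>host1</td><td>OK</td>';

# Case-sensitive pattern, as one might write for a quick one-off.
my $cell = qr{<TD>([^<]+)</TD>};

my @before = $old =~ /$cell/g;
my @after  = $new =~ /$cell/g;

# No error and no warning: the second list is simply empty.
printf "before: %d cells, after: %d cells\n", scalar @before, scalar @after;
```

      Adding /i would patch this one case, but not rearranged tags or new attributes, which is why a parser module from CPAN (HTML::TreeBuilder and HTML::TableExtract are two that exist) is the sturdier route.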

      If the assignment says to do it with a regex, then that's what you do. But in a real-life parsing task, there's usually a better way.

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.