in reply to I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

Do not use regular expressions to parse HTML or XML.

use warnings;
use strict;
use Mojo::DOM;

my $dom = Mojo::DOM->new(<<'HTML');
<div id="taglines_content" class="header">
  <div class="header">
    <div class="nav">
      <div class="desc">Showing all 3 taglines</div>
    </div>
  </div>
  <div class="soda odd">Power. Grace. Wisdom. Wonder.</div>
  <div class="soda even">Wonder. Power. Courage.</div>
  <div class="soda odd">The future of justice begins with her</div>
</div>
HTML

$dom->find('.soda')->each(sub { print "$_\n" });

__END__
<div class="soda odd">Power. Grace. Wisdom. Wonder.</div>
<div class="soda even">Wonder. Power. Courage.</div>
<div class="soda odd">The future of justice begins with her</div>

Re^2: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by SergioQ (Scribe) on Jan 12, 2021 at 17:47 UTC

    use Mojo::DOM

    That did it, and was much cleaner, thank you.

      Glad to hear it! I noticed that your code has my $tagl = $resp->decoded_content; in it, so I assume you're using an HTTP client to get the HTML. Note that Mojolicious includes Mojo::UserAgent, which has direct integration with Mojo::DOM - I showed an example here.
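      For readers without the linked example at hand, here is a minimal sketch of that integration. It is not the example referred to above; the helper name fetch_dom and the max_redirects setting are assumptions made for illustration, and the URL is whatever you pass on the command line.

```perl
use warnings;
use strict;
use Mojo::UserAgent;

# Hypothetical helper: fetch a page and return its parsed DOM.
sub fetch_dom {
    my ($url) = @_;
    my $ua  = Mojo::UserAgent->new(max_redirects => 5);
    my $res = $ua->get($url)->result;    # dies on connection errors
    die "Request failed: ", $res->code, " ", $res->message, "\n"
        unless $res->is_success;
    return $res->dom;                    # a Mojo::DOM object
}

# Usage: perl script.pl <URL>  (no URL is assumed here; supply your own)
if (my $url = shift @ARGV) {
    fetch_dom($url)->find('.soda')->each(sub { print $_->text, "\n" });
}
```

      The response object hands back a ready-made Mojo::DOM, so there is no separate decoded_content step before parsing.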

        Forgive me, because I'm still trying to find the finesse of Mojo::DOM.

        Below is the first return I get from $dom->find('.ipl-zebra-list');

        But I want to narrow it to the table row that has "USA"; the cell right below it has the class "release-date-item__date", which holds the date.

        Certainly there must be some sort of "tree" functionality that I can use in Mojo::DOM to get to that last line. I promise you, I'm not just asking you to get the answer. At the same time I am scouring the internet for examples.

        Thanks for all your help so far.

        <table class="ipl-zebra-list ipl-zebra-list--fixed-first release-dates-table-test-only">
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=au&amp;ref_=ttrel_rel_1">Australia </a></td>
            <td align="right" class="release-date-item__date">17 July 2020</td>
            <td align="left" class="release-date-item__attributes"> (internet) </td>
          </tr>
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=nz&amp;ref_=ttrel_rel_2">New Zealand </a></td>
            <td align="right" class="release-date-item__date">17 July 2020</td>
            <td align="left" class="release-date-item__attributes"> (internet) </td>
          </tr>
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=sg&amp;ref_=ttrel_rel_3">Singapore </a></td>
            <td align="right" class="release-date-item__date">10 December 2020</td>
            <td align="left" class="release-date-item__attributes"> (limited) </td>
          </tr>
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=ca&amp;ref_=ttrel_rel_4">Canada </a></td>
            <td align="right" class="release-date-item__date">11 December 2020</td>
            <td align="left" class="release-date-item__attributes"> (internet) </td>
          </tr>
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=us&amp;ref_=ttrel_rel_5">USA </a></td>
            <td align="right" class="release-date-item__date">11 December 2020</td>
            <td class="release-date-item__attributes--empty"></td>
          </tr>
          <tr class="ipl-zebra-list__item release-date-item">
            <td class="release-date-item__country-name"><a href="/calendar/?region=nl&amp;ref_=ttrel_rel_6">Netherlands </a></td>
            <td align="right" class="release-date-item__date">7 January 2021</td>
            <td class="release-date-item__attributes--empty"></td>
          </tr>
        </table>
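
One possible way to walk the tree from the "USA" cell to its date, offered as a sketch rather than the definitive approach. The helper name release_date_for is made up for illustration, and the HTML is a trimmed two-row copy of the table above. It assumes each country name sits in a td.release-date-item__country-name cell and the date in a td.release-date-item__date cell of the same row.

```perl
use warnings;
use strict;
use Mojo::DOM;

# Trimmed copy of the table from the post (two rows only).
my $dom = Mojo::DOM->new(<<'HTML');
<table class="ipl-zebra-list release-dates-table-test-only">
<tr class="ipl-zebra-list__item release-date-item">
<td class="release-date-item__country-name"><a href="/calendar/?region=sg">Singapore
</a></td>
<td align="right" class="release-date-item__date">10 December 2020</td>
</tr>
<tr class="ipl-zebra-list__item release-date-item">
<td class="release-date-item__country-name"><a href="/calendar/?region=us">USA
</a></td>
<td align="right" class="release-date-item__date">11 December 2020</td>
</tr>
</table>
HTML

# Hypothetical helper: return the release date for a country, or undef.
sub release_date_for {
    my ($dom, $country) = @_;

    # First country cell whose text, ignoring surrounding whitespace,
    # equals the requested name.
    my $cell = $dom->find('td.release-date-item__country-name')
        ->first(sub { $_->all_text =~ /^\s*\Q$country\E\s*$/ });
    return undef unless $cell;

    # Step up to the enclosing <tr>, then down to its date cell.
    my $date = $cell->parent->at('td.release-date-item__date');
    return $date ? $date->text : undef;
}

my $usa = release_date_for($dom, 'USA');
print defined $usa ? "$usa\n" : "not found\n";
```

Since the date cell is the element immediately following the country cell, $cell->next would also reach it here; going through parent avoids depending on the column order.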