use Mojo::DOM

That did it, and was much cleaner, thank you.

Re^3: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by haukex (Archbishop) on Jan 12, 2021 at 18:08 UTC

    Glad to hear it! I noticed that in your code you have my $tagl =  $resp->decoded_content;, so I assume you're using an HTTP client to get the HTML. Note that Mojolicious includes Mojo::UserAgent, which has direct integration with Mojo::DOM - I showed an example here.
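    For readers without the linked example, here is a minimal sketch of what that integration can look like. The helper name fetch_dom, the redirect limit, and the URL in the usage comment are my own placeholders and not from this thread.

```perl
use strict;
use warnings;
use Mojo::UserAgent;

# Hypothetical helper (name is illustrative): fetch a page and
# return its body already parsed into a Mojo::DOM object.
sub fetch_dom {
    my ($url) = @_;
    my $ua = Mojo::UserAgent->new(max_redirects => 3);

    # ->result dies on connection errors and returns the response.
    my $res = $ua->get($url)->result;
    die "Request failed: " . $res->code . " " . $res->message . "\n"
        unless $res->is_success;

    # ->dom parses the response body, so no separate Mojo::DOM->new is needed.
    return $res->dom;
}

# Usage (placeholder URL, an assumption for illustration only):
# my $dom = fetch_dom('https://example.com/releaseinfo');
# print $dom->at('title')->text, "\n";
```

    The point is only that the response object hands you the DOM directly, so the decoded_content step goes away.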

      Forgive me, because I'm still trying to find the finesse of Mojo::DOM.

      Below is the first return I get from $dom->find('.ipl-zebra-list');

      But I want to narrow it to the table row that has "USA", which right below it has the class "release-date-item", which has the date.

      Certainly there must be some sort of "tree" functionality that I can use in Mojo::DOM to get to that last line. I promise you, I'm not just asking you to get the answer. At the same time I am scouring the internet for examples.

      Thanks for all your help so far.

      <table class="ipl-zebra-list ipl-zebra-list--fixed-first release-dates-table-test-only">
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=au&amp;ref_=ttrel_rel_1">Australia </a></td>
          <td align="right" class="release-date-item__date">17 July 2020</td>
          <td align="left" class="release-date-item__attributes"> (internet) </td>
        </tr>
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=nz&amp;ref_=ttrel_rel_2">New Zealand </a></td>
          <td align="right" class="release-date-item__date">17 July 2020</td>
          <td align="left" class="release-date-item__attributes"> (internet) </td>
        </tr>
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=sg&amp;ref_=ttrel_rel_3">Singapore </a></td>
          <td align="right" class="release-date-item__date">10 December 2020</td>
          <td align="left" class="release-date-item__attributes"> (limited) </td>
        </tr>
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=ca&amp;ref_=ttrel_rel_4">Canada </a></td>
          <td align="right" class="release-date-item__date">11 December 2020</td>
          <td align="left" class="release-date-item__attributes"> (internet) </td>
        </tr>
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=us&amp;ref_=ttrel_rel_5">USA </a></td>
          <td align="right" class="release-date-item__date">11 December 2020</td>
          <td class="release-date-item__attributes--empty"></td>
        </tr>
        <tr class="ipl-zebra-list__item release-date-item">
          <td class="release-date-item__country-name"><a href="/calendar/?region=nl&amp;ref_=ttrel_rel_6">Netherlands </a></td>
          <td align="right" class="release-date-item__date">7 January 2021</td>
          <td class="release-date-item__attributes--empty"></td>
        </tr>
      </table>

        See Mojo::DOM::CSS for the selectors supported by Mojo::DOM; these are based on CSS selectors, which you can learn more about at various places on the Internet, for example https://www.w3schools.com/css/css_selectors.asp (though Mojo only supports a subset). Unfortunately, checking the text content of elements isn't something they can do, but you can still use the other methods in Mojo::DOM and Mojo::Collection to get what you want. So just for example, given your example input, the following code prints "11 December 2020".

        $dom->find('.release-date-item')
            ->grep(sub { $_->at('.release-date-item__country-name')
                ->all_text =~ /^\s*USA\s*$/ })
            ->each(sub { print $_->at('.release-date-item__date')
                ->all_text, "\n" });

        Update: To clarify: It's ok that I'm using a regular expression here because I'm matching against the return value of ->all_text, which is just getting the plain-text content of the tags; I'm not trying to parse the tags themselves with the regex - Mojo::DOM has done that for us.
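        In case a self-contained version is useful for experimenting, here is a rough sketch of the same idea wrapped in a helper. The sub name release_date_for and the trimmed-down sample table are my own for illustration. It uses first from Mojo::Collection instead of grep plus each, and guards against rows that lack the expected cells.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (name is illustrative): return the release date
# text for the given country, or undef if no row matches.
sub release_date_for {
    my ($dom, $country) = @_;

    # first() returns the first row for which the callback is true.
    my $row = $dom->find('tr.release-date-item')->first(sub {
        my $name = $_->at('.release-date-item__country-name');
        defined $name && $name->all_text =~ /^\s*\Q$country\E\s*$/;
    });
    return unless $row;

    my $date = $row->at('.release-date-item__date');
    return $date ? $date->all_text : undef;
}

# Cut-down sample based on the table posted above.
my $html = <<'HTML';
<table class="ipl-zebra-list">
  <tr class="ipl-zebra-list__item release-date-item">
    <td class="release-date-item__country-name"><a href="/calendar/?region=au">Australia </a></td>
    <td class="release-date-item__date">17 July 2020</td>
  </tr>
  <tr class="ipl-zebra-list__item release-date-item">
    <td class="release-date-item__country-name"><a href="/calendar/?region=us">USA </a></td>
    <td class="release-date-item__date">11 December 2020</td>
  </tr>
</table>
HTML

my $dom = Mojo::DOM->new($html);
print release_date_for($dom, 'USA') // 'not found', "\n";
```

        With the sample table this should print the USA date, and 'not found' for a country that isn't listed. The \Q...\E is there so a country name containing regex metacharacters is matched literally.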