in reply to I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

If I run:

use strict;
use warnings;

my $tagl = do {local $/; <DATA>};

if($tagl =~ /<div class=\"soda.*?>(.*?)<\/div>/ism) {
    my $grp = $1;
    print "'$grp'\n";
}

__DATA__
<div class="soda odd"> Power. Grace. Wisdom. Wonder. </div>
<div class="soda even"> Wonder. Power. Courage. </div>
<div class="soda odd"> The future of justice begins with her </div>

it prints:

' Power. Grace. Wisdom. Wonder. '

Maybe what you are trying to match or the regex you are using isn't what you have posted?

Note that parsing HTML/XML using regexen is generally a really bad idea. You've been told this several times already. Just in case you've forgotten why, you might like a refresher with Why a regex *really* isn't good enough for HTML and XML, even for "simple" tasks
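For comparison, here is a minimal sketch of the same extraction done with a real HTML parser. It assumes the CPAN module HTML::TreeBuilder is installed (it is not part of core Perl), and the helper name soda_texts is made up for this illustration. It is one possible approach, not the only one.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # assumption: installed from CPAN, not a core module

# Hypothetical helper: return the trimmed text of every <div> whose
# class attribute starts with "soda".
sub soda_texts {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @texts = map { $_->as_trimmed_text }
        $tree->look_down(_tag => 'div', class => qr/^soda\b/);
    $tree->delete;    # release the parse tree
    return @texts;
}

my $html = <<'HTML';
<div class="soda odd"> Power. Grace. Wisdom. Wonder. </div>
<div class="soda even"> Wonder. Power. Courage. </div>
<div class="soda odd"> The future of justice begins with her </div>
HTML

print "'$_'\n" for soda_texts($html);
```

Because the parser works on elements rather than on raw text, this keeps working if attributes are reordered or extra whitespace appears inside the tag.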

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Re^2: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by jcb (Parson) on Jan 13, 2021 at 00:42 UTC
    Note that parsing HTML/XML using regexen is generally a really bad idea.

    The reason that it often works (for some definition of "works") is that few dynamic sites actually build and serialize a DOM tree, instead simply inserting details into (textual) templates. Regexen can match the parts of the output that come from the template, thereby selecting the insertions and extracting the desired information.

    The resulting parsers tend to be somewhat fragile, as any change to the template can invalidate the "islands" on which the regex-based scraper relies, but they can be suitable for tools that are needed quickly and for the short term, or where the inconvenience of adapting the tool when the site changes is acceptable. The upside is that regex-based parsers are relatively easy to write from inspecting the HTML page source, without requiring knowledge of DOM structure and handling, giving them a lower "barrier to entry" for programmers unfamiliar with SGML/XML/DOM concepts.
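As a rough sketch of that template-island idea, the snippet below treats the fixed text emitted by the template (the opening div with its "soda" class and the closing tag) as anchors and captures whatever was inserted between them. The helper name extract_soda and the sample markup are assumptions for illustration only. Any change to the template would require adjusting the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the body of every template-generated
# <div class="soda ..."> ... </div> using the fixed template text as anchors.
sub extract_soda {
    my ($html) = @_;
    # /g in list context returns every capture; /s lets . match newlines.
    my @found = $html =~ m{<div\s+class="soda[^"]*"\s*>(.*?)</div>}gis;
    s/^\s+|\s+$//g for @found;    # trim surrounding whitespace in place
    return @found;
}

my $page = <<'HTML';
<div class="soda odd"> Power. Grace. Wisdom. Wonder. </div>
<div class="soda even"> Wonder. Power. Courage. </div>
HTML

print "'$_'\n" for extract_soda($page);
```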

Re^2: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by SergioQ (Scribe) on Jan 12, 2021 at 16:44 UTC

    Maybe what you are trying to match or the regex you are using isn't what you have posted?

    I will try some of the solutions others posted. I just wanted to assure you that I definitely posted from the screen output. I'm the idiot who makes stupid mistakes, so I did this one multiple times and was careful to post the proper data. It was a real head scratcher for me, especially since it works on RegEx101.com.

      In your original node you have posted an extract from the data, not the actual data. You also posted an extract from the code. There could be many things happening in the data you have not shown and/or the code you have not shown. Trim your data to a representative sample that still exhibits the problem for you, insert it into the smallest complete code and then post that here. See also SSCCE and How to ask better questions using Test::More and sample data. You may find in preparing these that you solve the problem - this is often a handy by-product of the exercise. :-)
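A minimal sketch of what such a self-contained test case might look like, using the core module Test::More. The sample data is trimmed from the snippet earlier in this thread, and the expected string is an assumption about what the poster wants to extract.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Sample data trimmed to the smallest input that still exercises the regex.
my $html = <<'HTML';
<div class="soda odd"> Power. Grace. Wisdom. Wonder. </div>
<div class="soda even"> Wonder. Power. Courage. </div>
HTML

# The regex under test, copied from the code posted above.
my $re = qr/<div class=\"soda.*?>(.*?)<\/div>/ism;

# A match in list context returns the captured groups.
my ($first) = $html =~ $re;

ok(defined $first, 'regex matches the sample data');
is($first, ' Power. Grace. Wisdom. Wonder. ', 'group 1 holds the first div body');
```

If either test fails on your machine, the test output shows what was expected and what was actually captured, and that is what is worth posting.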


      🦛