I'm assuming you're referring to the regex in GrandFather's reply:
/data-src-hq="([^"]+)"/g

First, let me draw your attention to YAPE::Regex::Explain, which can explain regexes that do not use regex operators or features added after Perl version 5.6:

c:\@Work\Perl\monks>perl
use strict;
use warnings;

use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new(qr/data-src-hq="([^"]+)"/)->explain;

__END__
The regular expression:

(?-imsx:data-src-hq="([^"]+)")

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  data-src-hq="            'data-src-hq="'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

There are also on-line regex explainers.

Now let me address your narration.

img src=" is the first part to match

Ok.

[^"] match everything but a quote

I would word this as match a single character from the class of all characters except a " (double-quote). It's important to realize that the [...] regex operator defines a character class or set (see Character Classes and other Special Escapes in perlre and also this topic in perlretut, perlrequick and perlrecharclass), and that all by itself, any [...] matches only a single character.

+" stop when you hit a quote

I would quarrel with this description. The + quantifier (see Quantifiers in perlre; see also the topic of quantifiers in perlretut and perlrequick) is associated with the expression before it, i.e., [^"]+, and I would read it as match one or more characters from the class/set of all characters except a double-quote. Again, the double-quote is not directly associated with the + quantifier in your +" (but see below, because the two can be related).
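To check the two readings for yourself, here is a minimal sketch. The test string is the same kind of quoted sample used elsewhere in this reply, and the variable names are only for illustration:

```perl
use strict;
use warnings;

my $s = 'foo "xyzzy" bar';

# A bare character class consumes exactly one character:
# here, the first non-quote character after the opening ".
my ($one) = $s =~ /"([^"])/;

# With the + quantifier, the same class consumes a run of
# one or more non-quote characters.
my ($many) = $s =~ /"([^"]+)/;

print "class alone:  '$one'\n";
print "class with +: '$many'\n";
```

The first print should show a single character and the second the whole run up to the closing double-quote.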

() return only what matches within the brackets

Ok.
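As a rough illustration of the capture group at work, the sketch below runs the regex in list context with /g, where it returns only the captured text of each match. The sample HTML and its file names are made up; I'm assuming your real input has a similar shape:

```perl
use strict;
use warnings;

# Hypothetical sample input; the attribute values are invented.
my $html = '<img data-src-hq="a.jpg"> <img data-src-hq="b.jpg">';

# In list context, a /g match with one capture group returns
# the captured substring of every match, not the full match.
my @urls = $html =~ /data-src-hq="([^"]+)"/g;

print "$_\n" for @urls;
```

Neither the attribute name nor the surrounding double-quotes appear in @urls, only what was matched inside the parentheses.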

Am also curious what's the difference between +" and +?" since both seem to work

Again, note that the + or +? quantifiers affect the preceding [^"] character class, not the double-quote that follows. In the /data-src-hq="([^"]+)"/g match regex, the final " (double-quote) is not absolutely needed because [^"]+ will match as much as possible until it either hits a " or the end of the string. (I would still tend to use it because I like the feeling of security that well-defined boundaries give me. Also, a final " in the match will prevent a match with a "runaway" quote in a string in which the closing " is missing.) However, if you use a [^"]+? "lazy" or "non-greedy" expression instead, the final " becomes vital to matching the entire contents of the double-quoted substring. Try this:

c:\@Work\Perl\monks>perl
use strict;
use warnings;

my $s = 'foo "xyzzy" bar';

print qq{+? (lazy) quantifier with final ":    matched '$1' \n} if $s =~ /"([^"]+?)"/;
print qq{+? (lazy) quantifier without final ": matched '$1' \n} if $s =~ /"([^"]+?)/;

__END__
+? (lazy) quantifier with final ":    matched 'xyzzy'
+? (lazy) quantifier without final ": matched 'x'

A lazy quantifier matches the minimum necessary for an overall match. A final " in the regex is necessary in this case to capture the entire quoted substring. Take a look at this and be sure you understand what's going on, i.e., the difference between lazy and greedy matching.
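For comparison, here is a sketch of the greedy side of the same experiment, including the "runaway" quote case mentioned earlier. The second test string (with its closing double-quote deliberately left off) and the variable names are my own additions for illustration:

```perl
use strict;
use warnings;

my $good    = 'foo "xyzzy" bar';
my $runaway = 'foo "xyzzy bar';    # closing " deliberately missing

# Greedy +: the final " makes no difference to what is captured
# when the string is well formed.
my ($greedy_no_final) = $good =~ /"([^"]+)/;
my ($greedy_final)    = $good =~ /"([^"]+)"/;

# With a runaway quote, the regex without a final " runs on to the
# end of the string, while the regex with a final " fails to match.
my ($run_no_final) = $runaway =~ /"([^"]+)/;
my $run_final_ok   = $runaway =~ /"([^"]+)"/ ? 1 : 0;

print "greedy, no final \":  '$greedy_no_final'\n";
print "greedy, final \":     '$greedy_final'\n";
print "runaway, no final \": '$run_no_final'\n";
print "runaway, final \":    ", ($run_final_ok ? 'matched' : 'no match'), "\n";
```

The last two lines are the reason I prefer to keep the final " in the regex even when the quantifier is greedy.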

Give a man a fish: <%-{-{-{-<

In reply to Re^3: Need help using regex to extract multiple matches by AnomalousMonk
in thread Need help using regex to extract multiple matches by SergioQ
