in reply to Pattern matching and deriving the data between the "(double quotes) in HTML tag
G'day sp4rperl,
Welcome to the Monastery.
I see tybalt89 has provided a fix for your specific problem and Athanasius has provided an explanation of that fix along with some additional information.
As a general rule for matching between delimiters, consider simply finding the start delimiter and then matching everything which follows that isn't the end delimiter. So, your captures would look like ([^"]*). I find this:
Here are some quick examples showing same/different delimiter pairs matching some/no enclosed text:
$ perl -E 'my ($s, $e) = qw{" "}; q{a"b"c} =~ /$s([^$e]*)/; say "|$1|"'
|b|
$ perl -E 'my ($s, $e) = qw{" "}; q{a""c} =~ /$s([^$e]*)/; say "|$1|"'
||
$ perl -E 'my ($s, $e) = qw{< >}; q{a<b>c} =~ /$s([^$e]*)/; say "|$1|"'
|b|
$ perl -E 'my ($s, $e) = qw{< >}; q{a<>c} =~ /$s([^$e]*)/; say "|$1|"'
||
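For the original question (text between double quotes in an HTML tag), the same negated character class can be combined with /g in list context to collect every quoted value. This is only a minimal sketch: the sub name quoted_values and the sample tag are made up for illustration, and a proper HTML parser is the safer choice for anything beyond simple, well-formed tags.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return every run of text enclosed in double quotes.
# [^"]* cannot run past the closing quote, so no non-greedy quantifier is needed.
# The closing quote is part of the pattern so that, under /g, the next match
# starts after it rather than treating it as a new opening quote.
sub quoted_values {
    my ($text) = @_;
    return $text =~ /"([^"]*)"/g;
}

# Made-up sample tag, for illustration only.
my $tag = '<a href="http://example.com/" title="">link</a>';
say "|$_|" for quoted_values($tag);
```

With that sample tag, this prints |http://example.com/| followed by ||, i.e. one non-empty and one empty value.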
Here are a few more examples, with embedded newlines, showing that ([^"]*) matches across the newline, whereas (.*?) does so only with the /s modifier:
$ perl -E 'my ($s, $e) = qw{" "}; qq{a"b\n"c} =~ /$s([^$e]*)/; say "|$1|"'
|b
|
$ perl -E 'my ($s, $e) = qw{" "}; qq{a"b\n"c} =~ /$s(.*?)$e/; say "|$1|"'
||
$ perl -E 'my ($s, $e) = qw{" "}; qq{a"b\n"c} =~ /$s(.*?)$e/s; say "|$1|"'
|b
|
When dealing with data where the enclosed text may include an escaped delimiter (e.g. "abc\"xyz") neither the (.*?) nor the ([^"]*) will work (for that example, both will capture 'abc\'). In these cases, you'll need a somewhat more complex regular expression: see perlre: Quantifiers and search for 'the typical "match a double-quoted string" problem'. [Note: You won't have this issue with HTML.]
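As a rough sketch of such an expression (one common form, not necessarily the exact one given in perlre): inside the quotes, allow either an ordinary character that is neither a quote nor a backslash, or a backslash followed by any character. The sub name quoted_body and the sample string are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: capture the body of the first double-quoted string,
# treating a backslash plus the following character as an escaped pair.
sub quoted_body {
    my ($text) = @_;
    my ($body) = $text =~ /"((?:[^"\\]|\\.)*)"/s;
    return $body;    # undef if there is no complete quoted string
}

# The escaped quote no longer ends the capture early.
say quoted_body(q{say "abc\"xyz" now});    # abc\"xyz
```

The escapes are captured as written; turning \" back into " would be a separate step.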
— Ken