Re: Pattern Matching Question
by Fletch (Bishop) on Sep 10, 2003 at 14:43 UTC
|
| [reply] [d/l] |
Re: Pattern Matching Question
by Abigail-II (Bishop) on Sep 10, 2003 at 14:58 UTC
|
I thought that if a (^) occurs as the first character of a character class, the character class is negated.
Exactly. So, /[^"]/ matches any character that
isn't a double quote. /[^"]*/ matches zero or
more characters that are not double quotes, and
/"([^"]*)"/ matches a double quote (the starting
delimiter), zero or more characters that aren't a double
quote (the content), and then a double quote (the ending
delimiter). The parens capture the content.
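A minimal sketch of that pattern in use, assuming a made-up input line (the `name=`/`lang=` text is hypothetical, not from the original question):

```perl
use strict;
use warnings;

# Hypothetical sample input; not from the original question.
my $line = 'name="Larry Wall" lang="Perl"';

# In scalar context the match returns true or false and sets $1.
my $first;
if ($line =~ /"([^"]*)"/) {
    $first = $1;
}

# In list context with /g, the content of every quoted string is returned.
my @all = $line =~ /"([^"]*)"/g;

# An empty pair of quotes still matches, because * allows zero characters.
my ($empty) = '""' =~ /"([^"]*)"/;

print "first: $first\n";
print "all: ", join(', ', @all), "\n";
print "empty capture has length ", length($empty), "\n";
```

The list-context form is handy when a line may hold more than one quoted string; the scalar form only ever sees the first.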
Abigail
| [reply] [d/l] [select] |
Re: Pattern Matching Question
by antirice (Priest) on Sep 10, 2003 at 19:49 UTC
|
This is an excellent time to learn about a module called YAPE::Regex::Explain. With it, you can do the following:
#!/usr/bin/perl -w
use strict;
use YAPE::Regex::Explain;
my $regex_i_dont_understand = q~"([^"]*)"~;
print YAPE::Regex::Explain->new($regex_i_dont_understand)->explain;
__DATA__
output:
(?-imsx:"([^"]*)")
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Pretty nifty, eh? Now whenever you don't understand the way a particular regular expression works, just set $regex_i_dont_understand to it and it will explain it piece by piece.
Hope this helps.
antirice
The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1
Re: Pattern Matching Question
by sweetblood (Prior) on Sep 10, 2003 at 14:55 UTC
|
It looks like it is intended to capture everything between the double quotes by matching what is NOT (^) a double quote. There could be problems with this approach, though. For instance, the string might contain a double quote that is not intended to be a closing quote, as in "supplied on 5.25" disk". There may be a better way to extract the text between the double quotes: for example, /^"(.*)"$/ might do it if the entire string is wrapped in double quotes.
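To make the trade-off concrete, here is a small sketch that runs both patterns on the string from the post. The third input ('say "hi" now') is a hypothetical addition to show what the anchors cost.

```perl
use strict;
use warnings;

# The example string from the post above, quotes included.
my $s = '"supplied on 5.25" disk"';

# Negated class: stops at the first inner quote.
my ($negated) = $s =~ /"([^"]*)"/;

# Anchored greedy match: takes everything between the outermost quotes.
my ($anchored) = $s =~ /^"(.*)"$/;

# Hypothetical input where the quotes do not wrap the whole string:
# the anchored pattern fails, so the list assignment leaves undef.
my ($partial) = 'say "hi" now' =~ /^"(.*)"$/;

print "negated:  $negated\n";
print "anchored: $anchored\n";
print "partial:  ", (defined $partial ? $partial : 'no match'), "\n";
```

So the anchored form only helps when the whole string is known to be one quoted value; for quoted pieces inside longer text, the negated class is the safer default.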
|
|
In English, I can see the string "supplied on 5.25" disk" as valid. But since I'm of a literal mind, the disk and closing quote aren't part of the string.
So I'd propose an example of what you are talking about as something like "supplied on 5.25\" disk" which is a valid Perl string...
Not that it really matters, just a slight nitpick.
| [reply] [d/l] [select] |
Re: Pattern Matching Question
by zby (Vicar) on Sep 10, 2003 at 14:56 UTC
|
I thought that if a (^) occurs as the first character of a character class, the character class is negated.
And you were right. [^"]* matches a string of characters other than the double quote. The whole pattern matches such a string enclosed in double quotes.
| [reply] [d/l] |
Re: Pattern Matching Question
by Zaxo (Archbishop) on Sep 10, 2003 at 15:02 UTC
|
The expression [^"] is a character class, the caret meaning, as you say, negation: 'anything but what follows'. The * after that means match zero or more of them, greedily. The parentheses capture what's matched in $1. The enclosing quotes are matched literally. The result is that everything between the first and second quote is captured.
Another way would be to use a non-greedy expression in the capture, /"(.*?)"/
After Compline, Zaxo
| [reply] [d/l] [select] |
|
|
Another way would be to use a non-greedy expression in the capture, /"(.*?)"/
Uhm, not quite. You'd have to use /"(.*?)"/s, since without /s
the dot doesn't match a newline.
Furthermore, if you would embed the regex in a larger one,
"[^"]*" would never match a double quote inside the
other ones, while ".*?" may.
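A short sketch of both points, using made-up inputs (the strings and the trailing "=" in the larger pattern are assumptions chosen for illustration):

```perl
use strict;
use warnings;

# Hypothetical input with a newline between the quotes.
my $multiline = qq{"a\nb"};
my ($no_s)   = $multiline =~ /"(.*?)"/;    # . cannot cross the newline
my ($with_s) = $multiline =~ /"(.*?)"/s;   # /s lets . match it

# Hypothetical larger pattern: a quoted string followed directly by "=".
my $line = '"foo" bar "baz"=1';
my ($lazy)    = $line =~ /"(.*?)"=/;       # keeps expanding past inner quotes
my ($negated) = $line =~ /"([^"]*)"=/;     # can never cross a quote

print "without /s: ", (defined $no_s ? 'match' : 'no match'), "\n";
print "with /s:    ", (defined $with_s ? 'match' : 'no match'), "\n";
print "lazy:       $lazy\n";
print "negated:    $negated\n";
```

Non-greedy only means "try the shortest first"; when the rest of the pattern fails, .*? is still free to grow across a quote, which the negated class rules out.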
Abigail
| [reply] [d/l] [select] |
Re: Pattern Matching Question
by dsb (Chaplain) on Sep 10, 2003 at 17:51 UTC
|
The ^ inside the [] as the first character means anything that is NOT a double quote. The * outside the [] means to match as many non-double-quote characters as possible (possibly none).
Hello again, everyone. It's been a while :)
Amel
This is my cool %SIG | [reply] |
Re: Pattern Matching Question
by Roger (Parson) on Sep 10, 2003 at 23:21 UTC
|
Just an alternative for double-quote matching: the following regular expression will match anything wrapped in double quotes, including escaped double quotes.
$_ = '"Hello \" world!" I am Roger';
# Capture the quoted string, delimiters included; \" counts as content
my ($str) = /("(?:\\"|.)*?")/x;
print "$str\n";
Note that (?: ... ) groups without capturing, so Perl does not have to remember the inner pattern, which makes it a bit more efficient.
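For comparison, a sketch of another common idiom, not the one posted above: describe the content as ordinary characters or backslash-plus-any-character pairs. This assumes backslash is the only escape character, and the second input is a hypothetical case where the content ends in an escaped backslash.

```perl
use strict;
use warnings;

# Content is a run of either an ordinary character (neither quote nor
# backslash) or a backslash followed by any one character.
my $quoted = qr/"((?:[^"\\]|\\.)*)"/s;

# Same input as above: the \" stays inside the captured content.
my ($escaped_quote) = '"Hello \" world!" I am Roger' =~ $quoted;

# The input here is: "a\\" "b"
# The backslash is itself escaped, so the quote after it closes the string.
my ($escaped_backslash) = '"a\\\\" "b"' =~ $quoted;

print "$escaped_quote\n";
print "$escaped_backslash\n";
```

Because each backslash is consumed together with the character it escapes, an escaped backslash right before the closing quote is not mistaken for an escaped quote.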
| [reply] [d/l] |
Re: Pattern Matching Question
by idsfa (Vicar) on Sep 11, 2003 at 06:03 UTC
|
| [reply] |
Re: Pattern Matching Question
by bl0rf (Pilgrim) on Sep 11, 2003 at 01:01 UTC
|
| [reply] |