G'day igoryonya,
"I think, I used to stumble upon a character class, or a property ... can't find any information about it now ... something like ... [[:quote:]] or \p{Quote} ..."
I suspect you were looking in one or both of perlrecharclass or perlrebackslash; however, neither [[:quote:]] nor \p{Quote} exists (as far as I can tell).
I see ++soonix has indicated \p{QMark} and \p{Quotation_Mark} in perluniprops.
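For the regex itself, here's a minimal sketch of how that property might be used as a character class. The sample string and the variable names are hypothetical, chosen purely for illustration; \p{Quotation_Mark} and its alias \p{QMark} are the properties listed in perluniprops.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample text: ASCII double and single quotes plus
# curly double quotes (U+201C, U+201D), written as escapes.
my $text = qq{"plain" and \x{201C}curly\x{201D} and 'single'};

# \p{Quotation_Mark} matches any single character that has the
# Unicode Quotation_Mark property.
my @quotes = $text =~ /(\p{Quotation_Mark})/g;

say 'Found ', scalar(@quotes), ' quotation marks:';
say sprintf('U+%04X', ord $_) for @quotes;

# The short alias works the same way, including inside a
# bracketed character class alongside other items.
(my $stripped = $text) =~ s/[\p{QMark}]//g;
say $stripped;
```

Using escapes for the non-ASCII characters keeps the sketch independent of source-file encoding; with literal characters you'd want "use utf8;" as well.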
On a number of occasions in the past, I have had to identify Unicode properties (which could be for spaces, combining characters, specific scripts, and so on). I've tried the table in the "perluniprops: Properties accessible through \p{} and \P{}" section but find it to be a hard slog; for instance, a case-insensitive search for "quote" does find some matches but not the \p{Quotation_Mark} or \p{QMark} that you actually wanted.
I have found the best tool to be the core module Unicode::UCD. This has many functions which can help you identify properties (as well as find a lot of other information). I've put together a script to showcase some of that module's functionality; I'd recommend at least skimming the documentation to get a feel for the other functions that are available.
Much of the code in the script I'd probably just do from the command line; however, it seemed easier to lump it all together for the purposes of the current post.
#!/usr/bin/env perl
use 5.030;
use warnings;
use utf8;
use open OUT => qw{:encoding(UTF-8) :std};
use Data::Dump;
use Unicode::UCD qw{
    charprops_all charprop prop_aliases
};
say 'Find properties of interest for a quote:';
dd charprops_all(ord '"');
say '-' x 40;
say 'Find aliases for "Quotation_Mark":';
my @aliases = prop_aliases('Quotation_Mark');
say join "\n", @aliases;
say '-' x 40;
no warnings 'qw';
my @chars
    = qw{" ' ` ~ X ‘ ‛ , ‟ « » < >};
use warnings 'qw';
say 'Check UCD vs. regex properties:';
for my $prop (@aliases) {
    say '=' x 40;
    say "Property: $prop";
    say '=' x 40;
    for my $char (@chars) {
        my $cp = sprintf 'U+%04x', ord $char;
        say "Character: $char";
        say "Code point: $cp";
        my $qmark_prop = charprop($cp, $prop);
        say "$prop: $qmark_prop";
        my $re_prop
            = $char =~ /^\p{$prop}$/
            ? 'Yes' : 'No';
        say "Check regex: $re_prop";
        say 'UCD & RE match: ',
            $qmark_prop eq $re_prop
            ? 'Yes' : '!!! No !!!';
        say '-' x 40;
    }
}
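As a smaller variant of the above, here's a rough sketch of searching property names by keyword, which is the sort of lookup that a text search of perluniprops makes awkward. It assumes the keys of the hashref returned by charprops_all() are full property names (as the Quotation_Mark entry in the output suggests); the keyword and the sample code point are just illustrative choices.

```perl
use strict;
use warnings;
use feature 'say';
use Unicode::UCD qw{charprops_all};

# Illustrative inputs: a search keyword and a sample character.
my $keyword = 'quot';
my $props   = charprops_all(ord '"');    # hashref: property name => value

# Case-insensitive search of the property names for the keyword.
my @hits = grep { /\Q$keyword\E/i } sort keys %$props;

say "$_ => $props->{$_}" for @hits;
```

This only reports properties that charprops_all() returns for the chosen code point, so treat it as a starting point rather than an exhaustive index.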
[Aside: Yes, <code> tags are generally preferred but, when the code contains Unicode characters, <pre> tags stop those characters from being turned into entity references (e.g. ‛ in your OP). I've also bunched up the code somewhat as <pre> tags won't wrap the code around like you'd get with <code> tags.]
Notes:
Here's an extract of the output. The full output, which is rather long, is in the spoiler below.
Find properties of interest for a quote:
{
...
Quotation_Mark => "Yes",
...
}
----------------------------------------
Find aliases for "Quotation_Mark":
QMark
Quotation_Mark
----------------------------------------
Check UCD vs. regex properties:
========================================
Property: QMark
========================================
Character: "
Code point: U+0022
QMark: Yes
Check regex: Yes
UCD & RE match: Yes
----------------------------------------
...
----------------------------------------
Character: X
Code point: U+0058
QMark: No
Check regex: No
UCD & RE match: Yes
----------------------------------------
Character: ‘
Code point: U+2018
QMark: Yes
Check regex: Yes
UCD & RE match: Yes
----------------------------------------
...
========================================
Property: Quotation_Mark
========================================
... same as QMark ...
Full output:
— Ken
In reply to Re: regex quotes character class
by kcott
in thread regex quotes character class
by igoryonya