Premise

In reply to a /msg by GrandFather, I'll point out that the above links are not "broken": they are USENET URLs, and not everybody has a news client installed or a system configured to launch it on such URLs. For ease of use, here are GG URLs for these clpmisc posts:

Summary

The whole thread started at this post. To sum up the story, someone asked about some Perl code he had seen, which included \d. Someone else then explained that (to quote literally)

\d matches "0", "1" ... "8" or "9"

At this point yet another poster answered that

Last time I checked, \d matched 268 different characters. Dear programmer, if you mean 0-9, then write 0-9.

This spawned a subdiscussion, because a fourth poster, a very well-known contributor to the group, pointed out that while he was aware that \w will match not only 'a'..'z', 'A'..'Z', '0'..'9', and '_', but possibly much more, depending on locale, it was not at all obvious to him that \d would match anything but [0-9]. It was not obvious to me either, especially since I hardly know anything about this whole locale business, and that's why I'm reporting it here.
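To make the difference concrete, here is a minimal sketch (not taken from the thread) that checks a single character against \w and \d and against their explicit ASCII spellings. The helper name classes_for and the sample characters are my own choices for illustration. The non-ASCII samples are above U+00FF, so locale settings should not come into play.

```perl
use strict;
use warnings;

# Hypothetical helper: report which classes match a one-character string.
sub classes_for {
    my ($char) = @_;
    my %result = (
        word        => ( $char =~ /\A\w\z/           ? 1 : 0 ),
        ascii_word  => ( $char =~ /\A[A-Za-z0-9_]\z/ ? 1 : 0 ),
        digit       => ( $char =~ /\A\d\z/           ? 1 : 0 ),
        ascii_digit => ( $char =~ /\A[0-9]\z/        ? 1 : 0 ),
    );
    return \%result;
}

# '7', GREEK SMALL LETTER ALPHA, DEVANAGARI DIGIT ONE
for my $cp ( 0x37, 0x3B1, 0x967 ) {
    my $r = classes_for( chr $cp );
    printf "U+%04X  \\w=%d  [A-Za-z0-9_]=%d  \\d=%d  [0-9]=%d\n",
        $cp, @{$r}{qw(word ascii_word digit ascii_digit)};
}
```

On Perl 5.14 and later the /a modifier restricts \d and \w to ASCII; the thread predates it, so writing [0-9] was the portable way to say "ASCII digits only" at the time.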

Further replies included two test/example scripts, which I'm pasting hereafter, unmodified.

The first script

The first script is by Dr.Ruud and collects some Unicode statistics:

#!/usr/bin/perl
# Id: unicount.pl
# Subject: show some Unicode statistics

use warnings ;
use strict ;

use Data::Alias ;

binmode STDOUT, ':utf8' ;

my @table =
# +--Name------+---qRegexp--------+-C-+-L-+-U-+
(
  [ 'xdigit'   , qr/[[:xdigit:]]/ , 0 , 0 , 0 ] ,
  [ 'ascii'    , qr/[[:ascii:]]/  , 0 , 0 , 0 ] ,
  [ '\\d'      , qr/\d/           , 0 , 0 , 0 ] ,
  [ 'digit'    , qr/[[:digit:]]/  , 0 , 0 , 0 ] ,
  [ 'IsNumber' , qr/\p{IsNumber}/ , 0 , 0 , 0 ] ,
  [ 'alpha'    , qr/[[:alpha:]]/  , 0 , 0 , 0 ] ,
  [ 'alnum'    , qr/[[:alnum:]]/  , 0 , 0 , 0 ] ,
  [ 'word'     , qr/[[:word:]]/   , 0 , 0 , 0 ] ,
  [ 'graph'    , qr/[[:graph:]]/  , 0 , 0 , 0 ] ,
  [ 'print'    , qr/[[:print:]]/  , 0 , 0 , 0 ] ,
  [ 'blank'    , qr/[[:blank:]]/  , 0 , 0 , 0 ] ,
  [ 'space'    , qr/[[:space:]]/  , 0 , 0 , 0 ] ,
  [ 'punct'    , qr/[[:punct:]]/  , 0 , 0 , 0 ] ,
  [ 'cntrl'    , qr/[[:cntrl:]]/  , 0 , 0 , 0 ] ,
) ;

my @codepoints =
(
  0x0000  ..  0xD7FF,
  0xE000  ..  0xFDCF,
  0xFDF0  ..  0xFFFD,
  0x10000 .. 0x1FFFD,
  0x20000 .. 0x2FFFD,
# 0x30000 .. 0x3FFFD,   # etc.
) ;

for my $row ( @table )
{
  alias my ($name, $qrx, $count, $lower, $upper) = @$row ;

  printf "\n%s\n", $name ;

  my $n = 0 ;

  for ( @codepoints )
  {
    local $_ = chr ;   # int-2-char conversion
    $n++ ;

    if ( /$qrx/ )
    {
      $count++ ;
      $lower++ if / [[:lower:]] /x ;
      $upper++ if / [[:upper:]] /x ;
    }
  }

  my $show_lower_upper =
    ($lower || $upper)
    ? sprintf( ' (lower:%6d, upper:%6d)'
             , $lower
             , $upper
             )
    : '' ;

  printf "%6d /%6d =%7.3f%%%s\n"
       , $count
       , $n
       , 100 * $count / $n
       , $show_lower_upper
}
print "\n" ;

__END__

Its output on the author's system (v5.8.6, i386-freebsd-64int) is:

xdigit
    22 /194522 =  0.011% (lower:     6, upper:     6)

ascii
   128 /194522 =  0.066% (lower:    26, upper:    26)

\d
   268 /194522 =  0.138%

digit
   268 /194522 =  0.138%

IsNumber
   612 /194522 =  0.315%

alpha
 91183 /194522 = 46.875% (lower:  1380, upper:  1160)

alnum
 91451 /194522 = 47.013% (lower:  1380, upper:  1160)

word
 91801 /194522 = 47.193% (lower:  1380, upper:  1160)

graph
102330 /194522 = 52.606% (lower:  1380, upper:  1160)

print
102349 /194522 = 52.616% (lower:  1380, upper:  1160)

blank
    18 /194522 =  0.009%

space
    24 /194522 =  0.012%

punct
   374 /194522 =  0.192%

cntrl
  6473 /194522 =  3.328%
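The table shows \d and [[:digit:]] with the same count, and IsNumber with a larger one. The following is a small sketch of why that can happen, using three sample code points of my own choosing; the helper name number_classes is hypothetical, and \p{N} is used here as the short spelling of the general category that the script writes as \p{IsNumber}.

```perl
use strict;
use warnings;

# Hypothetical helper: which of the three "number-like" classes match $char?
sub number_classes {
    my ($char) = @_;
    my @hits;
    push @hits, '\d'        if $char =~ /\d/;
    push @hits, '[:digit:]' if $char =~ /[[:digit:]]/;
    push @hits, '\p{N}'     if $char =~ /\p{N}/;
    return @hits;
}

# '5', ARABIC-INDIC DIGIT THREE, ROMAN NUMERAL ONE
for my $cp ( 0x35, 0x663, 0x2160 ) {
    printf "U+%04X matches: %s\n", $cp, join( ', ', number_classes( chr $cp ) );
}
```

The Roman numeral is a number but not a decimal digit, so only \p{N} should pick it up, while the two decimal digits match all three classes.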

The second script

The second script is by Peter J. Holzer and is supposed to directly show all the digits matched by \d, including those found in non-Latin scripts:

#!/usr/bin/perl
use warnings;
use strict;
use charnames qw();

for my $c (0x0000 .. 0xD7FF,
           0xE000 .. 0xFDCF,
           0xFDF0 .. 0xFFFD,
           0x1_0000 .. 11_0000
          ) {
    my $s = pack 'U', $c;
    if ($s =~ /\d/) {
        printf ("%5d %5x %s %s\n", $c, $c, $s, charnames::viacode($c));
    }
}

Its output is also pasted hereafter.

   48    30 0 DIGIT ZERO
   49    31 1 DIGIT ONE
   50    32 2 DIGIT TWO
   51    33 3 DIGIT THREE
   52    34 4 DIGIT FOUR
   53    35 5 DIGIT FIVE
   54    36 6 DIGIT SIX
   55    37 7 DIGIT SEVEN
   56    38 8 DIGIT EIGHT
   57    39 9 DIGIT NINE
 1632   660 ٠ ARABIC-INDIC DIGIT ZERO
 1633   661 ١ ARABIC-INDIC DIGIT ONE
 1634   662 ٢ ARABIC-INDIC DIGIT TWO
 1635   663 ٣ ARABIC-INDIC DIGIT THREE
 1636   664 ٤ ARABIC-INDIC DIGIT FOUR
 1637   665 ٥ ARABIC-INDIC DIGIT FIVE
 1638   666 ٦ ARABIC-INDIC DIGIT SIX
 1639   667 ٧ ARABIC-INDIC DIGIT SEVEN
 1640   668 ٨ ARABIC-INDIC DIGIT EIGHT
 1641   669 ٩ ARABIC-INDIC DIGIT NINE
 1776   6f0 ۰ EXTENDED ARABIC-INDIC DIGIT ZERO
 1777   6f1 ۱ EXTENDED ARABIC-INDIC DIGIT ONE
 1778   6f2 ۲ EXTENDED ARABIC-INDIC DIGIT TWO
 1779   6f3 ۳ EXTENDED ARABIC-INDIC DIGIT THREE
 1780   6f4 ۴ EXTENDED ARABIC-INDIC DIGIT FOUR
 1781   6f5 ۵ EXTENDED ARABIC-INDIC DIGIT FIVE
 1782   6f6 ۶ EXTENDED ARABIC-INDIC DIGIT SIX
 1783   6f7 ۷ EXTENDED ARABIC-INDIC DIGIT SEVEN
 1784   6f8 ۸ EXTENDED ARABIC-INDIC DIGIT EIGHT
 1785   6f9 ۹ EXTENDED ARABIC-INDIC DIGIT NINE
 2406   966 ० DEVANAGARI DIGIT ZERO
 2407   967 १ DEVANAGARI DIGIT ONE
 2408   968 २ DEVANAGARI DIGIT TWO
 2409   969 ३ DEVANAGARI DIGIT THREE
 2410   96a ४ DEVANAGARI DIGIT FOUR
 2411   96b ५ DEVANAGARI DIGIT FIVE
 2412   96c ६ DEVANAGARI DIGIT SIX
 2413   96d ७ DEVANAGARI DIGIT SEVEN
 2414   96e ८ DEVANAGARI DIGIT EIGHT
 2415   96f ९ DEVANAGARI DIGIT NINE
 2534   9e6 ০ BENGALI DIGIT ZERO
 2535   9e7 ১ BENGALI DIGIT ONE
 2536   9e8 ২ BENGALI DIGIT TWO
 2537   9e9 ৩ BENGALI DIGIT THREE
 2538   9ea ৪ BENGALI DIGIT FOUR
 2539   9eb ৫ BENGALI DIGIT FIVE
 2540   9ec ৬ BENGALI DIGIT SIX
 2541   9ed ৭ BENGALI DIGIT SEVEN
 2542   9ee ৮ BENGALI DIGIT EIGHT
 2543   9ef ৯ BENGALI DIGIT NINE
 2662   a66 ੦ GURMUKHI DIGIT ZERO
 2663   a67 ੧ GURMUKHI DIGIT ONE
 2664   a68 ੨ GURMUKHI DIGIT TWO
 2665   a69 ੩ GURMUKHI DIGIT THREE
 2666   a6a ੪ GURMUKHI DIGIT FOUR
 2667   a6b ੫ GURMUKHI DIGIT FIVE
 2668   a6c ੬ GURMUKHI DIGIT SIX
 2669   a6d ੭ GURMUKHI DIGIT SEVEN
 2670   a6e ੮ GURMUKHI DIGIT EIGHT
 2671   a6f ੯ GURMUKHI DIGIT NINE
 2790   ae6 ૦ GUJARATI DIGIT ZERO
 2791   ae7 ૧ GUJARATI DIGIT ONE
 2792   ae8 ૨ GUJARATI DIGIT TWO
 2793   ae9 ૩ GUJARATI DIGIT THREE
 2794   aea ૪ GUJARATI DIGIT FOUR
 2795   aeb ૫ GUJARATI DIGIT FIVE
 2796   aec ૬ GUJARATI DIGIT SIX
 2797   aed ૭ GUJARATI DIGIT SEVEN
 2798   aee ૮ GUJARATI DIGIT EIGHT
 2799   aef ૯ GUJARATI DIGIT NINE
 2918   b66 ୦ ORIYA DIGIT ZERO
 2919   b67 ୧ ORIYA DIGIT ONE
 2920   b68 ୨ ORIYA DIGIT TWO
 2921   b69 ୩ ORIYA DIGIT THREE
 2922   b6a ୪ ORIYA DIGIT FOUR
 2923   b6b ୫ ORIYA DIGIT FIVE
 2924   b6c ୬ ORIYA DIGIT SIX
 2925   b6d ୭ ORIYA DIGIT SEVEN
 2926   b6e ୮ ORIYA DIGIT EIGHT
 2927   b6f ୯ ORIYA DIGIT NINE
 3047   be7 ௧ TAMIL DIGIT ONE
 3048   be8 ௨ TAMIL DIGIT TWO
 3049   be9 ௩ TAMIL DIGIT THREE
 3050   bea ௪ TAMIL DIGIT FOUR
 3051   beb ௫ TAMIL DIGIT FIVE
 3052   bec ௬ TAMIL DIGIT SIX
 3053   bed ௭ TAMIL DIGIT SEVEN
 3054   bee ௮ TAMIL DIGIT EIGHT
 3055   bef ௯ TAMIL DIGIT NINE
 3174   c66 ౦ TELUGU DIGIT ZERO
 3175   c67 ౧ TELUGU DIGIT ONE
 3176   c68 ౨ TELUGU DIGIT TWO
 3177   c69 ౩ TELUGU DIGIT THREE
 3178   c6a ౪ TELUGU DIGIT FOUR
 3179   c6b ౫ TELUGU DIGIT FIVE
 3180   c6c ౬ TELUGU DIGIT SIX
 3181   c6d ౭ TELUGU DIGIT SEVEN
 3182   c6e ౮ TELUGU DIGIT EIGHT
 3183   c6f ౯ TELUGU DIGIT NINE
 3302   ce6 ೦ KANNADA DIGIT ZERO
 3303   ce7 ೧ KANNADA DIGIT ONE
 3304   ce8 ೨ KANNADA DIGIT TWO
 3305   ce9 ೩ KANNADA DIGIT THREE
 3306   cea ೪ KANNADA DIGIT FOUR
 3307   ceb ೫ KANNADA DIGIT FIVE
 3308   cec ೬ KANNADA DIGIT SIX
 3309   ced ೭ KANNADA DIGIT SEVEN
 3310   cee ೮ KANNADA DIGIT EIGHT
 3311   cef ೯ KANNADA DIGIT NINE
 3430   d66 ൦ MALAYALAM DIGIT ZERO
 3431   d67 ൧ MALAYALAM DIGIT ONE
 3432   d68 ൨ MALAYALAM DIGIT TWO
 3433   d69 ൩ MALAYALAM DIGIT THREE
 3434   d6a ൪ MALAYALAM DIGIT FOUR
 3435   d6b ൫ MALAYALAM DIGIT FIVE
 3436   d6c ൬ MALAYALAM DIGIT SIX
 3437   d6d ൭ MALAYALAM DIGIT SEVEN
 3438   d6e ൮ MALAYALAM DIGIT EIGHT
 3439   d6f ൯ MALAYALAM DIGIT NINE
 3664   e50 ๐ THAI DIGIT ZERO
 3665   e51 ๑ THAI DIGIT ONE
 3666   e52 ๒ THAI DIGIT TWO
 3667   e53 ๓ THAI DIGIT THREE
 3668   e54 ๔ THAI DIGIT FOUR
 3669   e55 ๕ THAI DIGIT FIVE
 3670   e56 ๖ THAI DIGIT SIX
 3671   e57 ๗ THAI DIGIT SEVEN
 3672   e58 ๘ THAI DIGIT EIGHT
 3673   e59 ๙ THAI DIGIT NINE
 3792   ed0 ໐ LAO DIGIT ZERO
 3793   ed1 ໑ LAO DIGIT ONE
 3794   ed2 ໒ LAO DIGIT TWO
 3795   ed3 ໓ LAO DIGIT THREE
 3796   ed4 ໔ LAO DIGIT FOUR
 3797   ed5 ໕ LAO DIGIT FIVE
 3798   ed6 ໖ LAO DIGIT SIX
 3799   ed7 ໗ LAO DIGIT SEVEN
 3800   ed8 ໘ LAO DIGIT EIGHT
 3801   ed9 ໙ LAO DIGIT NINE
 3872   f20 ༠ TIBETAN DIGIT ZERO
 3873   f21 ༡ TIBETAN DIGIT ONE
 3874   f22 ༢ TIBETAN DIGIT TWO
 3875   f23 ༣ TIBETAN DIGIT THREE
 3876   f24 ༤ TIBETAN DIGIT FOUR
 3877   f25 ༥ TIBETAN DIGIT FIVE
 3878   f26 ༦ TIBETAN DIGIT SIX
 3879   f27 ༧ TIBETAN DIGIT SEVEN
 3880   f28 ༨ TIBETAN DIGIT EIGHT
 3881   f29 ༩ TIBETAN DIGIT NINE
 4160  1040 ၀ MYANMAR DIGIT ZERO
 4161  1041 ၁ MYANMAR DIGIT ONE
 4162  1042 ၂ MYANMAR DIGIT TWO
 4163  1043 ၃ MYANMAR DIGIT THREE
 4164  1044 ၄ MYANMAR DIGIT FOUR
 4165  1045 ၅ MYANMAR DIGIT FIVE
 4166  1046 ၆ MYANMAR DIGIT SIX
 4167  1047 ၇ MYANMAR DIGIT SEVEN
 4168  1048 ၈ MYANMAR DIGIT EIGHT
 4169  1049 ၉ MYANMAR DIGIT NINE
 4969  1369 ፩ ETHIOPIC DIGIT ONE
 4970  136a ፪ ETHIOPIC DIGIT TWO
 4971  136b ፫ ETHIOPIC DIGIT THREE
 4972  136c ፬ ETHIOPIC DIGIT FOUR
 4973  136d ፭ ETHIOPIC DIGIT FIVE
 4974  136e ፮ ETHIOPIC DIGIT SIX
 4975  136f ፯ ETHIOPIC DIGIT SEVEN
 4976  1370 ፰ ETHIOPIC DIGIT EIGHT
 4977  1371 ፱ ETHIOPIC DIGIT NINE
 6112  17e0 ០ KHMER DIGIT ZERO
 6113  17e1 ១ KHMER DIGIT ONE
 6114  17e2 ២ KHMER DIGIT TWO
 6115  17e3 ៣ KHMER DIGIT THREE
 6116  17e4 ៤ KHMER DIGIT FOUR
 6117  17e5 ៥ KHMER DIGIT FIVE
 6118  17e6 ៦ KHMER DIGIT SIX
 6119  17e7 ៧ KHMER DIGIT SEVEN
 6120  17e8 ៨ KHMER DIGIT EIGHT
 6121  17e9 ៩ KHMER DIGIT NINE
 6160  1810 ᠐ MONGOLIAN DIGIT ZERO
 6161  1811 ᠑ MONGOLIAN DIGIT ONE
 6162  1812 ᠒ MONGOLIAN DIGIT TWO
 6163  1813 ᠓ MONGOLIAN DIGIT THREE
 6164  1814 ᠔ MONGOLIAN DIGIT FOUR
 6165  1815 ᠕ MONGOLIAN DIGIT FIVE
 6166  1816 ᠖ MONGOLIAN DIGIT SIX
 6167  1817 ᠗ MONGOLIAN DIGIT SEVEN
 6168  1818 ᠘ MONGOLIAN DIGIT EIGHT
 6169  1819 ᠙ MONGOLIAN DIGIT NINE
 6470  1946 ᥆ LIMBU DIGIT ZERO
 6471  1947 ᥇ LIMBU DIGIT ONE
 6472  1948 ᥈ LIMBU DIGIT TWO
 6473  1949 ᥉ LIMBU DIGIT THREE
 6474  194a ᥊ LIMBU DIGIT FOUR
 6475  194b ᥋ LIMBU DIGIT FIVE
 6476  194c ᥌ LIMBU DIGIT SIX
 6477  194d ᥍ LIMBU DIGIT SEVEN
 6478  194e ᥎ LIMBU DIGIT EIGHT
 6479  194f ᥏ LIMBU DIGIT NINE
65296  ff10 ０ FULLWIDTH DIGIT ZERO
65297  ff11 １ FULLWIDTH DIGIT ONE
65298  ff12 ２ FULLWIDTH DIGIT TWO
65299  ff13 ３ FULLWIDTH DIGIT THREE
65300  ff14 ４ FULLWIDTH DIGIT FOUR
65301  ff15 ５ FULLWIDTH DIGIT FIVE
65302  ff16 ６ FULLWIDTH DIGIT SIX
65303  ff17 ７ FULLWIDTH DIGIT SEVEN
65304  ff18 ８ FULLWIDTH DIGIT EIGHT
65305  ff19 ９ FULLWIDTH DIGIT NINE
66720 104a0 𐒠 OSMANYA DIGIT ZERO
66721 104a1 𐒡 OSMANYA DIGIT ONE
66722 104a2 𐒢 OSMANYA DIGIT TWO
66723 104a3 𐒣 OSMANYA DIGIT THREE
66724 104a4 𐒤 OSMANYA DIGIT FOUR
66725 104a5 𐒥 OSMANYA DIGIT FIVE
66726 104a6 𐒦 OSMANYA DIGIT SIX
66727 104a7 𐒧 OSMANYA DIGIT SEVEN
66728 104a8 𐒨 OSMANYA DIGIT EIGHT
66729 104a9 𐒩 OSMANYA DIGIT NINE
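The same charnames::viacode lookup can be turned around to report which digits in a given string are not plain ASCII. This is only a sketch of mine, not something posted in the thread; the helper name non_ascii_digit_names and the sample input are made up for illustration.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: return the Unicode names of the characters in $str
# that \d matches but [0-9] does not.
sub non_ascii_digit_names {
    my ($str) = @_;
    my @names;
    while ( $str =~ /(\d)/g ) {
        my $char = $1;                    # copy before running another match
        next if $char =~ /[0-9]/;         # plain ASCII digit, nothing to report
        push @names, charnames::viacode( ord $char );
    }
    return @names;
}

# ASCII "42", then ARABIC-INDIC DIGIT FOUR and ARABIC-INDIC DIGIT TWO
my $input = "42 and \x{0664}\x{0662}";
print "$_\n" for non_ascii_digit_names($input);
```

A check like this could sit in front of code that later treats the matched text as a number, since Perl's numeric conversion only understands ASCII digits.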

Conclusion

The author of the second script claims that the above comprises 218 entries, and indeed it does. The discrepancy with the first script's count of 268 is due to a different choice of code point search space. I tried on two systems of mine, one Linux and one WinXP, both with v5.8.8, and I get the same list of 270 entries on each. The script is a slightly modified version of the second one, with the same search space as the first. For added value through HTML rendering I used HTML::Entities, which is why the output below is wrapped in <pre> tags rather than <code> ones. The script:

#!/usr/bin/perl -l

use strict;
use warnings;
use HTML::Entities;

my $i;
for my $c (0x0000 .. 0xD7FF,
           0xE000 .. 0xFDCF,
           0xFDF0 .. 0xFFFD,
           0x1_0000 .. 0x1_FFFD,
           0x2_0000 .. 0x2_FFFD) {
    local $_=pack 'U', $c;
    next unless /\d/;
    print encode_entities sprintf '%3d: %#5x => %s', ++$i, $c => $_;
}

__END__

The output:

  1:  0x30 => 0
  2:  0x31 => 1
  3:  0x32 => 2
  4:  0x33 => 3
  5:  0x34 => 4
  6:  0x35 => 5
  7:  0x36 => 6
  8:  0x37 => 7
  9:  0x38 => 8
 10:  0x39 => 9
 11: 0x660 => ٠
 12: 0x661 => ١
 13: 0x662 => ٢
 14: 0x663 => ٣
 15: 0x664 => ٤
 16: 0x665 => ٥
 17: 0x666 => ٦
 18: 0x667 => ٧
 19: 0x668 => ٨
 20: 0x669 => ٩
 21: 0x6f0 => ۰
 22: 0x6f1 => ۱
 23: 0x6f2 => ۲
 24: 0x6f3 => ۳
 25: 0x6f4 => ۴
 26: 0x6f5 => ۵
 27: 0x6f6 => ۶
 28: 0x6f7 => ۷
 29: 0x6f8 => ۸
 30: 0x6f9 => ۹
 31: 0x966 => ०
 32: 0x967 => १
 33: 0x968 => २
 34: 0x969 => ३
 35: 0x96a => ४
 36: 0x96b => ५
 37: 0x96c => ६
 38: 0x96d => ७
 39: 0x96e => ८
 40: 0x96f => ९
 41: 0x9e6 => ০
 42: 0x9e7 => ১
 43: 0x9e8 => ২
 44: 0x9e9 => ৩
 45: 0x9ea => ৪
 46: 0x9eb => ৫
 47: 0x9ec => ৬
 48: 0x9ed => ৭
 49: 0x9ee => ৮
 50: 0x9ef => ৯
 51: 0xa66 => ੦
 52: 0xa67 => ੧
 53: 0xa68 => ੨
 54: 0xa69 => ੩
 55: 0xa6a => ੪
 56: 0xa6b => ੫
 57: 0xa6c => ੬
 58: 0xa6d => ੭
 59: 0xa6e => ੮
 60: 0xa6f => ੯
 61: 0xae6 => ૦
 62: 0xae7 => ૧
 63: 0xae8 => ૨
 64: 0xae9 => ૩
 65: 0xaea => ૪
 66: 0xaeb => ૫
 67: 0xaec => ૬
 68: 0xaed => ૭
 69: 0xaee => ૮
 70: 0xaef => ૯
 71: 0xb66 => ୦
 72: 0xb67 => ୧
 73: 0xb68 => ୨
 74: 0xb69 => ୩
 75: 0xb6a => ୪
 76: 0xb6b => ୫
 77: 0xb6c => ୬
 78: 0xb6d => ୭
 79: 0xb6e => ୮
 80: 0xb6f => ୯
 81: 0xbe6 => ௦
 82: 0xbe7 => ௧
 83: 0xbe8 => ௨
 84: 0xbe9 => ௩
 85: 0xbea => ௪
 86: 0xbeb => ௫
 87: 0xbec => ௬
 88: 0xbed => ௭
 89: 0xbee => ௮
 90: 0xbef => ௯
 91: 0xc66 => ౦
 92: 0xc67 => ౧
 93: 0xc68 => ౨
 94: 0xc69 => ౩
 95: 0xc6a => ౪
 96: 0xc6b => ౫
 97: 0xc6c => ౬
 98: 0xc6d => ౭
 99: 0xc6e => ౮
100: 0xc6f => ౯
101: 0xce6 => ೦
102: 0xce7 => ೧
103: 0xce8 => ೨
104: 0xce9 => ೩
105: 0xcea => ೪
106: 0xceb => ೫
107: 0xcec => ೬
108: 0xced => ೭
109: 0xcee => ೮
110: 0xcef => ೯
111: 0xd66 => ൦
112: 0xd67 => ൧
113: 0xd68 => ൨
114: 0xd69 => ൩
115: 0xd6a => ൪
116: 0xd6b => ൫
117: 0xd6c => ൬
118: 0xd6d => ൭
119: 0xd6e => ൮
120: 0xd6f => ൯
121: 0xe50 => ๐
122: 0xe51 => ๑
123: 0xe52 => ๒
124: 0xe53 => ๓
125: 0xe54 => ๔
126: 0xe55 => ๕
127: 0xe56 => ๖
128: 0xe57 => ๗
129: 0xe58 => ๘
130: 0xe59 => ๙
131: 0xed0 => ໐
132: 0xed1 => ໑
133: 0xed2 => ໒
134: 0xed3 => ໓
135: 0xed4 => ໔
136: 0xed5 => ໕
137: 0xed6 => ໖
138: 0xed7 => ໗
139: 0xed8 => ໘
140: 0xed9 => ໙
141: 0xf20 => ༠
142: 0xf21 => ༡
143: 0xf22 => ༢
144: 0xf23 => ༣
145: 0xf24 => ༤
146: 0xf25 => ༥
147: 0xf26 => ༦
148: 0xf27 => ༧
149: 0xf28 => ༨
150: 0xf29 => ༩
151: 0x1040 => ၀
152: 0x1041 => ၁
153: 0x1042 => ၂
154: 0x1043 => ၃
155: 0x1044 => ၄
156: 0x1045 => ၅
157: 0x1046 => ၆
158: 0x1047 => ၇
159: 0x1048 => ၈
160: 0x1049 => ၉
161: 0x17e0 => ០
162: 0x17e1 => ១
163: 0x17e2 => ២
164: 0x17e3 => ៣
165: 0x17e4 => ៤
166: 0x17e5 => ៥
167: 0x17e6 => ៦
168: 0x17e7 => ៧
169: 0x17e8 => ៨
170: 0x17e9 => ៩
171: 0x1810 => ᠐
172: 0x1811 => ᠑
173: 0x1812 => ᠒
174: 0x1813 => ᠓
175: 0x1814 => ᠔
176: 0x1815 => ᠕
177: 0x1816 => ᠖
178: 0x1817 => ᠗
179: 0x1818 => ᠘
180: 0x1819 => ᠙
181: 0x1946 => ᥆
182: 0x1947 => ᥇
183: 0x1948 => ᥈
184: 0x1949 => ᥉
185: 0x194a => ᥊
186: 0x194b => ᥋
187: 0x194c => ᥌
188: 0x194d => ᥍
189: 0x194e => ᥎
190: 0x194f => ᥏
191: 0x19d0 => ᧐
192: 0x19d1 => ᧑
193: 0x19d2 => ᧒
194: 0x19d3 => ᧓
195: 0x19d4 => ᧔
196: 0x19d5 => ᧕
197: 0x19d6 => ᧖
198: 0x19d7 => ᧗
199: 0x19d8 => ᧘
200: 0x19d9 => ᧙
201: 0xff10 => ０
202: 0xff11 => １
203: 0xff12 => ２
204: 0xff13 => ３
205: 0xff14 => ４
206: 0xff15 => ５
207: 0xff16 => ６
208: 0xff17 => ７
209: 0xff18 => ８
210: 0xff19 => ９
211: 0x104a0 => 𐒠
212: 0x104a1 => 𐒡
213: 0x104a2 => 𐒢
214: 0x104a3 => 𐒣
215: 0x104a4 => 𐒤
216: 0x104a5 => 𐒥
217: 0x104a6 => 𐒦
218: 0x104a7 => 𐒧
219: 0x104a8 => 𐒨
220: 0x104a9 => 𐒩
221: 0x1d7ce => 𝟎
222: 0x1d7cf => 𝟏
223: 0x1d7d0 => 𝟐
224: 0x1d7d1 => 𝟑
225: 0x1d7d2 => 𝟒
226: 0x1d7d3 => 𝟓
227: 0x1d7d4 => 𝟔
228: 0x1d7d5 => 𝟕
229: 0x1d7d6 => 𝟖
230: 0x1d7d7 => 𝟗
231: 0x1d7d8 => 𝟘
232: 0x1d7d9 => 𝟙
233: 0x1d7da => 𝟚
234: 0x1d7db => 𝟛
235: 0x1d7dc => 𝟜
236: 0x1d7dd => 𝟝
237: 0x1d7de => 𝟞
238: 0x1d7df => 𝟟
239: 0x1d7e0 => 𝟠
240: 0x1d7e1 => 𝟡
241: 0x1d7e2 => 𝟢
242: 0x1d7e3 => 𝟣
243: 0x1d7e4 => 𝟤
244: 0x1d7e5 => 𝟥
245: 0x1d7e6 => 𝟦
246: 0x1d7e7 => 𝟧
247: 0x1d7e8 => 𝟨
248: 0x1d7e9 => 𝟩
249: 0x1d7ea => 𝟪
250: 0x1d7eb => 𝟫
251: 0x1d7ec => 𝟬
252: 0x1d7ed => 𝟭
253: 0x1d7ee => 𝟮
254: 0x1d7ef => 𝟯
255: 0x1d7f0 => 𝟰
256: 0x1d7f1 => 𝟱
257: 0x1d7f2 => 𝟲
258: 0x1d7f3 => 𝟳
259: 0x1d7f4 => 𝟴
260: 0x1d7f5 => 𝟵
261: 0x1d7f6 => 𝟶
262: 0x1d7f7 => 𝟷
263: 0x1d7f8 => 𝟸
264: 0x1d7f9 => 𝟹
265: 0x1d7fa => 𝟺
266: 0x1d7fb => 𝟻
267: 0x1d7fc => 𝟼
268: 0x1d7fd => 𝟽
269: 0x1d7fe => 𝟾
270: 0x1d7ff => 𝟿
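How many characters \d "matches" depends on which code points you test. Here is a minimal sketch of that dependence, with a hypothetical helper count_digits that takes inclusive [low, high] ranges. Note that the second script writes its last upper bound as 11_0000, which Perl reads as decimal 110000; that is below U+1D7CE (decimal 120782), so its search never reaches the mathematical digits shown at entries 221 to 270. Other differences between the lists may come from the Unicode data shipped with each perl, which I have not checked.

```perl
use strict;
use warnings;

# Hypothetical helper: count the code points matched by \d in a list of
# inclusive [low, high] ranges.
sub count_digits {
    my (@ranges) = @_;
    my $count = 0;
    for my $range (@ranges) {
        my ( $low, $high ) = @$range;
        for my $cp ( $low .. $high ) {
            $count++ if chr($cp) =~ /\d/;
        }
    }
    return $count;
}

printf "ASCII only:             %d\n", count_digits( [ 0x00, 0x7F ] );
printf "mathematical digits:    %d\n", count_digits( [ 0x1D7CE, 0x1D7FF ] );

# Same ranges as the second script, including its decimal upper bound.
printf "second script's ranges: %d\n",
    count_digits( [ 0x0000, 0xD7FF ], [ 0xE000, 0xFDCF ],
                  [ 0xFDF0, 0xFFFD ], [ 0x1_0000, 11_0000 ] );
```

The last figure will vary with the perl version, since later Unicode releases added more decimal digits.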

Edit: g0n - readmore tags


In reply to Re^3: regex to detect any non digit and number by blazar
in thread regex to detect any non digit and number by Anonymous Monk
