in reply to Unicode::UCD=charprop and the speed of various keys
Devel::NYTProf can be used to profile your example code without any real modification other than to add -d:NYTProf to the command line.
Most of the time is spent in a subroutine called prop_invmap. These are the most expensive lines in that subroutine:
    3970944  2.22s   my ($hex_code_point, $name) = split "\t", $line;

                     # Weeds out all comments, blank lines, and named sequences
    3970944  5.69s   next if $hex_code_point =~ /[^[:xdigit:]]/a;
                     # spent 828ms making 3970944 calls to Unicode::UCD::CORE:match, avg 208ns/call

    3914368  648ms   my $code_point = hex $hex_code_point;

                     # The name of all controls is the default: the empty string.
                     # The set of controls is immutable
    3914368  5.18s   next if chr($code_point) =~ /[[:cntrl:]]/u;
                     # spent 475ms making 3914368 calls to Unicode::UCD::CORE:match, avg 121ns/call

                     # If this is a name_alias, it isn't a name
    3894016  1.85s   next if grep { $_ eq $name } @{$aliases{$code_point}};

                     # If we are beyond where one of the special lines needs to
                     # be inserted ...
    3854464  1.10s   while ($i < @$algorithm_names
                         && $code_point > $algorithm_names->[$i]->{'low'})
                     {
It might be worthwhile looking at mitigation options. If you are willing to throw memory at the problem, subclass or monkeypatch Unicode::UCD, and in your subclass use Memoize to memoize prop_invmap. The results are astounding:
    real    0m0.275s
    user    0m0.263s
    sys     0m0.012s
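To show why memoizing helps, here is a minimal sketch of Memoize on its own. The sub slow_lookup is a hypothetical stand-in for an expensive function such as prop_invmap; it is not part of Unicode::UCD. Repeated calls with the same argument are answered from the cache, so the real body runs only once per distinct argument.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical stand-in for an expensive lookup such as prop_invmap.
sub slow_lookup {
    my ($key) = @_;
    $calls++;                                   # count real executions
    return join ',', map { "$key$_" } 1 .. 3;
}

# Replace slow_lookup with a caching wrapper around the original.
memoize('slow_lookup');

my $first  = slow_lookup('Name');   # real call
my $second = slow_lookup('Name');   # same argument: served from the cache
my $third  = slow_lookup('Age');    # new argument: real call

print "calls: $calls\n";            # prints "calls: 2"
```

The trade-off is memory: every distinct argument list keeps its result alive for the life of the process.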
Here's an inelegant example of monkeypatching prop_invmap in a module that otherwise simply exposes Unicode::UCD:
    package MyUnicodeUCD;

    use strict;
    use warnings;

    use constant EXPORT_OK => [
        qw(
            charinfo charblock charscript charblocks charscripts charinrange
            charprop charprops_all general_categories bidi_types compexcl
            casefold all_casefolds casespec namedseq num prop_aliases
            prop_value_aliases prop_values prop_invlist prop_invmap
            search_invlist MAX_CP
        ),
    ];

    use Unicode::UCD @{EXPORT_OK()};
    use Exporter;

    our @ISA       = qw(Exporter);
    our @EXPORT_OK = @{EXPORT_OK()};

    use Memoize;
    memoize 'prop_invmap';
    *Unicode::UCD::prop_invmap = \&prop_invmap;

    1;
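The glob assignment is what makes the patch reach callers inside the library itself. A self-contained sketch of that mechanism follows, using a hypothetical Toy::Lib package in place of Unicode::UCD (the names expensive and wrapper are made up for illustration). It passes a code reference to memoize, which returns the cached version without installing it anywhere, and then installs that version over the original by hand.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical library standing in for Unicode::UCD.
package Toy::Lib {
    our $real_calls = 0;

    # Stand-in for the expensive function (prop_invmap in the real module).
    sub expensive {
        my ($x) = @_;
        $real_calls++;
        return $x * 2;
    }

    # Internal caller; it looks up expensive() by name at run time.
    sub wrapper {
        my ($x) = @_;
        return expensive($x) + 1;
    }
}

package main;

# Given a code ref, memoize returns the cached version and installs nothing.
my $cached = memoize(\&Toy::Lib::expensive);

{
    # Overwriting an existing sub warns under "use warnings"; silence it here.
    no warnings 'redefine';
    *Toy::Lib::expensive = $cached;
}

my $first  = Toy::Lib::wrapper(21);   # real call to expensive()
my $second = Toy::Lib::wrapper(21);   # expensive() answered from the cache

print "real calls: $Toy::Lib::real_calls\n";   # prints "real calls: 1"
```

Because wrapper resolves expensive through the package's symbol table on each call, replacing the glob's code slot is enough; wrapper itself does not need to be touched.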
Dave
Re^2: Unicode::UCD=charprop and the speed of various keys
by Anonymous Monk on Aug 27, 2018 at 04:07 UTC
by davido (Cardinal) on Aug 27, 2018 at 19:16 UTC