Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
This is kind of a continuation of the questions I've been asking after bumping into unexpected regex performance issues; the last one was 11155604, I think. This one is also observed with the very latest strawberry-perl-5.40.0.1-64bit-PDL, so perhaps it's something new.
I'm trying to improve one of the CPAN modules that deals with PDF. The string below simulates a classic cross-reference table, with the number of entries and the amount of preceding file data roughly the same as in one of the PDF files I'm using for tests.
Method (1) is similar to the original. I tried (4) first, with the vague idea of not creating useless copies of data. However, that is when I noticed that, while the other changes (not relevant here) gave steady speed/memory gains, everything unexpectedly got very slow. So I concocted the SSCCE below to ask whether this is a bug in Perl. Also, strangely, the results of (4) vary somewhat from run to run, sometimes being as "fast" as 1.33 s.
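To make the difference between the copies in (1)/(2) and the reference in (4) concrete, here is a minimal sketch with a toy string (the variable names are only illustrative). It shows that `\substr` yields a reference to a magical lvalue tied to the original string rather than to a plain copy. This may be relevant to the slowdown, but the sketch does not establish the cause.

```perl
use strict;
use warnings;

my $s    = 'abcdefgh';
my $copy = substr $s, 4;     # plain string copy of the tail: 'efgh'
my $lref = \substr $s, 4;    # reference to a substr lvalue, no copy made

print ref( \$copy ), "\n";   # SCALAR
print ref( $lref ),  "\n";   # LVALUE

$$lref = 'XY';               # assignment writes through to $s
print "$s\n";                # abcdXY
print "$copy\n";             # efgh (the copy is unaffected)
```

So every use of `$$xref` in (4) goes through that lvalue, whereas `$xref` in (1) and (2) is an ordinary string.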
(Now I'm thinking of perhaps using (3) going forward, after checking whether the match position (pos, which the \G anchor relies on) is maintained/used elsewhere by the module. The question about a bug in Perl remains, as an accidental by-product of otherwise idle investigations.)
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';

say $^V;

my $s = '*' x 5_000_000;
$s .= "0123456789 01234 n \n" x 40_000;

my $re = qr/ (\d{10}) \x{20} (\d{5}) \x{20} (\w) \s\s /x;
my ( $xref, $t );

                                    # (1) peel off entry by entry
$xref = substr $s, 5_000_000;       #     from shorter string
$t = time;
for ( 0 .. 39_999 ) {
    my $entry = substr $xref, $_ * 20, 20;
    die unless $entry =~ / \A $re /x;
    # do something useful with captures
}
say time - $t;

$xref = substr $s, 5_000_000;       # (2) global match (shorter string)
$t = time;
for ( 0 .. 39_999 ) {
    die unless $xref =~ / \G $re /gx;
}
say time - $t;

                                    # (3) global match (original string),
pos( $s ) = 5_000_000;              #     start from pos
$t = time;
for ( 0 .. 39_999 ) {
    die unless $s =~ / \G $re /gx;
}
say time - $t;

$xref = \substr $s, 5_000_000;      # (4) use reference to substr
$t = time;
for ( 0 .. 39_999 ) {
    die unless $$xref =~ / \G $re /gx;
}
say time - $t;

__END__

v5.40.0
0.0973920822143555
0.04703688621521
0.0475959777832031
3.08383107185364
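For completeness, a rough sketch of how approach (3) might look once the captures are actually used. The helper name `parse_xref_entries`, its arguments, and the `/c` flag are my own assumptions for illustration, not anything taken from the module. Note that it takes an ordinary reference to the whole string (`\$s`), not a reference to a `substr` lvalue as in (4).

```perl
use strict;
use warnings;

# Hypothetical helper: read $count fixed-width (20-byte) xref entries
# from the string referenced by $sref, starting at offset $start.
sub parse_xref_entries {
    my ( $sref, $start, $count ) = @_;
    my $re = qr/ \G (\d{10}) \x{20} (\d{5}) \x{20} (\w) \s\s /x;
    my @entries;
    pos( $$sref ) = $start;            # begin matching at the table
    for ( 1 .. $count ) {
        # /c keeps pos() intact on failure, so it can be reported
        $$sref =~ /$re/gc
            or die "malformed xref entry at offset " . pos( $$sref );
        push @entries, [ $1 + 0, $2 + 0, $3 ];
    }
    return \@entries;
}

# Small stand-in for the test data: 1_000 filler bytes, then 3 entries
my $s = '*' x 1_000;
$s .= sprintf( "%010d %05d n \n", $_ * 100, 0 ) for 1 .. 3;

my $entries = parse_xref_entries( \$s, 1_000, 3 );
printf "%d %d %s\n", @$_ for @$entries;
```

A side effect of this design is that pos( $s ) is left just past the last entry, which is exactly the thing I'd have to check against the rest of the module.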
Re: Regex match is very slow against deref'd reference to substr/lvalue, is it normal?
by dave_the_m (Monsignor) on Aug 12, 2024 at 07:17 UTC
by Anonymous Monk on Aug 12, 2024 at 10:21 UTC
Re: Regex match is very slow against deref'd reference to substr/lvalue, is it normal?
by Discipulus (Canon) on Aug 12, 2024 at 10:11 UTC