I have a program that parses big strings (about 30MB of data). It makes intensive use of "\G" to continue parsing from the point where the previous match ended. Normally the program runs for 6 (six) seconds. But every few days it runs for 4 hours (yes... four hours) and consumes all the computing power on the server.
My program reads $big_string from a file, encloses that string in parentheses (creating a new string), then passes a reference to the newly created string to the function "list_extr", which does the parsing and returns the deserialized data structure.
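For readers who have not used the idiom, here is a minimal sketch of what "continue parsing with \G" through a string reference looks like. The helper name scan_words and the sample text are made up for illustration and are not part of my program.

```perl
use strict;
use warnings;

# Hypothetical helper: collect words from the string behind $ref,
# resuming every match at pos($$ref) by anchoring with \G.
sub scan_words {
    my ($ref) = @_;
    my @words;
    # /g records the end of each successful match in pos($$ref);
    # /c keeps that position when a match fails instead of resetting it.
    while ($$ref =~ /\G\s*(\w+)/gc) {
        push @words, $1;
    }
    return \@words;
}

my $text       = "alpha beta gamma !rest";
my $words      = scan_words(\$text);
my $stopped_at = pos($text);    # position survives because of /c

print "words: @$words\n";
print "stopped at offset $stopped_at\n";
```

Because the position is stored on the scalar itself, a later match with a different pattern can pick up exactly where this loop stopped. That is how the parsing functions in this post hand the same reference from one sub to the next.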
Function "list_extr" gets a reference to the big string it should parse. I have found that when called like this (interpolate or concatenate, take reference):
    list_extr(\"($big_string)")
    list_extr( \( "(" . $big_string . ")" ) )
or like this (interpolate, save in new variable, take ref to variable):
    my $s = "($big_string)"; list_extr(\$s)
it is sometimes very, very slow.
To solve the problem I have to pass the string through sprintf:
    list_extr(\sprintf("%s", $big_string))
This makes the function "list_extr" run very fast (six seconds instead of a few hours).
My goal is to take $big_string, add parentheses at the beginning and end of that string, and pass a reference to the newly created string (enclosed in parentheses) to the function list_extr. I hope that's clear.
I think the problem lies in some string optimization in perl. When the string is built by interpolation, perl seems not to create a plain new string, and computing the parse positions (pos $big_string, \G in regex) then takes a lot of computation. With sprintf, perl skips the optimization and creates a new, plain, non-interpolated, non-combined, simple string. I think the optimization is sometimes applied and sometimes not, which is why the problem occurs only once every few days. Below are the parsing functions.
I have found that this solution is sometimes fast, sometimes slow:
    list_extr(\( "(" . $big_string . ")" ));
and this solution is ALWAYS slow:
    my $ttt = "(" . $big_string . ")"; list_extr(\$ttt);
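To compare the call styles side by side, something like the following timing sketch could be used. Everything in it is an assumption made for illustration: count_tokens is a simplified stand-in for list_extr, the input is a small synthetic string rather than the real 30MB file, and the sprintf variant uses "(%s)" so that every variant hands the stand-in the same parenthesized text. It only shows how to measure; it does not claim to reproduce the slowdown.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical stand-in for list_extr: walks the string with \G ... /gc
# and counts identifier tokens. Swap in the real parser when measuring.
sub count_tokens {
    my ($ref) = @_;
    my $n = 0;
    $$ref =~ /\G\s*\(/gc or die "parse err";
    $n++ while $$ref =~ /\G\s*[[:alpha:]](?:_?[[:alnum:]])*\s*/gc;
    $$ref =~ /\G\s*\)\s*/gc or die "parse err";
    return $n;
}

# Small synthetic input (assumption); the real data is far larger.
my $big_string = join ' ', map { "item_$_" } 1 .. 10_000;

# The ways of building the parenthesized string discussed in this post.
my %variants = (
    interpolate_ref => sub { count_tokens(\"($big_string)") },
    concat_ref      => sub { count_tokens(\( "(" . $big_string . ")" )) },
    named_variable  => sub { my $s = "($big_string)"; count_tokens(\$s) },
    sprintf_ref     => sub { count_tokens(\sprintf("(%s)", $big_string)) },
);

my %count;
for my $name (sort keys %variants) {
    my $start = Time::HiRes::time();
    $count{$name} = $variants{$name}->();
    my $elapsed = Time::HiRes::time() - $start;
    printf "%-16s %6d tokens  %.4f s\n", $name, $count{$name}, $elapsed;
}
```

Running each variant in the same process and on the same input keeps the comparison fair. If one variant is consistently slower than the others, that narrows the problem down to how the string was built rather than to the parser itself.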
# \G(?:\s|#.*$)* -- means start from last position \G,
#                   skip spaces and comments
#                   till the end of line
# ([[:alpha:]](?:_?[[:alnum:]])*) -- my identifier
#                   restrictions; start with letter, then
#                   letters, underscores, digits; but
#                   two underscores in a row not allowed,
#                   underscore at the end not allowed

sub list_extr {
    my ($a) = @_;
    ref $a eq 'SCALAR' or croak "wrong ref";
    my @l;
    $$a =~ /\G(?:\s|#.*$)*\(/mgc or croak "parse err";
    while ($$a =~ /\G(?:\s|#.*$)*([[:alpha:]](?:_?[[:alnum:]])*)(?:\s|#.*$)*/mgc) {
        push @l, {'name' => $1, 'parm' => parm_extr($a)};
    }
    $$a =~ /\G(?:\s|#.*$)*\)(?:\s|#.*$)*/mgc or croak "parse err";
    return \@l;
}

sub parm_extr {
    my ($a) = @_;
    ref $a eq 'SCALAR' or croak "wrong ref";
    my %p;
    $$a =~ /\G(?:\s|#.*$)*\(/mgc or croak "parse err";
    while ($$a =~ /\G(?:\s|#.*$)*([[:alpha:]](?:_?[[:alnum:]])*)(?:\s|#.*$)*/mgc) {
        my $n = $1;
        if ($$a =~ /\G([[:alpha:]](?:_?[[:alnum:]])*|"(?:[^\\"[:cntrl:]]+|\\[\\"nt])*")/mgc) {
            $p{$n} = $1;
        } elsif ($$a =~ /\G(?=[-+.\d])/mgc) {
            $p{$n} = numb_extr($a);
        } elsif ($$a =~ /\G(?=\()/mgc) {
            $p{$n} = parm_extr($a);
        } else {
            croak "parse err";
        }
    }
    $$a =~ /\G(?:\s|#.*$)*\)(?:\s|#.*$)*/mgc or croak "parse err";
    return \%p;
}

sub numb_extr {
    my ($a) = @_;
    ref $a eq 'SCALAR' or croak "wrong ref";
    $$a =~ /\G(?:\s|#.*$)*([-+]?\d*(\.\d*)?)/mgc or croak "parse err";
    my $n = $1;
    $n eq '0.0' and return 0;
    $n =~ /\A[-+](?!0.0\z)(?=[1-9]|0\.)\d+\.\d+(?<=[.\d][1-9]|\.0)\z/ or croak "parse err";
    length $n <= 15 + 2 or croak "numb too long";
    $n = 0 + $n;
    # 1234567890.12345
    abs $n > 99999999999999.9 and croak "numb out of range";
    return 0 + $n;
}
In reply to Hypothesis: some magic string optimalization in perl kills my server from time to time by leszekdubiel