Another 64-bit Perl bug. Is it fixed post 5.18?

by BrowserUk (Patriarch)
on May 24, 2015 at 11:38 UTC ( [id://1127579]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

The regex engine silently fails to process strings longer than 2**31 bytes on 64-bit perls up to and including v5.18.4:

$x = "the quick brown fox\n"; $x x= 107374182; print length $x;;   ### 8 bytes less than 2^31.
2147483640

$n=0; ++$n while $x =~ m[^.*$]mg; print $n;;   ### finds all the lines.
107374182

$x .= "the straw that broke the camel's back\n"; print length $x;;   ### Add another line that pushes the length a few bytes over 2^31
2147483678

$n=0; ++$n while $x =~ m[^.*$]mg; print $n;;   ### and it silently fails to find any of them.
0

Before I raise a perlbug, does this fail on later perls? Does it fail on non-Windows perls?

If it's been fixed already, in which version did the fix happen?
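For anyone who wants to try this on another perl or platform, the session above could be wrapped in a small script along these lines. This is only a sketch: the sub names `count_lines` and `probe` are made up for illustration, and the default size is deliberately tiny, so the 2**31 boundary is only crossed when you pass the original repeat count on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines with the same pattern as the one-liner in the question.
# A reference is passed so the (possibly huge) string is not copied.
sub count_lines {
    my ($ref) = @_;
    my $n = 0;
    ++$n while $$ref =~ m[^.*$]mg;
    return $n;
}

# Build $copies 20-byte lines, count them, append one more line, count again.
sub probe {
    my ($copies) = @_;
    my $x = "the quick brown fox\n" x $copies;
    my $before = count_lines(\$x);
    $x .= "the straw that broke the camel's back\n";
    my $after = count_lines(\$x);
    return ($before, $after, length $x);
}

# Small default for a quick sanity check. Passing 107374182 reproduces the
# original sizes and needs well over 2 GB of free memory.
my $copies = @ARGV ? $ARGV[0] : 1000;
my ($before, $after, $len) = probe($copies);
printf "v%vd %s length=%d before=%d after=%d\n", $^V, $^O, $len, $before, $after;
```

On a perl that handles the long string correctly, `after` should be exactly one more than `before`; the failure described above shows up as `after=0`.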

Thanks.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re: Another 64-bit Perl bug. Is it fixed post 5.18?
by karlgoethebier (Abbot) on May 24, 2015 at 12:25 UTC

    I get this:

    v5.20.0 darwin
    2147483640
    107374182
    2147483678
    107374183

    v5.18.2 darwin
    2147483640
    107374182
    2147483678
    0

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Thanks Karl. Looks like I'll be upgrading to 5.20.


Re: Another 64-bit Perl bug. Is it fixed post 5.18?
by moritz (Cardinal) on May 24, 2015 at 12:48 UTC

    For as long as I can remember doing Perl development, there has been a limitation that a quantifier like + or * wouldn't actually match an unlimited number of characters, but at most a fixed upper limit.

    IIRC in the days of perl 5.8, it was more like 2**15.

    It seems the situation has improved a bit for + and *, but you can still see the explicit limit with the generic quantifier:

    $ perl -Mre=debug -e '/.{2,}/'
    Compiling REx ".{2,}"
    Final program:
       1: CURLY {2,32767} (4)
       3:   REG_ANY (0)
       4: END (0)
    minlen 2
    Freeing REx: ".{2,}"

    So the upper limit for {2,} is actually 32767 (== 2**15 - 1), not unlimited.
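Whether the 32767 shown in the debug output acts as a hard cap for a simple open-ended quantifier can be checked empirically rather than read off the compiled program. The following is a minimal sketch; the helper name `open_ended_match_length` is made up here, and no expected figure is given because the outcome may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: how many characters does a{2,} consume
# when anchored at the start of a run of $n letters 'a'?
sub open_ended_match_length {
    my ($n) = @_;
    my $s = "a" x $n;
    return $s =~ /\A(a{2,})/ ? length($1) : 0;
}

my $n = 40_000;    # deliberately larger than 32767
printf "string has %d chars, a{2,} consumed %d\n",
    $n, open_ended_match_length($n);
```

If the second number equals the first, the quantifier was not capped at 32767 for this simple case on the perl you ran it with.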

    If its been fixed already, which version did the fix happen?

    I don't think it was ever fixed.

Re: Another 64-bit Perl bug. Is it fixed post 5.18?
by thanos1983 (Parson) on May 24, 2015 at 13:56 UTC

    Hello BrowserUk,

    I am running on: This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi, and I see two different results with your script.

    When I execute your script with the newline characters \n added to the print statements (sample of the script below):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark ':hireswallclock'; # enable hires wallclock (microseconds) timing if possible

    my $iterations = 1;

    my $regexEngineCode = sub {
        my $x = "the quick brown fox\n"; $x x= 107374182; print length $x . "\n"; ### 8 bytes less than 2^31.
        my $n=0; ++$n while $x =~ m[^.*$]mg; print $n . "\n"; ### finds all the lines.
        ### Add another line that pushes the length a few bytes over 2^
        $x .= "the straw that broke the camel's back\n"; print length $x . "\n";
        $n=0; ++$n while $x =~ m[^.*$]mg; print $n . "\n"; ### and it silently fails to find any of them.
    };

    my $time = timeit($iterations, $regexEngineCode);
    print "It took ", timestr($time), "\n";

    I am getting the following output:

    2147483641107374182
    Out of memory!

    real 1m29.931s
    user 0m50.376s
    sys 0m4.824s
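A note on the first number in that output: `length` is a named unary operator and binds less tightly than `.`, so `print length $x . "\n"` is parsed as `print length($x . "\n")`. That would account for the reported length being one larger than expected and for the missing line break. It may also be related to the "Out of memory!" message, since the concatenation has to build a second copy of the 2 GB string, but that part is a guess. A small sketch with a short string (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $x = "the quick brown fox\n";    # 20 bytes

# Parsed as length($x . "\n"): the newline is counted, not printed.
my $joined = length $x . "\n";

# Parenthesised: the length is taken first, then the newline is appended.
my $separate = length($x) . "\n";

print "without parentheses: $joined\n";    # prints 21
print "with parentheses:    $separate";    # prints 20 followed by a newline
```

Writing `print length($x), "\n";` in the script avoids both the off-by-one and the temporary copy.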

    I forgot to mention that I am also using Benchmark and time(1) to measure the run time.

    But when I remove the newline characters \n (sample of the code below):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark ':hireswallclock'; # enable hires wallclock (microseconds) timing if possible

    my $iterations = 1;

    my $regexEngineCode = sub {
        my $x = "the quick brown fox\n"; $x x= 107374182; print length $x; ### 8 bytes less than 2^31.
        my $n=0; ++$n while $x =~ m[^.*$]mg; print $n; ### finds all the lines.
        ### Add another line that pushes the length a few bytes over 2^
        $x .= "the straw that broke the camel's back\n"; print length $x;
        $n=0; ++$n while $x =~ m[^.*$]mg; print $n . "\n"; ### and it silently fails to find any of them.
    };

    my $time = timeit($iterations, $regexEngineCode);
    print "It took ", timestr($time), "\n";

    I get this output:

    214748364010737418221474836780
    It took 32.7849 wallclock secs (31.79 usr + 1.02 sys = 32.81 CPU) @ 0.03/s (n=1)

    real 0m32.942s
    user 0m31.839s
    sys 0m1.129s

    So, based on the results provided by karlgoethebier above, it seems that with adequate hardware you can get complete output on v5.18.2 even with the newline characters, unlike on my machine, which ran out of memory. However, his fourth print statement shows zero, which suggests that Perl v5.18.2 probably cannot handle sizes this large.

    In conclusion, I would say that a combination of adequate hardware and the latest version of Perl can give you the result that you expect.

    Update: I am not an expert (not even close), but I think this might be the reason. See this short description of the memory limits on Linux and Windows: Memory Limits in R.

    Update 2: I also found this regarding Perl 5.20.0, under "Better 64-bit support":

    On 64-bit platforms, the internal array functions now use 64-bit offsets, allowing Perl arrays to hold more than 2**31 elements, if you have the memory available.

    The regular expression engine now supports strings longer than 2**31 characters. [perl #112790, #116907]

    The functions PerlIO_get_bufsiz, PerlIO_get_cnt, PerlIO_set_cnt and PerlIO_set_ptrcnt now have SSize_t, rather than int, return values and parameters.
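If a script has to run on a mix of perls, the note quoted above suggests a simple guard before applying a regex to a very large string. The following is only a sketch built on two assumptions: that 5.20.0 is the first release with the fix (as the quoted text and the darwin results earlier in this thread indicate), and that an 8-byte pointer size identifies a 64-bit build. The sub name `long_string_regex_expected` is made up for illustration.

```perl
use strict;
use warnings;
use Config;

# Hypothetical check: true when the running perl is expected to match
# regexes against strings longer than 2**31 bytes.
sub long_string_regex_expected {
    my ($version, $ptrsize) = @_;
    return ($version >= 5.020 && $ptrsize >= 8) ? 1 : 0;
}

# $] is the numeric perl version, e.g. 5.018002 for v5.18.2.
if (long_string_regex_expected($], $Config{ptrsize})) {
    print "regex matching on >2GB strings should work here\n";
}
else {
    print "avoid regex matching on >2GB strings with this perl\n";
}
```

A guard like this only reports what the release notes promise; running the actual test from the question remains the reliable check.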

    Update 3: Updated the hyperlink and modified the text output.

    Hope this helps.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Another 64-bit Perl bug. Is it fixed post 5.18?
by Laurent_R (Canon) on May 24, 2015 at 13:37 UTC
    It also fails on 5.14.4 for Cygwin:
    $n=0; ++$n while $x =~ m[^.*$]mg; print $n;;
    0
