Hi Monks,
I've run into some strange behaviour, which I suspect is a bug in the regex engine of later Perl versions.
Using libdevel-leak-perl with this simplified code sample:
use strict;
use warnings;
use Devel::Leak;
my $string = "TESTING STRING";
my $count = Devel::Leak::NoteSV (my $handle);
print ($string =~ s/\ STRING//);
undef $string;
Devel::Leak::CheckSV ($handle);
Under Perl 5.26 and 5.30 the snippet leaks, showing:
new 0x55fd7b6db1e0 : SV = PV(0x55fd7b6dbe60) at 0x55fd7b6db1e0
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x55fd7b713c00 "TESTING STRING"\0
CUR = 14
LEN = 16
COW_REFCNT = 1
Under Perl 5.16 it does not leak.
It looks like a copy of the input string is being made and stored in an SV, but is not being cleaned up after the regex completes its replace.
My questions are: is this actually a bug within Perl? Am I doing something fundamentally wrong? If it is a fundamental bug, are there any workarounds that could avoid the leak?
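One way to tell a one-off retained copy from a leak that grows over time would be to repeat the substitution in a loop and compare SV counts. This is only a minimal sketch: the helper name sv_growth is made up for illustration, and it relies on NoteSV and CheckSV returning the number of SVs they counted. CheckSV will still dump any new SVs to STDERR as before.

```perl
use strict;
use warnings;
use Devel::Leak;

# Hypothetical helper: run the same substitution $n times and return
# how many more SVs exist afterwards than before.
sub sv_growth {
    my ($n) = @_;
    my $handle;
    my $before = Devel::Leak::NoteSV($handle);
    for my $i (1 .. $n) {
        my $string = "TESTING STRING";
        $string =~ s/\ STRING//;
    }
    # CheckSV also prints every SV it considers new to STDERR.
    my $after = Devel::Leak::CheckSV($handle);
    return $after - $before;
}

printf "%6d iterations: %d new SVs\n", $_, sv_growth($_) for 1, 10, 1000;
```

If the number of new SVs stays flat as the iteration count rises, that would suggest a single buffer being kept around rather than memory lost on every substitution. If it grows with the iteration count, that would point to a genuine leak.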
Striker.
Edit: Adding Perl version information.
Tested and confirmed leaking on:
Ubuntu 16.04
This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi
Ubuntu 18.04
This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi
Ubuntu 20.04
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi
Tested and not leaking on:
CentOS 7
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
OEL 7
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Additional Edit: Adding full instructions for reproduction on WSL.
It seems this hasn't been easy to reproduce elsewhere, although I've seen it on multiple systems. After getting home I spun up a brand-new Ubuntu 18.04 instance under Windows Subsystem for Linux version 2 and was able to reproduce it with these steps:
Run an update:
sudo apt update
Install Devel::Leak:
sudo apt install libdevel-leak-perl
Create the file regex_test.pl with the contents:
use strict;
use warnings;
use Devel::Leak;
my $string = "TESTING STRING";
my $count = Devel::Leak::NoteSV (my $handle);
print ($string =~ s/\ STRING//);
undef $string;
Devel::Leak::CheckSV ($handle);
Run the test with
perl regex_test.pl
Which results in:
new 0x561fa5f971e0 : SV = PV(0x561fa5f97e60) at 0x561fa5f971e0
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x561fa5fc6320 "TESTING STRING"\0
CUR = 14
LEN = 16
COW_REFCNT = 1
I'm not sure why it is a bit flaky to reproduce; perhaps slight library or distribution differences cause it to manifest. Thanks to everyone who has had a look at this so far!
I'll investigate some more myself and if I do find a solution I'll make sure to add it to the end of this post.
Another, Another Update:
After doing some more testing I saw the posts indicating that the CPAN version behaves differently from the APT version. I believe this is the case, even though they seem to have the same version number.
If I use the CPAN version I see the behaviour that others are seeing, where it simply prints:
1
new 000000000072a728 :
To me this indicates that the CPAN module is "broken": it reports a leak, as shown by the "new 000000000072a728 :" line, but does not print the "debug" information associated with the SV that is being kept. If I switch from the CPAN module back to the APT version, I get the full dump of the SV as previously noted. Thoughts?
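Since both builds report the same version number, it may help to record exactly which Devel::Leak file each environment loads and how its perl was compiled. The sketch below is only a diagnostic aid: build_report is a made-up helper name, and the idea that a -DDEBUGGING difference between builds explains the missing dump is an assumption to rule out, not something I have confirmed.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: collect facts about this perl and the Devel::Leak
# it would load, so two environments can be compared side by side.
sub build_report {
    my %info = (
        perl_version => $Config{version},
        # 1 if the compiler flags recorded at build time include -DDEBUGGING
        debugging    => (($Config{ccflags} // '') =~ /-DDEBUGGING\b/ ? 1 : 0),
    );
    # Devel::Leak may not be installed everywhere, so load it defensively.
    if (eval { require Devel::Leak; 1 }) {
        $info{leak_version} = Devel::Leak->VERSION;
        $info{leak_path}    = $INC{'Devel/Leak.pm'};
    }
    return \%info;
}

my $report = build_report();
print "$_ = ", $report->{$_} // 'n/a', "\n" for sort keys %$report;
```

Running this once with the APT package active and once with the CPAN install active should show whether the two really are the same file and version, or only share a version string.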