Re: Worrying regex issue with 5.8.0
by chromatic (Archbishop) on Nov 14, 2002 at 20:36 UTC
Works fine here as well. Here's my info: Solaris 8 (420R)
[3:24pm] 11 [~]:msp-mainserver% perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=solaris, osvers=2.8, archname=sun4-solaris
uname='sunos solaris 5.8 generic_108528-11 sun4u sparc sunw,ultra-5_10 '
config_args='-Dcc=gcc -B/usr/ccs/bin/'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc -B/usr/ccs/bin/', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O',
cppflags='-fno-strict-aliasing'
ccversion='', gccversion='3.1', gccosandvers='solaris2.8'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='gcc -B/usr/ccs/bin/', ldflags =' -L/usr/local/lib '
libpth=/usr/local/lib /usr/lib /usr/ccs/lib
libs=-lsocket -lnsl -lgdbm -ldl -lm -lc
perllibs=-lsocket -lnsl -ldl -lm -lc
libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-fPIC', lddlflags='-G -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: USE_LARGE_FILES
Built under solaris
Compiled at Jul 22 2002 02:55:19
@INC:
/usr/local/lib/perl5/5.8.0/sun4-solaris
/usr/local/lib/perl5/5.8.0
/usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris
/usr/local/lib/perl5/site_perl/5.8.0
/usr/local/lib/perl5/site_perl
-Waswas
Re: Worrying regex issue with 5.8.0
by graff (Chancellor) on Nov 15, 2002 at 03:38 UTC
This is some weird stuff, but I'm confused. Does the name
of the variable ($re_bad) imply that the expression typed by
the programmer is bad, or are you saying that this should
be considered an okay expression, and perl just seems to be
doing something bad to it?
I wonder, because this part: [A-Z$#@_#] seems
to be specifying "#" twice within a character class, which
seems odd (should be harmless, granted, but odd). The "qr"
operator is being used with single-quote delimiters, so the
"@_" is being taken literally, rather than being interpolated
(was that your intention?).
When some other delimiter is used, the result is different:
with nothing else in the script, "@_" is empty, so it seems
to interpolate as an empty string in this case.
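To make the delimiter point concrete, here is a minimal sketch of my own (not the original test script; @list is a made-up array used only for illustration). With single-quote delimiters qr does no interpolation, while with slashes the array is joined with spaces and inserted into the pattern:

```perl
use strict;
use warnings;

# Hypothetical array, for illustration only.
my @list = ('a', 'b');

# Single-quote delimiters: no interpolation, the pattern keeps a literal '@list'.
my $literal = qr'x@list';

# Slash delimiters: @list is interpolated (elements joined with a space).
my $interp = qr/x@list/;

print "single quotes: $literal\n";
print "slashes:       $interp\n";
```

The exact wrapper printed around each pattern depends on the perl version, so only compare the part after the colon.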
I tried both ways, on both my home linux and my office solaris 5.8,
and got results that were not identical, but equivalent, and
worrisome in all cases. Feeding Data::Dumper's
output through something like "od" helps to see the issue:
solaris5.8 $ test.perl | od -t ax1
0000000   $   V   A   R   1  sp   =  sp   q   r   /   (   ?   x   -   i
         24  56  41  52  31  20  3d  20  71  72  2f  28  3f  78  2d  69
0000020   s   m   :   @   @   (   [   A   -   Z   $   #   @   _   #   ]
         73  6d  3a  40  40  28  5b  41  2d  5a  24  23  40  5f  23  5d
0000040   *   )  sp   (   ?   !  sp   [   A   -   Z   a   -   z   0   -
         2a  29  20  28  3f  21  20  5b  41  2d  5a  61  2d  7a  30  2d
0000060   9   $   #   @   _   ]  sp   ) nul nul  lf   )   /   ;  lf
         39  24  23  40  5f  5d  20  29  00  00  0a  29  2f  3b  0a
Note the null bytes. Is it supposed to do that?
I happened to get the exact same results on linux for this case.
Now, if I change the delimiters on "qr" to something else
(like slashes), which allows the "@_" to interpolate,
the two systems do seem to differ:
solaris5.8 $ test-slashes.perl | od -t ax1
0000000   $   V   A   R   1  sp   =  sp   q   r   /   (   ?   x   -   i
         24  56  41  52  31  20  3d  20  71  72  2f  28  3f  78  2d  69
0000020   s   m   :   @   @   (   [   A   -   Z   #   ]   *   )  sp   (
         73  6d  3a  40  40  28  5b  41  2d  5a  23  5d  2a  29  20  28
0000040   ?   !  sp   [   A   -   Z   a   -   z   0   -   9   $   #   @
         3f  21  20  5b  41  2d  5a  61  2d  7a  30  2d  39  24  23  40
0000060   _   ]  sp   ) can  lf   )   /   ;  lf
         5f  5d  20  29  18  0a  29  2f  3b  0a
#####
linux2.4 $ test-slashes.perl | od -t ax1
0000000   $   V   A   R   1  sp   =  sp   q   r   /   (   ?   x   -   i
         24  56  41  52  31  20  3d  20  71  72  2f  28  3f  78  2d  69
0000020   s   m   :   @   @   (   [   A   -   Z   #   ]   *   )  sp   (
         73  6d  3a  40  40  28  5b  41  2d  5a  23  5d  2a  29  20  28
0000040   ?   !  sp   [   A   -   Z   a   -   z   0   -   9   $   #   @
         3f  21  20  5b  41  2d  5a  61  2d  7a  30  2d  39  24  23  40
0000060   _   ]  sp   )   s  lf   )   /   ;  lf
         5f  5d  20  29  73  0a  29  2f  3b  0a
Where solaris had one garbage character (0x18), linux had
a different character ("s"), which on the surface looks plausible,
but is garbage nonetheless, I expect (i.e. it's whatever happened
to be in memory at the point where a C function stepped past
the end of an array).
Also, it seems that only
the first instance of "@_" was interpolated -- if that makes sense, I need
to read perlre again, much more carefully...
I am glad that somebody managed to repeat the error. I think that people were beginning to think I was seeing things :-)
I want to try some more tests myself but, to answer your questions:
The variable name $re_bad is meant to imply that Perl is doing something bad with a valid regex. The regex is meaningless, that was just as small as I could get it. (As mentioned above, if I remove any further elements, I do not get the error.) The real regex is much longer. In fact, I got this error for a couple of regexes - one to match a SQL label and one to match a SQL global variable.
Yes, I intentionally used single-quote delimiters. I have also observed the behaviour when using other delimiters. However, so far, I only get the problem when using /x...
Kevin
Re: Worrying regex issue with 5.8.0
by PodMaster (Abbot) on Nov 15, 2002 at 05:53 UTC
use Data::Dumper;
my $re_bad = qr'@@([A-Z$#@_#]*) (?! [A-Za-z0-9$#@_] )'x;
print Dumper $re_bad;
print "\n\n\n";
print $re_bad;
__END__
$VAR1 = qr/(?x-ism:@@([A-Z$#@_#]*) (?! [A-Za-z0-9$#@_] ))/;
(?x-ism:@@([A-Z$#@_#]*) (?! [A-Za-z0-9$#@_] ))
____________________________________________________ ** The Third rule of perl club is a statement of fact: pod is sexy.
Well, from your output, you appear not to get any spurious characters anyway.
When I said above that Data::Dumper was not the issue, what I meant was that I was just using it to display the output. I originally observed the problem because the characters added to my regex were **, which caused the regex to be invalid (luckily, or I might not have spotted it for a while).
However, Data::Dumper may yet be involved. When I run the following code:
use Data::Dumper;
my $re_bad = qr'xx([A-Z$#@_!]*) (?! [A-Za-z0-9$#@_] )'x;
print $re_bad;
the output is:
(?x-ism:xx([A-Z$#@_!]*) (?! [A-Za-z0-9$#@_] )#
)
Note the spurious hash (or pound, if you prefer :-) at the end. Given the same code without the use Data::Dumper line:
my $re_bad = qr'xx([A-Z$#@_!]*) (?! [A-Za-z0-9$#@_] )'x;
print $re_bad;
the output is:
(?x-ism:xx([A-Z$#@_!]*) (?! [A-Za-z0-9$#@_] )
)
However, this does not yet prove that Data::Dumper is the culprit. Given the sporadic nature of the symptoms, it may well be that simply using any module of the right "size" will produce the error.
Kevin
Ok, latest summary...
It appears that the error does not occur unless both Data::Dumper and /x are present. Since the error is so sporadic, it is hard to be sure.
NB Just avoiding the use of Data::Dumper (say by using Dumpvalue instead) is not an option, since many CPAN modules which I am using themselves use Data::Dumper.
I will hang on a bit longer, to see if anybody spots anything else, then submit a bug report.
Cheers, Kevin
Re: Worrying regex issue with 5.8.0
by tommyw (Hermit) on Nov 15, 2002 at 10:31 UTC
This looks to me as though it's the same problem I tripped over:
perl -e '$d=qr /\#\#/x; print $d, "\n";'
produces:
(?x-ism:\#\#l
)
when I thought it ought to produce (?x-ism:\#\#)
I sent it in as a bug report, and it's been fixed:
The memory corruption bug should be corrected by change #17994 :
Change 17994 by rgs@rgs-home on 2002/10/10 20:19:27
Fix bug #17776 : memory corruption in qr/##/x
The actual bug is due to:
* So, if /x was used, we scan backwards from the
* end of the regex. If we find a '#' before we
* find a newline, we need to add a newline
* ourself. If we find a '\n' first (or if we
* don't find '#' or '\n'), we don't need to add
* anything. -jfriedl
So if you can reorganise your regexes to stop this occurring (that is, so that no /x regex has a '#' with no newline after it), you'll be all right.
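For anyone who wants to see which patterns hit that code path, here is a rough Perl sketch of the backward scan the comment describes. It is only my reading of the comment, not the actual C code in perl, and needs_trailing_newline is a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: scan backwards from the end of a /x pattern.
# Returns 1 if a '#' is found before any newline (the case where perl
# has to append a newline itself), 0 otherwise.
sub needs_trailing_newline {
    my ($pattern) = @_;
    for (my $i = length($pattern) - 1; $i >= 0; $i--) {
        my $ch = substr($pattern, $i, 1);
        return 1 if $ch eq '#';
        return 0 if $ch eq "\n";
    }
    return 0;    # neither '#' nor newline found
}

for my $pat ('\#\#', "a b # comment\n", 'abc') {
    (my $shown = $pat) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-20s %s\n", $shown,
        needs_trailing_newline($pat) ? 'newline gets appended' : 'left alone';
}
```

Note that the scan looks at raw characters only, so an escaped \# or a # inside a character class counts just the same, which fits the examples in this thread.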
--
Tommy
Too stupid to live.
Too stubborn to die.
Thanks Tommy - saved me from submitting a duplicate bug report.
Ok, so Data::Dumper is not the issue. Is it your understanding that any regex which uses /x and which contains a hash (even in a character class) which is not followed by a newline will potentially corrupt memory? If so, that is pretty grim. The instruction would then be:
Before upgrading to 5.8.0, check every regex which uses /x. For each one which contains a # not followed by a newline, append the text "# FIXME - workaround for bug #17776" followed by a newline.
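A minimal sketch of what that instruction might look like, using the reduced regex from earlier in the thread. The variable name $re_ok and the test string are mine, and I am assuming that a comment terminated by a real newline is enough to avoid the affected code path:

```perl
use strict;
use warnings;

# Same pattern as before, but the last '#' in the regex now starts an /x
# comment that ends with a real newline before the closing delimiter.
my $re_ok = qr'xx([A-Z$#@_!]*) (?! [A-Za-z0-9$#@_] ) # FIXME - workaround for bug #17776
'x;

print "captured: $1\n" if 'xxABC ' =~ $re_ok;
```

The comment is ignored under /x, so the regex should match exactly what it matched before.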
Cheers, Kevin
Yes, or take the spaces out, and then dump the /x modifier. If you insist on keeping the original nicely formatted with spaces, then just pump it through s/ //g before passing it to qr (which was my solution). Obviously, this needs a certain amount of care too.
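Roughly, the s/ //g approach looks like the sketch below. The pattern is a spaced-out variant of the one in this thread, and $pretty and $compact are just illustrative names:

```perl
use strict;
use warnings;

# Keep the readable, spaced-out source in a single-quoted string...
my $pretty = '@@ ( [A-Z$#@_]* ) (?! [A-Za-z0-9$#@_] )';

# ...strip the spaces by hand...
(my $compact = $pretty) =~ s/ //g;

# ...and compile WITHOUT /x, so '#' is never treated as a comment start.
my $re = qr/$compact/;

print "label: $1\n" if '@@LABEL ' =~ $re;
```

The care needed is that s/ //g removes every space, including any that were meant to match literally (an escaped space, or a space inside a character class), and it does nothing about newlines or real /x comments in the source string.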
--
Tommy
Too stupid to live.
Too stubborn to die.