I fixed those problems in substr for 5.12.0
Well done. But--this isn't critism, just a need for understanding--those fixes look very complex in places.
For example:
- else {
- pos += curlen;
- if (num_args < 3)
- rem = curlen;
- else if (len >= 0) {
- rem = pos+len;
- if (rem > (I32)curlen)
- rem = curlen;
becomes + if (pos1_is_uv || pos1_iv > 0) {
+ if ((UV)pos1_iv > curlen)
+ goto BOUND_FAIL;
+ }
+
+ if (num_args > 2) {
+ if (!len_is_uv && len_iv < 0) {
+ pos2_iv = curlen + len_iv;
+ if (curlen)
+ pos2_is_uv = curlen-1 > ~(UV)len_iv;
+ else
+ pos2_is_uv = 0;
+ } else { /* len_iv >= 0 */
+ if (!pos1_is_uv && pos1_iv < 0) {
+ pos2_iv = pos1_iv + len_iv;
+ pos2_is_uv = (UV)len_iv > (UV)IV_MAX;
+ } else {
+ if ((UV)len_iv > curlen-(UV)pos1_iv)
+ pos2_iv = curlen;
+ else
+ pos2_iv = pos1_iv+len_iv;
+ pos2_is_uv = 1;
+ }
Is all of that manual tracking whether a integer is signed or unsigned necessary?
If so, life just got a whole lot tougher for source-divers. That's really nasty!
but some remain elsewhere.
I'm currently looking at 1988 (1302 uniq), "possible loss of data" warnings spread across 73 source files in the 5.12.0 release.
Don't you just wish that C compilers would list the identifier of offending variables!
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
In some C files that I've written, I've suppressed certain warnings with:
#ifdef _MSC_VER
#pragma warning(disable:4700 4715 4716)
#endif
I forget which 3 warnings they equate to - and haven't checked. If it's just a matter of shutting 'em up, without being too particular about the means employed, I think there's a chance that approach could be useful.
With any luck you'd only need to do it in a few select header files (assuming that many of those 73 source files #include a common header).
Cheers, Rob | [reply] [d/l] |
Shutting up the warnings would be useful (for Perl) if I was convinced they were all false alarms, but I'm semi-convinced that one or more of them is the reason behind some traps I experience when I get close to allocating 4GB of virtual memory.
For Parrot, which effectively equates sizeof( INTVAL ) == sizeof( void* ) all over the show, I'm convinced that the configuration utility (that actually warns of the problem), is doing the wrong thing with respect to how it defines the fundemental typedefs.
Personally, I think that on windows the following typedefs for STRLEN, IV and UV should be used. And most (if not all) uses of int, I32, and U32 should dropped in favour of one if the 3 above.
#include <stdio.h>
#include <stddef.h>
typedef uintptr_t UV;
typedef ptrdiff_t STRLEN;
typedef intptr_t IV;
#ifdef _WIN64
#define IS_WIN64 ""
#else
#define IS_WIN64 "not"
#endif
void main( void ){
printf( "_WIN64 is %s defined\n", IS_WIN64 );
printf( "MAX: %d\n", _INTEGRAL_MAX_BITS );
printf( "size_t is %d bytes\n", sizeof( size_t ) );
printf( "int is %d bytes\n", sizeof( int ) );
printf( "long is %d bytes\n", sizeof( long ) );
printf( "long long is %d bytes\n", sizeof( long long ) );
printf( "UV is %d bytes\n", sizeof( UV ) );
printf( "IV is %d bytes\n", sizeof( IV ) );
printf( "STRLEN is %d bytes\n", sizeof( STRLEN ) );
}
Compiled for 32 & 64-bit targets this produces:
C:\test>size_t.exe
_WIN64 is not defined
MAX: 64
size_t is 4 bytes
int is 4 bytes
long is 4 bytes
long long is 8 bytes
UV is 4 bytes
IV is 4 bytes
STRLEN is 4 bytes
C:\test>size_t.exe
_WIN64 is defined
MAX: 64
size_t is 8 bytes
int is 4 bytes
long is 4 bytes
long long is 8 bytes
UV is 8 bytes
IV is 8 bytes
STRLEN is 8 bytes
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
Is all of that manual tracking whether a integer is signed or unsigned necessary?
The range of numbers substr accepts for its position and length offsets can be greater than the range of signed or unsigned integers.
Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted.
I'm all for simplifying it, but Perl doesn't currently have a type that's twice the size of IV (LONG_IV?) as far as I know. Keep in mind I wrote that under pressure since I took on the task when it was one of the last two or three 5.12 blockers. I took the easiest approach for me ("When all you have is a hammer, every problem looks like a nail.") and the one least likely to cause immediate problems.
Note that $[ is being removed shortly, so some substr will shrink (but not the bit you posted).
| [reply] [d/l] [select] |
Take a system where both IV and STRLEN are 32 bits. The range of the position and length arguments should be -2**32 .. 2**32-1 (33 bits). Even with all that complexity, only -2**31 .. 2**32 (32 bit signed or 32 bit unsigned) is accepted
Theoretically, but given that you'll never be able to allocate a single string >2**31 on a 32-bit machine, it seems to be overkill.
Looking at the evolution of substr over the past few years, it is little wonder that is has such tardy performance. Like many parts of the core, it looks ripe for refactoring, but guesss given the accumulated complexity, its now virtually untouchable.
But my main concern on seeing it, was the signed/unsigned handling was a general necessity, but I see that is not the case.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |