I'm not familiar with those types.
They are standard types for Microsoft compilers: MS CRT Standard Types.
STRLEN should remain whatever strlen returns. That is usually size_t, and it's accessed via Size_t.
The problem is, as you pointed out above, that size_t (actually defined as the type that sizeof() returns) is an unsigned type, and therefore cannot handle negative indexing.
And since, on 32-bit, it isn't possible to have strings longer than 2GB, it makes sense to me to avoid the need for casting between signed and unsigned, and all the noise that adds to the sources, by utilising the otherwise unused high bit to accommodate both Perl's negative indexing and general pointer math.
ptrdiff_t (long integer or __int64, depending on the target platform): the result of subtracting two pointers.
Seems to be perfectly defined for this purpose.
POSIX (though not ANSI or ISO) also define an equivalent type ssize_t for similar reasons.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Seems to be perfectly defined for this purpose.
STRLEN is used for variables whose value will be passed as the last argument of memcpy and for those that will receive the result of strlen. I don't see why it would be more suitable to use a different type than the actual type those functions use.
The problem is, as you pointed out above, that size_t (actually defined as the type that sizeof() returns) is an unsigned type, and therefore cannot handle negative indexing.
It does not need to handle negative indexing. The position and length are normalised before being stored into STRLEN vars, and the IV vars in which they are stored as they are being normalised can already handle negative numbers. Putting the position and length into signed STRLEN vars instead of (signed) IV vars is not going to help simplify pp_substr any.
There are three ways of simplifying pp_substr:

- If the maximum string length is no bigger than IV_MAX on all platforms, the simplest solution is to add a range check at the top and treat the position and length as IV vars from there on out.

    if (SvIOK_UV(pos_sv) && (UV)pos_iv > (UV)IV_MAX)
        goto BOUND_FAIL;
    if (SvIOK_UV(len_sv) && (UV)len_iv > (UV)IV_MAX)
        len_iv = IV_MAX;

- Dictate that the arguments to substr are limited to the range of IV instead of IV+UV. Tough luck if your system supports longer strings. The code would be identical to the code in the previous bullet.

- Define a type that can hold the entire supported range of positions and lengths, reducing the supported range of positions and lengths on some platforms if necessary. Check the arguments against that type's maximum value, then use that type for pos_iv and len_iv.
I wasn't limiting my ambitions to simplifying just pp_substr, but ridding the sources of the 1302 size mismatches in 73 files under Win64. Many of these come about because values extracted from IVs are assigned to or mixed with STRLEN values.
While casts themselves cost nothing in runtime performance, adding all the pre-cast checks needed to ensure nothing will be lost does, and in 99% of cases totally unnecessarily.
Update: By which I mean, if a signed STRLEN has to be cast to size_t when calling memcpy or strlen, it costs nothing, because a (positive) signed N-bit integer will always fit in an unsigned N-bit integer and mean the same thing. So no pre-cast checks are required.
The essence of good software engineering is not the code you write, but the code you avoid writing.