in reply to Re: Strip non-numeric
in thread Strip non-numeric

In this case transliteration is really the most efficient solution though. Consider the results of adding

    "xlit_$sizes[$_]" => '$_ = $::d[' . $_ . ']; tr/0-9//cd;',

to the benchmark:
                    Rate     simple_10   multiple_10       xlit_10
    simple_10      86400/s          --          -31%          -70%
    multiple_10   124615/s         44%            --          -57%
    xlit_10       292712/s        239%          135%            --

                    Rate     simple_25   multiple_25       xlit_25
    simple_25      45324/s          --          -49%          -82%
    multiple_25    88062/s         94%            --          -65%
    xlit_25       248802/s        449%          183%            --

                    Rate     simple_50   multiple_50       xlit_50
    simple_50      23823/s          --          -71%          -89%
    multiple_50    82566/s        247%            --          -62%
    xlit_50       218684/s        818%          165%            --

                     Rate    simple_100  multiple_100      xlit_100
    simple_100      13397/s          --          -69%          -92%
    multiple_100    43191/s        222%            --          -74%
    xlit_100       168434/s       1157%          290%            --

                     Rate    simple_250  multiple_250      xlit_250
    simple_250       5608/s          --          -71%          -95%
    multiple_250    19639/s        250%            --          -81%
    xlit_250       103656/s       1748%          428%            --

                     Rate    simple_500  multiple_500      xlit_500
    simple_500       2832/s          --          -72%          -95%
    multiple_500    10189/s        260%            --          -83%
    xlit_500        59072/s       1986%          480%            --

                      Rate   simple_1000 multiple_1000     xlit_1000
    simple_1000       1380/s          --          -77%          -96%
    multiple_1000     5939/s        330%            --          -83%
    xlit_1000        34457/s       2397%          480%            --
Especially on large data sets, transliteration screams.
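For anyone who wants to rerun a comparison like this, here is a minimal sketch. It is not the original benchmark script, which isn't shown in this post: the names strip_simple, strip_multiple and strip_xlit, the test string and the two sizes are made up for illustration, and I'm assuming "simple" means s/\D//g and "multiple" means s/\D+//g.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Three ways to strip everything but the digits. Each works on a copy.
# (Hypothetical helper names; the variants are assumed, see above.)
sub strip_simple   { my $s = shift; $s =~ s/\D//g;    return $s }  # one character per match
sub strip_multiple { my $s = shift; $s =~ s/\D+//g;   return $s }  # one run per match
sub strip_xlit     { my $s = shift; $s =~ tr/0-9//cd; return $s }  # delete the complement of 0-9

# Made-up test data: a 10-character chunk repeated up to the wanted size.
my $chunk = '12abc345--';

for my $size (10, 1000) {
    my $data = $chunk x ($size / 10);
    # A negative count means "run each entry for at least that many CPU seconds".
    cmpthese(-0.5, {
        "simple_$size"   => sub { strip_simple($data) },
        "multiple_$size" => sub { strip_multiple($data) },
        "xlit_$size"     => sub { strip_xlit($data) },
    });
}
```

The absolute rates will depend on the machine, the perl version and the shape of the data, so only the relative ordering is worth comparing with the tables above.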

Makeshifts last the longest.

Re: Strip non-numeric
by Abigail-II (Bishop) on Jan 14, 2003 at 10:14 UTC
    True, but my point was the pattern of s/PAT//g, which would benefit from being written as s/PAT+//g. tr isn't as flexible - not even in this case. \D follows the locale and Unicode rules when appropriate, whereas the tr has the digits hardcoded.
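A small sketch of that difference, assuming a string that contains a non-ASCII decimal digit (the input is made up for illustration). On such a string \D follows Unicode rules, so the regex keeps a digit that the hardcoded tr range deletes:

```perl
use strict;
use warnings;

# ASCII digits and letters mixed with ARABIC-INDIC DIGIT THREE (U+0663).
my $input = "a1\x{0663}b2";

(my $by_regex = $input) =~ s/\D+//g;    # \D: anything that is not a Unicode decimal digit
(my $by_tr    = $input) =~ tr/0-9//cd;  # keeps only the ten ASCII digits

printf "regex kept %d chars, tr kept %d chars\n",
    length $by_regex, length $by_tr;
# prints "regex kept 3 chars, tr kept 2 chars"
```

Which result is the right one depends on what "numeric" is supposed to mean for the data at hand. On perl 5.14 and later, the /a modifier restricts \d and \D to ASCII, which makes the regex agree with the tr.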

    Abigail

      Ah, I didn't think of Unicode. I briefly considered the locale awareness, but couldn't think of any case where that would change the meaning of \d.

      Makeshifts last the longest.

        There could be a locale that considers ¼, ½ and ¾ to be numeric.
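This sketch doesn't demonstrate such a locale (I'm not claiming one exists), but Unicode hints at why the idea is plausible: it classifies ½ as a number, just not as a decimal digit. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $half = "\x{BD}";    # VULGAR FRACTION ONE HALF

# \p{...} always uses Unicode rules, whatever the locale is.
my $is_number        = $half =~ /\p{N}/  ? 1 : 0;   # any kind of number
my $is_decimal_digit = $half =~ /\p{Nd}/ ? 1 : 0;   # decimal digits only

print "number: $is_number, decimal digit: $is_decimal_digit\n";
# prints "number: 1, decimal digit: 0"
```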

        Abigail