
Thanks all -- got it :-) I ended up using:
$scalar = "a1b2";
$scalar =~ s/\D//g;
print $scalar;
Also, I'll take a look at the Perl documentation on the differences between s/// and tr///. Thanks.

Re: Strip non-numeric
by Abigail-II (Bishop) on Jan 13, 2003 at 23:47 UTC
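    You should consider using s/\D+//g, as that's often a lot faster than s/\D//g: each match removes a whole run of non-digits instead of a single character, so far fewer substitutions are made. Here's a benchmark: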
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark;

    my @sizes = (10, 25, 50, 100, 250, 500, 1000);
    my @chars = ('A' .. 'Z', 'a' .. 'z', 0 .. 9);
    our @d    = map {join "" => map {$chars [rand @chars]} 1 .. $_} @sizes;

    map {
        Benchmark::cmpthese timethese (-2 => {
            "simple_$sizes[$_]"   => '$_ = $::d[' . $_ . ']; s/\D//g;',
            "multiple_$sizes[$_]" => '$_ = $::d[' . $_ . ']; s/\D+//g;'
        }, 'none');
    } 0 .. $#sizes

    __END__
                       Rate     simple_10   multiple_10
    simple_10      196495/s            --          -15%
    multiple_10    231225/s           18%            --

                       Rate     simple_25   multiple_25
    simple_25       89788/s            --          -50%
    multiple_25    180650/s          101%            --

                       Rate     simple_50   multiple_50
    simple_50       47507/s            --          -64%
    multiple_50    130727/s          175%            --

                       Rate    simple_100  multiple_100
    simple_100      23206/s            --          -77%
    multiple_100   103096/s          344%            --

                       Rate    simple_250  multiple_250
    simple_250      10488/s            --          -71%
    multiple_250    36407/s          247%            --

                       Rate    simple_500  multiple_500
    simple_500       5046/s            --          -75%
    multiple_500    20382/s          304%            --

                       Rate   simple_1000 multiple_1000
    simple_1000      2528/s            --          -76%
    multiple_1000   10549/s          317%            --

    Abigail
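One plausible way to see why s/\D+//g wins: in scalar context s///g returns the number of substitutions it performed, and counting them on a small sample string (chosen here purely for illustration) shows that \D+ needs one substitution per run of letters rather than one per letter. This is a sketch of the idea, not a benchmark; the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input: three runs of two letters, each followed by a digit.
my $str = "ab1cd2ef3";

# s///g returns how many substitutions it made.
# \D deletes one non-digit per match: a, b, c, d, e, f -> 6 substitutions.
my $single_hits = (my $single = $str) =~ s/\D//g;

# \D+ deletes a whole run per match: "ab", "cd", "ef" -> 3 substitutions.
my $run_hits = (my $runs = $str) =~ s/\D+//g;

printf "%-9s -> %s (%d substitutions)\n", 's/\D//g',  $single, $single_hits;
printf "%-9s -> %s (%d substitutions)\n", 's/\D+//g', $runs,   $run_hits;
```

Both calls leave the same digits behind; the difference is only in how many times the substitution machinery runs, which grows with the number of non-digit characters for \D but only with the number of non-digit runs for \D+.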

      In this case, though, transliteration is the most efficient solution. Consider the results of adding

          "xlit_$sizes[$_]"     => '$_ = $::d[' . $_ . ']; tr/0-9//cd;',

      to the benchmark:
                         Rate     simple_10   multiple_10       xlit_10
      simple_10       86400/s            --          -31%          -70%
      multiple_10    124615/s           44%            --          -57%
      xlit_10        292712/s          239%          135%            --

                         Rate     simple_25   multiple_25       xlit_25
      simple_25       45324/s            --          -49%          -82%
      multiple_25     88062/s           94%            --          -65%
      xlit_25        248802/s          449%          183%            --

                         Rate     simple_50   multiple_50       xlit_50
      simple_50       23823/s            --          -71%          -89%
      multiple_50     82566/s          247%            --          -62%
      xlit_50        218684/s          818%          165%            --

                         Rate    simple_100  multiple_100      xlit_100
      simple_100      13397/s            --          -69%          -92%
      multiple_100    43191/s          222%            --          -74%
      xlit_100       168434/s         1157%          290%            --

                         Rate    simple_250  multiple_250      xlit_250
      simple_250       5608/s            --          -71%          -95%
      multiple_250    19639/s          250%            --          -81%
      xlit_250       103656/s         1748%          428%            --

                         Rate    simple_500  multiple_500      xlit_500
      simple_500       2832/s            --          -72%          -95%
      multiple_500    10189/s          260%            --          -83%
      xlit_500        59072/s         1986%          480%            --

                         Rate   simple_1000 multiple_1000     xlit_1000
      simple_1000      1380/s            --          -77%          -96%
      multiple_1000    5939/s          330%            --          -83%
      xlit_1000       34457/s         2397%          480%            --
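      Especially on large data sets, transliteration screams: at 1000 characters, tr/0-9//cd runs roughly 25 times as fast as s/\D//g and nearly 6 times as fast as s/\D+//g.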

      Makeshifts last the longest.
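For readers comparing s/// and tr///, here is a small sketch of what tr/0-9//cd actually does, on an illustrative input string. A plausible reason for the speedup is that tr/// works from a fixed character list and does not go through the regex engine (perlop notes that tr does not support regex character classes such as \d). The regex line below is shown only as an ASCII-equivalent spelling for comparison; variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "a1b2c3";

# tr/SEARCHLIST/REPLACEMENTLIST/cd:
#   /c complements the search list, so it matches every character NOT in 0-9;
#   /d deletes matched characters that have no partner in the (empty) replacement list.
# tr returns the number of characters it matched, i.e. the non-digits it deleted.
my $deleted = (my $digits_tr = $str) =~ tr/0-9//cd;

# The regex spelling of the same ASCII-only deletion, for comparison.
(my $digits_s = $str) =~ s/[^0-9]+//g;

printf "%-12s -> %s (%d characters deleted)\n", 'tr/0-9//cd', $digits_tr, $deleted;
printf "%-12s -> %s\n", 's/[^0-9]+//g', $digits_s;
```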

        True, but my point was about the general pattern s/PAT//g, which often benefits from being written as s/PAT+//g. tr isn't as flexible, not even in this case: \D follows locale and Unicode rules where appropriate, whereas tr has the digits hard-coded.

        Abigail
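To make the \D versus tr difference concrete, here is a sketch using U+0663 ARABIC-INDIC DIGIT THREE as an illustrative non-ASCII digit. Unless the /a modifier is in effect, \d matches any Unicode decimal digit, so s/\D+//g keeps U+0663, while tr/0-9//cd only knows the ten ASCII digits. The /a modifier (available from Perl 5.14) restricts \d to ASCII and makes the regex behave like tr here. Locale-dependent behaviour is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: two letters, U+0663 ARABIC-INDIC DIGIT THREE, and an ASCII "5".
my $str = "a\x{0663}b5";

# \d matches any Unicode decimal digit unless /a is in effect,
# so \D+ leaves U+0663 in place.
(my $by_regex = $str) =~ s/\D+//g;

# With /a (Perl 5.14+), \d means only ASCII 0-9, like the tr list below.
(my $by_regex_ascii = $str) =~ s/\D+//ag;

# tr has the digits hard-coded: everything outside 0-9 is deleted.
(my $by_tr = $str) =~ tr/0-9//cd;

# Print code points so the output is readable without setting an output encoding.
for my $row ([ 's/\D+//g', $by_regex ], [ 's/\D+//ag', $by_regex_ascii ], [ 'tr/0-9//cd', $by_tr ]) {
    my ($label, $result) = @$row;
    printf "%-12s %s\n", $label, join(' ', map { sprintf 'U+%04X', ord($_) } split //, $result);
}
```

This should print U+0663 U+0035 for the plain regex and just U+0035 for the other two.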

      OK, thanks. I did have one more question about this: I didn't take into account one valid non-numeric character, the decimal point. For example, from 1a2b3.4c5d6e I would want 123.456, not just 123456. I've tried a few combinations, but nothing works yet. I know I'm missing something obvious ;) Any pointers?
        It always helps if you are specific. No one enjoys a game of "How do I do X?", "This is how you do X", "But I don't really want to do X, I want to do Y".

        So, from 1a2b3.4c5d6e, you want 123.456. But what if you have 1a2b.3c4d.5e6f? What do you want then?

        Abigail

        If I understand you correctly, you want to remove floating-point numbers as well as plain integers, but not when they are part of a word (like 'HAL1'). Is that right?

        Then you might want to have a look at Regexp::Common. Combined with requiring whitespace around the pattern, that should get it to work.

        -- Hofmator
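Reading Hofmator's suggestion as "match only numbers that stand on their own", here is a minimal sketch. It does not use Regexp::Common; the pattern in $number is a hypothetical, hand-written stand-in for a real-number pattern such as the module's $RE{num}{real}, and the sample sentence is made up for illustration. The lookarounds (?<!\S) and (?!\S) play the role of "whitespace around the regex", also allowing the start and end of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "HAL1 says 3.14 and 42";

# Hypothetical stand-in for a real-number pattern:
# optional sign, digits, optional fractional part.
my $number = qr/[-+]?\d+(?:\.\d+)?/;

# (?<!\S) and (?!\S) require whitespace or a string edge on both sides,
# so the "1" inside "HAL1" is not picked up.
my @numbers = $text =~ /(?<!\S)($number)(?!\S)/g;

print join(', ', @numbers), "\n";
```

This should print 3.14, 42.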

      Basically, in any given string, I want only digits and decimal points retained. For example:

      1a => 1
      1a.2b => 1.2
      1a.2b.3c => 1.2.3

      As those examples show, no matter how many decimal points or digits there are, those are the only characters I want retained. Hope this clears it up a bit.
        So, you want to delete anything that isn't a digit or a dot? Just do:
        s/[^\d.]+//g;
        or
        tr/0-9.//cd;

        Abigail
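Here is a small sketch that runs both of Abigail's final one-liners over the example strings from this thread. The loop, the %kept hash, and the disagreement check are illustrative additions. Note that every dot is kept, so 1a2b.3c4d.5e6f becomes 12.34.56, which is not a single valid number; that is exactly the ambiguity Abigail asked about earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inputs taken from the examples in this thread.
my @inputs = ('1a', '1a.2b', '1a.2b.3c', '1a2b3.4c5d6e', '1a2b.3c4d.5e6f');

my %kept;
for my $in (@inputs) {
    # Delete every run of characters that is neither a digit nor a dot.
    (my $via_s = $in) =~ s/[^\d.]+//g;

    # Same idea with tr: /c complements "0-9.", /d deletes what it matches.
    (my $via_tr = $in) =~ tr/0-9.//cd;

    # For plain ASCII input both spellings should agree.
    warn "s/// and tr/// disagree on $in\n" if $via_s ne $via_tr;

    $kept{$in} = $via_s;
    printf "%-16s => %s\n", $in, $via_s;
}
```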