in reply to Re^2: (german) region code detection - request for thoughts
in thread (german) region code detection - request for thoughts

Why would you refuse a hash?

There can't be so many regions that they couldn't easily be kept in memory, in a simple hash like

my %prefixes = ( '04025' => [ 'Region1, Region2, Region3' ], ... );

And afterwards, a straightforward check like
my ($pref5, $pref4, $pref3, $pref2) = map { substr( $phone, 0, $_ ) } (5, 4, 3, 2);
my $prefix_length = exists $prefixes{$pref5} ? 5
                  : exists $prefixes{$pref4} ? 4
                  : exists $prefixes{$pref3} ? 3
                  : exists $prefixes{$pref2} ? 2
                  : 0;
my $formatted_phone = join( ' ',
    substr( $phone, 0, $prefix_length ),
    substr( $phone, $prefix_length ),
);
should work rather very effectively. If you have thought about this already, why do you think it would be expensive/ineffective/inadequate?
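
A self-contained sketch of the lookup above, wrapped in a sub so it can be tried in isolation. The sample hash contents and the sub name `format_phone` are made up for illustration; only the idea of probing the longest prefix first comes from the post. Unlike the snippet above, this version returns the number unchanged when no prefix is known, which is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real table would hold all region codes.
my %prefixes = (
    '030'   => ['Berlin'],
    '04025' => ['Region1, Region2, Region3'],
    '040'   => ['Hamburg'],
);

# Split a digits-only phone number into "prefix rest", trying the
# longest candidate prefix (5 digits) first and the shortest (2) last.
sub format_phone {
    my ($phone) = @_;
    for my $len ( 5, 4, 3, 2 ) {
        next if length($phone) < $len;
        my $prefix = substr( $phone, 0, $len );
        return join ' ', $prefix, substr( $phone, $len )
            if exists $prefixes{$prefix};
    }
    return $phone;    # no known prefix: return the number unchanged
}

print format_phone('0402512345'), "\n";    # 04025 12345
print format_phone('0401234567'), "\n";    # 040 1234567
```

Each lookup costs at most four hash probes, independent of how many region codes the hash holds.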

Krambambuli
---

Re^4: (german) region code detection - request for thoughts
by Skeeve (Parson) on Aug 20, 2008 at 11:13 UTC

    I like one regexp match more than several substring comparisons. And I didn't want a "huge" array in my code. Just one "simple" regex. My module for matching region codes and international country codes is 9K while the region codes alone are 32K.
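
For comparison, a minimal sketch of what a single-regex approach could look like. This is not Skeeve's actual expression, which is not shown in the thread; the sample codes and the names `$region_re` and `split_phone` are assumptions made for illustration. The regex is built as an alternation with the longest codes first, so a longer code is preferred over a shorter one that shares its leading digits.

```perl
use strict;
use warnings;

# Hypothetical sample codes; the real list has thousands of entries.
my @codes = qw(030 040 04025 089);

# Longest codes first, so that '04025' is tried before '040'.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } @codes;
my $region_re = qr/\A($alternation)(.*)\z/s;

# Return (region code, rest of number), or an empty list if nothing matches.
sub split_phone {
    my ($phone) = @_;
    return $phone =~ $region_re ? ( $1, $2 ) : ();
}

my ( $code, $rest ) = split_phone('0402512345');
print "$code $rest\n";    # 04025 12345
```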

    32K is not a huge size nowadays, but I date back to the age of the C64 ;-)

    Your solution is quite clever but would need some enhancements to provide for

    1. Different minimal lengths of region codes
    2. Different maximum lengths of region codes
    3. It should find those values on its own

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      1. Different minimal lengths of region codes
      2. Different maximum lengths of region codes
      3. It should find those values on its own
      Update: But... it's all there already?
      There are no string comparisons, and the order of look-ups ensures that the longest existing key/prefix always wins.


      Oh, it's not, I just misunderstood your points. But definitely not hard to add, if really needed:
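      use List::Util qw(min max);

      # Compare key lengths, not the keys themselves.
      my @lengths           = map { length } keys %prefixes;
      my $min_prefix_length = min @lengths;
      my $max_prefix_length = max @lengths;

      # Try the longest candidate prefix first, so the longest match wins.
      my $prefix_length = $max_prefix_length;
      while ( $prefix_length >= $min_prefix_length ) {
          last if exists $prefixes{ substr( $phone, 0, $prefix_length ) };
          $prefix_length--;
      }
      # Error/nonexistent prefix if $prefix_length < $min_prefix_length;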
      etc.

      Krambambuli
      ---

        Don't get me wrong. I just wanted to point out what was still missing, compared to my approach. I didn't want you to program that for me.

        Nevertheless ++ for your effort!


        s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
        +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re^4: (german) region code detection - request for thoughts
by Skeeve (Parson) on Aug 20, 2008 at 16:20 UTC
    krambambuli wrote:
    should work rather very effectively.

    I was unsure about that and so I benchmarked.

    I used the full list of 5132 region codes. Have no fear! There are not 32K of region codes following, just the (about) 9K of my regular expression which I use to generate the region code list and also the test data.

    This is the result:
                  Rate      Skeeve krambambuli
    Skeeve      10.7/s          --        -30%
    krambambuli 15.4/s         43%          --

    So regular expressions seem to be very efficient. The code is 30% faster. ;-) Update: It isn't; it is 30% slower. krambambuli is right.
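
The benchmark script itself is not shown in the thread. The table layout resembles what `cmpthese` from the core Benchmark module prints, so a comparison along these lines might be set up as sketched below. The sample data and the sub names `by_regex` and `by_hash` are assumptions, and with such a tiny list the numbers will not resemble the ones reported for the full set of 5132 codes.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data, far smaller than the real list of codes.
my @codes    = qw(030 040 04025 089);
my %prefixes = map { $_ => 1 } @codes;
my @phones   = qw(0301234567 0402512345 0401234567 0891234567);

# Alternation with the longest codes first, anchored at the start.
my $alternation = join '|', sort { length($b) <=> length($a) } @codes;
my $region_re   = qr/\A($alternation)/;

# Regex approach: one match per number, returns the prefix length.
sub by_regex {
    my ($phone) = @_;
    return $phone =~ $region_re ? length($1) : 0;
}

# Hash approach: probe candidate prefixes, longest first.
sub by_hash {
    my ($phone) = @_;
    for my $len ( 5, 4, 3, 2 ) {
        return $len if exists $prefixes{ substr( $phone, 0, $len ) };
    }
    return 0;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese( -1, {
    regex => sub { by_regex($_) for @phones },
    hash  => sub { by_hash($_)  for @phones },
} );
```

In the printed table, a positive percentage in a row means that row's code ran more often per second than the column's code.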


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      This is the result:
                    Rate      Skeeve krambambuli
      Skeeve      10.7/s          --        -30%
      krambambuli 15.4/s         43%          --
      Read again... ;)

      The results say the opposite: your code runs about 10 times per second, mine about 15 times.

      Krambambuli
      ---