in reply to Re: (german) region code detection - request for thoughts
in thread (german) region code detection - request for thoughts

The tree is what I already have in my code.

The hash is something I didn't want.


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Re^3: (german) region code detection - request for thoughts
by Krambambuli (Curate) on Aug 20, 2008 at 09:54 UTC
    Why would you refuse a hash?

    There can't be so many regions that they wouldn't easily fit in memory, in a simple hash like

    my %prefixes = ( '04025' => [ 'Region1', 'Region2', 'Region3' ], ... );

    And afterwards, a straightforward check like
    my ($pref5, $pref4, $pref3, $pref2) = map { substr( $phone, 0, $_ ) } (5, 4, 3, 2);

    my $prefix_length = exists $prefixes{$pref5} ? 5
                      : exists $prefixes{$pref4} ? 4
                      : exists $prefixes{$pref3} ? 3
                      : exists $prefixes{$pref2} ? 2
                      : 0
                      ;

    my $formatted_phone = join( ' ',
        substr( $phone, 0, $prefix_length ),
        substr( $phone, $prefix_length ),
    );
    should work rather very effectively. If you have thought about this already, why do you think it would be expensive/ineffective/inadequate ?

    Krambambuli
    ---

      I like one regexp match more than several substring comparisons. And I didn't want a "huge" array in my code. Just one "simple" regex. My module for matching region codes and international country codes is 9K while the region codes alone are 32K.

      32K is not huge nowadays, but I date back to the age of the C64 ;-)
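      A minimal sketch of what "one simple regex" could look like. The actual regular expression from the module is not shown in this thread, so the code list, `$region_re` and `split_by_regex` below are assumptions made purely for illustration. The alternatives are sorted longest first so that the longest matching region code wins.

```perl
use strict;
use warnings;

# Hypothetical sample data, not the real list of region codes.
my @codes = qw(030 040 089 0221 04025);

# Longest alternatives first, so that e.g. 04025 is tried before 040.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b } @codes;
my $region_re = qr/^($alternation)/;

# Return (region code, rest of number); the code is '' if nothing matches.
sub split_by_regex {
    my ($phone) = @_;
    return $phone =~ $region_re
        ? ( $1, substr( $phone, length $1 ) )
        : ( '', $phone );
}

my ( $code, $rest ) = split_by_regex('040251234');
print "$code $rest\n";
```

      The regex stays a single match per number, but it still has to carry every region code in some form, which is where the size trade-off mentioned above comes from.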

      Your solution is quite clever but would need some enhancements to provide for

      1. Different minimal lengths of region codes
      2. Different maximum lengths of region codes
      3. It should find those values on its own

      s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
      +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
        1. Different minimal lengths of region codes
        2. Different maximum lengths of region codes
        3. It should find those values on its own
        Update: But... it's all there already?
        There are no string comparisons, and the order of the look-ups ensures that the longest existing key/prefix always wins.


        Oh, it's not, I just misunderstood your points. But definitely not hard to add, if really needed:
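        use List::Util qw(min max);

        # Take the bounds from the key lengths, not from the keys themselves
        my $min_prefix_length = min map { length } keys %prefixes;
        my $max_prefix_length = max map { length } keys %prefixes;

        # Try the longest prefix first, then shorter ones
        my $prefix_length = $max_prefix_length;
        while ( $prefix_length >= $min_prefix_length ) {
            last if exists $prefixes{ substr( $phone, 0, $prefix_length ) };
            $prefix_length--;
        }
        # Error/nonexistent prefix if $prefix_length < $min_prefix_length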
        etc.
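        Put together as a self-contained sketch, the three points above could be covered like this. The sample `%prefixes` entries, the placeholder region names and the helper name `split_prefix` are assumptions for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical sample data; the region names are placeholders.
my %prefixes = (
    '030'   => ['Region1'],
    '0221'  => ['Region2'],
    '04025' => ['Region3'],
    '040'   => ['Region4'],
);

# Derive the minimum and maximum prefix length from the keys themselves.
my @lengths = map { length } keys %prefixes;
my ( $min_len, $max_len ) = ( min(@lengths), max(@lengths) );

# Return (prefix, rest), or an empty list if no known prefix matches.
sub split_prefix {
    my ($phone) = @_;
    for my $len ( reverse $min_len .. $max_len ) {
        next if $len > length $phone;
        my $candidate = substr( $phone, 0, $len );
        return ( $candidate, substr( $phone, $len ) )
            if exists $prefixes{$candidate};
    }
    return;
}

for my $phone (qw(040251234 0401234 0991234)) {
    my ( $prefix, $rest ) = split_prefix($phone);
    print defined $prefix ? "$prefix $rest\n" : "$phone: unknown prefix\n";
}
```

        Looping from the longest length down means at most one hash look-up per possible length, and the first hit is always the longest known prefix.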

        Krambambuli
        ---
      krambambuli wrote:
      should work rather very effectively.

      I was unsure about that and so I benchmarked.

      I used the full list of 5132 region codes. Have no fear: what follows is not 32K of region codes, just the roughly 9K of my regular expression, which I also use to generate the region code list and the test data.

      This is the result:
                     Rate      Skeeve krambambuli
      Skeeve       10.7/s          --        -30%
      krambambuli  15.4/s         43%          --

      So regular expressions seem to be very efficient. The code is 30% faster. ;-)

      Update: It isn't. krambambuli is right; my regex version is the slower one (10.7/s versus 15.4/s).
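      The benchmark script itself is not reproduced here. A comparison of this kind can be sketched with the core `Benchmark` module's `cmpthese`. The tiny code list, the test numbers and the `by_regex` / `by_hash` helpers are assumptions for illustration, so the rates it prints will not match the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in data, not the 5132 real region codes.
my @codes    = qw(030 040 089 0221 04025);
my %prefixes = map { $_ => 1 } @codes;

# One regex with the longest alternatives first.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } @codes;
my $region_re = qr/^($alternation)/;

# Test numbers: each code followed by a dummy subscriber number.
my @phones = map { $_ . '1234567' } @codes;

sub by_regex {
    my ($phone) = @_;
    return $phone =~ $region_re ? $1 : '';
}

sub by_hash {
    my ($phone) = @_;
    for my $len ( 5, 4, 3 ) {
        my $candidate = substr( $phone, 0, $len );
        return $candidate if exists $prefixes{$candidate};
    }
    return '';
}

# Run each variant 200,000 times over the sample list and print a rate table.
cmpthese( 200_000, {
    regex => sub { by_regex($_) for @phones },
    hash  => sub { by_hash($_)  for @phones },
} );
```

      In the resulting table the higher rate is the faster variant, and the percentages read row against column.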


      s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
      +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
        This is the result:
                       Rate      Skeeve krambambuli
        Skeeve       10.7/s          --        -30%
        krambambuli  15.4/s         43%          --
        Read again... ;)

        The results say the opposite: your code runs about 10 times per second, mine about 15 times.

        Krambambuli
        ---