SteveS832001 has asked for the wisdom of the Perl Monks concerning the following question:

Here is my problem: I get a list of a couple million phone
numbers from the State Dept. in text files that contain the
phone numbers of people who are on the do-not-call list.
I created a program that searches through the list
of numbers to see if a given number exists, which took forever. So I
created a database, used Perl to import the data into tables,
and then modified the program to search the database.
The search still takes a while to run.
The text files I get are broken down by area code: one
contains all of the 573 area code, one has 314 and one has 636. I
also have a file of cell phone numbers, which is not
in any particular order.
So I figured I would break the tables down further than area code.
The data I get is in the format 5731231234. How do I
break this number up into 3 parts: 573 123 1234?

Re: How to Breakup Numbers
by apl (Monsignor) on Mar 04, 2008 at 16:27 UTC
    my @phone_arr = unpack( 'A3A3A4', $phone_string );

    $phone_arr[0] will contain the area code, $phone_arr[1] the exchange and $phone_arr[2] the last four digits.
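    A slightly fuller sketch of the unpack approach. The helper name split_phone is hypothetical, and the digit check is an added assumption: unpack itself will happily split any 10-character string, digits or not.

```perl
use strict;
use warnings;

# Hypothetical helper: split a 10-digit number into
# area code (3 chars), exchange (3 chars) and line number (4 chars).
sub split_phone {
    my ($phone) = @_;

    # unpack does not validate its input, so check the format first.
    die "not a 10-digit number: '$phone'\n"
        unless $phone =~ /\A\d{10}\z/;

    # 'A3A3A4' = three fixed-width text fields of 3, 3 and 4 characters.
    return unpack( 'A3A3A4', $phone );
}

my ( $area, $exchange, $line ) = split_phone('5731231234');
print "$area $exchange $line\n";    # prints "573 123 1234"
```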

Re: How to Breakup Numbers
by kyle (Abbot) on Mar 04, 2008 at 16:27 UTC
    use Data::Dumper;
    my $pn = '5731231234';
    my @parts = ( $pn =~ m{ (\d{3}) (\d{3}) (\d{4}) }xms );
    print Dumper \@parts;
    __END__
    $VAR1 = [
              '573',
              '123',
              '1234'
            ];

    I'm surprised your database is slow. Maybe it needs an index? For something as simple as this, a DBM::Deep database would be sufficient and probably pretty fast as well.
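    DBM::Deep is a CPAN module. As a stand-in that ships with every Perl, the sketch below uses the core SDBM_File module instead: the on-disk hash is built once, and later runs tie to it and look numbers up without reloading the whole list. The path and subroutine names are hypothetical; SDBM limits each key/value pair to about 1 KB, which is plenty for phone numbers.

```perl
use strict;
use warnings;
use Fcntl;            # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

# Build (or extend) an on-disk hash keyed by phone number.
# SDBM creates "$path.pag" and "$path.dir" next to each other.
sub build_dnc_db {
    my ( $path, @numbers ) = @_;
    tie my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    $db{$_} = 1 for @numbers;
    untie %db;
}

# Open the existing database read-only once and check many numbers.
sub find_dnc_matches {
    my ( $path, @candidates ) = @_;
    tie my %db, 'SDBM_File', $path, O_RDONLY, 0666
        or die "Cannot tie $path: $!";
    my @hits = grep { exists $db{$_} } @candidates;
    untie %db;
    return @hits;
}

# Example run in a throwaway directory with made-up numbers.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/dnc";
build_dnc_db( $path, '5731231234', '3145550100' );
my @hits = find_dnc_matches( $path, '5731231234', '6365550199' );
print "@hits\n";    # prints "5731231234"
```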

Re: How to Breakup Numbers
by dsheroh (Monsignor) on Mar 04, 2008 at 16:38 UTC
    $phone_number =~ /(\d{3})(\d{3})(\d{4})/ is the first thing that comes to mind, and will put the three parts into $1, $2, and $3.
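    One caveat: $1, $2 and $3 are only updated when the match succeeds; after a failed match they still hold whatever an earlier successful match left there. A minimal sketch that anchors the pattern and tests the match before using the captures (the sample numbers are made up):

```perl
use strict;
use warnings;

my @formatted;
for my $phone_number ( '5731231234', '314555' ) {
    # \A and \z anchor the pattern so short or over-long strings do not match.
    if ( $phone_number =~ /\A(\d{3})(\d{3})(\d{4})\z/ ) {
        push @formatted, "$1-$2-$3";
    }
    else {
        warn "skipping malformed number '$phone_number'\n";
    }
}
print "$_\n" for @formatted;    # prints "573-123-1234"
```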

    But your database really should be doing this reasonably quickly even without splitting the numbers up, provided the table is indexed on the column that you're searching. If it's not, then I expect that adding an index will help immensely - probably much more than changing from one field to three would.

    If you have the memory to burn on it, another option would be to load the list of numbers into a hash and then do hash lookups for the numbers you want to check. If you're doing large numbers of lookups in a single run of the program, this would probably be faster than the database, since it wouldn't need to go to disk for each lookup. But it requires a lot of memory and has the startup overhead of reading in the list(s) of numbers, so it will be worse than a database on small batches of lookups where there aren't as many searches to spread the startup cost across.
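    A sketch of the hash approach, assuming each do-not-call file holds one 10-digit number per line. The subroutine name load_dnc is hypothetical, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Load every number from the given filehandles into a hash (the one-time startup cost).
sub load_dnc {
    my (@handles) = @_;
    my %dnc;
    for my $fh (@handles) {
        while ( my $line = <$fh> ) {
            $line =~ s/\s+\z//;    # strip newline and any \r from DOS-style files
            $dnc{$line} = 1 if length $line;
        }
    }
    return \%dnc;
}

# Example data; real code would open the state's files instead.
my $data_573 = "5731231234\n5735550100\n";
my $data_314 = "3145550123\n";
open my $list_573, '<', \$data_573 or die $!;
open my $list_314, '<', \$data_314 or die $!;
my $dnc = load_dnc( $list_573, $list_314 );

# Each lookup is then a single in-memory hash access.
for my $number ( '5735550100', '6365550199' ) {
    print "$number: ", ( exists $dnc->{$number} ? 'do not call' : 'ok' ), "\n";
}
```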

    If both the do not call list and the list of numbers you're checking are already sorted, you can optimize it heavily by opening the two files concurrently and comparing the first line in each, then going to the next line in whichever file had the lower number (or, if they match, it's a match, so flag it and advance in both files). This has the strong advantage of only needing to read each line of each file once and avoiding having to do any actual searching, so it should be faster than any other non-database method and will probably be faster than using a database as well unless your dnc list is orders of magnitude larger than the list of numbers to look up (and even then, this may still be faster). The primary drawback is that it requires the input files to be sorted, which may impose a heavy startup time if that isn't already the case.
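    The two-file walk described above might look like the sketch below. It assumes every number is exactly 10 digits, so string order equals numeric order and both inputs can be sorted with Perl's default string sort; the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Read the next non-blank line, stripping the line ending; undef at end of file.
sub next_number {
    my ($fh) = @_;
    while ( defined( my $line = <$fh> ) ) {
        $line =~ s/\s+\z//;
        return $line if length $line;
    }
    return;
}

# Walk two sorted inputs in step and return the numbers found in both.
sub sorted_matches {
    my ( $dnc_fh, $check_fh ) = @_;
    my @matches;
    my $dnc   = next_number($dnc_fh);
    my $check = next_number($check_fh);

    while ( defined $dnc && defined $check ) {
        if ( $dnc eq $check ) {       # match: record it and advance both files
            push @matches, $check;
            $dnc   = next_number($dnc_fh);
            $check = next_number($check_fh);
        }
        elsif ( $dnc lt $check ) {    # otherwise advance whichever is lower
            $dnc = next_number($dnc_fh);
        }
        else {
            $check = next_number($check_fh);
        }
    }
    return @matches;
}

# Example with made-up, already sorted data.
my $dnc_data   = "3145550123\n5731231234\n5735550100\n";
my $check_data = "3145550000\n5731231234\n6365550199\n";
open my $dnc_fh,   '<', \$dnc_data   or die $!;
open my $check_fh, '<', \$check_data or die $!;
my @hits = sorted_matches( $dnc_fh, $check_fh );
print "@hits\n";    # prints "5731231234"
```

    Each line of each file is read once, which is where the speed advantage comes from.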

      I tried the hash, but there is so much data that it takes 5 minutes to load everything before I can search.
        How many numbers do you have, and how many do you search for? I tried it with 2,000,000 numbers and they were all read in about 10 seconds. Searching 1000 numbers took less than 1 second (Pentium D CPU, 3 GHz).
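    To see whether a slow load comes from the hash itself, a rough timing sketch like this could help. It uses synthetic numbers instead of the real files (an assumption), so it measures hash insertion and lookup only, not file reading; the subroutine name is hypothetical, and the load count can be raised to 2_000_000 to match the test described above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time loading $n_load synthetic numbers into a hash, then $n_lookup lookups.
sub time_hash {
    my ( $n_load, $n_lookup ) = @_;

    my $t0 = time;
    my %dnc;
    $dnc{ 5730000000 + $_ } = 1 for 1 .. $n_load;
    my $load_secs = time - $t0;

    $t0 = time;
    my $hits = grep { exists $dnc{ 5730000000 + $_ * 7 } } 1 .. $n_lookup;
    my $lookup_secs = time - $t0;

    return ( scalar( keys %dnc ), $hits, $load_secs, $lookup_secs );
}

my ( $loaded, $hits, $load_secs, $lookup_secs ) = time_hash( 200_000, 1000 );
printf "loaded %d numbers in %.2fs; %d of 1000 lookups hit in %.4fs\n",
    $loaded, $load_secs, $hits, $lookup_secs;
```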
Re: How to Breakup Numbers
by olus (Curate) on Mar 04, 2008 at 16:28 UTC

    One possible solution is a regexp like the following:

    my $number = 5731231234;
    my @parts = $number =~ /(.{3})(.{3})(.*)/;
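    Note that . matches any character except a newline, so this pattern does not check for digits, and (.*) takes whatever remains rather than exactly four characters. For fixed-width fields, Perl's built-in substr is another option (a swapped-in technique, not part of this reply); a minimal sketch:

```perl
use strict;
use warnings;

# Take fixed-width slices by offset and length.
my $number = '5731231234';
my @parts  = (
    substr( $number, 0, 3 ),    # area code
    substr( $number, 3, 3 ),    # exchange
    substr( $number, 6, 4 ),    # last four digits
);
print "@parts\n";    # prints "573 123 1234"
```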