in reply to The Art of Hashing

brand new to hashing...i'm trying to make the script search for a partial email address

Hashing (as in Perl's hashes) provides fast lookup by exact match only, and is entirely the wrong mechanism for partially matching anything.

If your keys are -- like your example "smith" -- actually *always* whole words, then you could index your data by whole surnames:

    [0] Perl> @x = split( '[, ]', $_ ), push @{ $bySurname{ $x[0] }{ $x[1] } }, [ @x[ 2, 3 ] ] for split /\n\s*/, <<'END'
    Smith,John (248)-555-9430 jsmith@aol.com
    Hunter,Apryl (810)-555-3029 april@showers.org
    Stewart,Pat (405)-555-8710 pats@starfleet.co.uk
    Ching,Iris (305)-555-0919 iching@zen.org
    Doe,John (212)-555-0912 jdoe@morgue.com
    Jones,Tom (312)-555-3321 tj2342@aol.com
    Smith,John (607)-555-0023 smith@pocahontas.com
    Crosby,Dave (405)-555-1516 cros@csny.org
    Johns,Pam (313)-555-6790 pj@sleepy.com
    Jeter,Linda (810)-555-8761 netless@earthlink.net
    Garland,Judy (305)-555-1231 ozgal@rainbow.com
    END
    ;;

    [0] Perl> pp %bySurname;;
    (
      "Jeter",
      { Linda => [["(810)-555-8761", "netless\@earthlink.net"]] },
      "Ching",
      { Iris => [["(305)-555-0919", "iching\@zen.org"]] },
      "Smith",
      {
        John => [
          ["(248)-555-9430", "jsmith\@aol.com"],
          ["(607)-555-0023", "smith\@pocahontas.com"],
        ],
      },
      "Crosby",
      { Dave => [["(405)-555-1516", "cros\@csny.org"]] },
      "Jones",
      { Tom => [["(312)-555-3321", "tj2342\@aol.com"]] },
      "Doe",
      { John => [["(212)-555-0912", "jdoe\@morgue.com"]] },
      "Johns",
      { Pam => [["(313)-555-6790", "pj\@sleepy.com"]] },
      "Hunter",
      { Apryl => [["(810)-555-3029", "april\@showers.org"]] },
      "Garland",
      { Judy => [["(305)-555-1231", "ozgal\@rainbow.com"]] },
      "Stewart",
      { Pat => [["(405)-555-8710", "pats\@starfleet.co.uk"]] },
    )

That would let you find everyone with the surname "smith" (provided you lc the keys, which I didn't above), but it won't let you find those with "jo*" in the name.
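
To make that concrete, here is a minimal sketch of the same index built with lowercased keys. The sample records and the sub names `find_surname` and `find_surname_prefix` are made up for illustration. The exact lookup is a single hash probe; the "jo*" case has no shortcut and has to test every key.

```perl
use strict;
use warnings;

# Hypothetical sample records in the same "Surname,First phone email" layout.
my @lines = (
    'Smith,John (248)-555-9430 jsmith@aol.com',
    'Jones,Tom (312)-555-3321 tj2342@aol.com',
    'Smith,John (607)-555-0023 smith@pocahontas.com',
    'Johns,Pam (313)-555-6790 pj@sleepy.com',
);

# Index by lowercased surname, then lowercased first name.
my %bySurname;
for my $line (@lines) {
    my ( $surname, $first, $phone, $email ) = split /[, ]/, $line;
    push @{ $bySurname{ lc $surname }{ lc $first } }, [ $phone, $email ];
}

# Exact, case-insensitive lookup: one hash probe.
sub find_surname {
    my ($name) = @_;
    my $hit = $bySurname{ lc $name } or return;
    return map { @$_ } values %$hit;    # list of [ phone, email ] pairs
}

# A partial match gets no help from the hash: every key must be tested.
sub find_surname_prefix {
    my ($prefix) = @_;
    $prefix = lc $prefix;
    return sort grep { index( $_, $prefix ) == 0 } keys %bySurname;
}

print "$_->[1]\n" for find_surname('SMITH');
print join( ',', find_surname_prefix('jo') ), "\n";
```

The prefix scan costs a pass over all the keys, which is exactly the work the hash was supposed to save.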

For small numbers of lines -- a few thousand or so -- I'd keep them in a single string and use a simple text search.
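
A rough sketch of that approach, assuming the records sit in one newline-delimited string; the sample data and the sub name `search_lines` are hypothetical. A single regex with /m and /g pulls out every whole line that contains the fragment.

```perl
use strict;
use warnings;

# Hypothetical sample data: all records held in one newline-delimited string.
my $data = <<'END';
Smith,John (248)-555-9430 jsmith@aol.com
Hunter,Apryl (810)-555-3029 april@showers.org
Jones,Tom (312)-555-3321 tj2342@aol.com
Smith,John (607)-555-0023 smith@pocahontas.com
END

# Return every whole line containing $fragment, ignoring case.
# \Q...\E quotes the fragment so characters like '.' are matched literally.
sub search_lines {
    my ($fragment) = @_;
    my @found = $data =~ /^(.*\Q$fragment\E.*)$/mgi;
    return @found;
}

print "$_\n" for search_lines('@aol');
```

This finds a partial email address, surname or phone number equally well, since it makes no assumptions about which field the fragment is in.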

For a fully wild-card search of many more lines than that, I'd probably build an index keyed on every run of 2 or 3 consecutive characters.
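
One way such an index might look, as a sketch only: it uses 3-character runs, keeps everything in memory, and the sample records and the sub name `search_index` are invented for the example. Each run maps to the set of record numbers containing it; a search intersects the sets for the runs in the fragment, then confirms each candidate with index(), because the runs could occur in a record without being adjacent.

```perl
use strict;
use warnings;

# Hypothetical sample records.
my @records = (
    'Smith,John (248)-555-9430 jsmith@aol.com',
    'Johns,Pam (313)-555-6790 pj@sleepy.com',
    'Jones,Tom (312)-555-3321 tj2342@aol.com',
);

my $N = 3;    # length of the indexed character runs

# Map each lowercased N-character run to the set of record numbers containing it.
my %index;
for my $i ( 0 .. $#records ) {
    my $text = lc $records[$i];
    for my $pos ( 0 .. length($text) - $N ) {
        $index{ substr( $text, $pos, $N ) }{$i} = 1;
    }
}

# Find the records containing $fragment (which must be at least $N characters).
sub search_index {
    my ($fragment) = @_;
    $fragment = lc $fragment;
    return if length($fragment) < $N;

    # Count, per record, how many of the fragment's runs it contains.
    my ( %count, $grams );
    for my $pos ( 0 .. length($fragment) - $N ) {
        $grams++;
        my $set = $index{ substr( $fragment, $pos, $N ) } or return;
        $count{$_}++ for keys %$set;
    }

    # Keep records that have every run, then confirm with a real substring test.
    my @found = map { $records[$_] }
        grep { $count{$_} == $grams && index( lc( $records[$_] ), $fragment ) >= 0 }
        sort { $a <=> $b } keys %count;
    return @found;
}

print "$_\n" for search_index('john');
```

Fragments shorter than the run length can't use the index at all, which is the trade-off between indexing 2-character and 3-character runs: shorter runs handle shorter queries but produce bigger candidate sets.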

