walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I am trying to figure out the best way to search an array of information. Basically each element of the array is loaded in from a file like below:

open(NAMEDB, "/home/daamaya/database.csv");
@people=<NAMEDB>;
close (NAMEDB);


The information, line-by-line resembles the following:
Josephine C Lowen,0000090978,ZZ40241

BUT, sometimes the information does not have middle initial, like the following:
Josephine Jen,00000123456,ZZ54321

Later in the script I open an /etc/passwd file that has been downloaded from a server. The GECOS fields for users look a lot like the above example, except that the group they are in on the system is put in between, like so:


Josephine C Lowen,WHEEL,0000090978,ZZ40241

I use a subroutine I created and extract/cleanup the name, so I am left with Josephine C Lowen. I then use this name to search the database that is put into the array. If there is a Josephine C Lowen, then I want that match returned. If there is NOT a Josephine C Lowen, then I want to search/return a Josephine Lowen. If there is more than one, display them all. The user is then allowed to choose which is correct, and it should break down the information from the database and put the group in just like it is in the GECOS, then print to a file, like so:
Josephine C Lowen,GROUP,0000090978,ZZ40241

Everything there is taken from the array except the group. Anyway, here is what I have, but it is not working: my exact match also matches David Won and so on, so it does not behave the way I need it to.
@gecos_split=split(/,/,$gecos);
$new_gecos = &cleanGECOS($gecos_split[0]);
$current_group=`grep ":$gid:" /users/oss/users/group/$server_name.grp | cut -d : -f 1`;
chomp($current_group);
$current_group=uc($current_group);
@names = $new_gecos;
for $name (@names) {
    @comps = $name =~ m{(?:von|de la|de|van|der|le|el|la).*|\w+}g;
}
@names = @comps;
if ($names[0]) {
    @exact_match=grep{/^$new_gecos&/}@people;
    chomp(@exact_match);
}
if (!@exact_match) {
    if ($names[0] && $names[1] && $names[2]) {
        @approx_match=grep{/$names[0]/ && /$names[1]/ && /$names[2]/i}@people;
        chomp(@approx_match);
    }
    if (!@approx_match) {
        if ($names[2] eq "") {
            $names[2] = $names[1];
            $names[1] = "";
        }
        @approx_match=grep{/$names[0]/ && /$names[2]/i}@people;
        chomp(@approx_match);
    }
    else {
        # print "Nothing in GECOS field\n";
    }
}
if (@exact_match) {
    chomp($exact_match[0]);
    @exact_breakdown=split(/,/,$exact_match[0]);
    $gecos_new="$exact_breakdown[0],$current_group,$exact_breakdown[1],$exact_breakdown[2]";
    chomp($gecos_new);
    @exact_match = ();
}
elsif (@approx_match == 1) {
    chomp($approx_match[0]);
    @approx_breakdown=split(/,/,$approx_match[0]);
    $gecos_new="$approx_breakdown[0],$current_group,$approx_breakdown[1],$approx_breakdown[2]";
    chomp($gecos_new);
    @approx_match = ();
}
elsif (@approx_match) {
    for ($n=0; $n < @approx_match; $n++) {
        print "MATCH [$n] :: << @approx_match[$n] >> \n";
    }
}
else {
    print "NO-MATCH in database :: << $new_gecos >> \n\n";
}
}

Replies are listed 'Best First'.
Re: Best way to search content of an array
by GrandFather (Saint) on Sep 02, 2008 at 01:43 UTC

    First off: always use strictures (use strict; use warnings; see The strictures, according to Seuss). There are a huge number of variables in your code fragment that may be global, but it is impossible to tell whether they are, and impossible to tell what their contents ought to be if they are. So use strictures and lexical variables (declare them with my). If that doesn't resolve your issue by pointing out some silly lifetime issue, reduce the code to a runnable sample that demonstrates the problem.

    Note that your sample should provide any required data in a variable or in the __DATA__ section (and not much of it).

    @names = $new_gecos; looks suspect to me.

    Why do you need to chomp @exact_match?
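
    A minimal sketch of what such a reduced, strict-clean sample might look like, assuming database lines in the format shown in the question. The sample records, the helper name find_matches, the placeholder group, and the choice to compare the name field with eq instead of a regex are all illustrative assumptions, not the original script:

```perl
use strict;
use warnings;

# Sample records held in a variable, in the format shown in the question.
# The third record is invented for illustration.
my $sample = <<'END';
Josephine C Lowen,0000090978,ZZ40241
Josephine Jen,00000123456,ZZ54321
David Won,0000011111,ZZ11111
END

# Splitting on newlines leaves no line endings behind, so no chomp is needed later.
my @people = split /\n/, $sample;

# Hypothetical helper: return the records whose name field equals $wanted.
# If there are none, retry with a single-letter middle initial removed.
sub find_matches {
    my ($wanted, @records) = @_;

    my @exact = grep { (split /,/)[0] eq $wanted } @records;
    return @exact if @exact;

    # "Josephine C Lowen" becomes "Josephine Lowen"
    (my $short = $wanted) =~ s/\s+[A-Za-z]\.?\s+/ /;
    return grep { (split /,/)[0] eq $short } @records;
}

my $current_group = 'WHEEL';    # placeholder for the group looked up elsewhere

for my $record (find_matches('Josephine C Lowen', @people)) {
    my ($name, $id, $code) = split /,/, $record;
    print "$name,$current_group,$id,$code\n";
}
```

    Comparing the whole name field with eq sidesteps regex metacharacters and partial matches; whether that is strict enough for the real data is something only the full database can show.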


    Perl reduces RSI - it saves typing
Re: Best way to search content of an array
by jethro (Monsignor) on Sep 02, 2008 at 09:35 UTC

    I don't think @names = $new_gecos; is doing what you want it to do. After that, the @names array has only one entry, which makes the for loop after it a bit useless.

    And the regex might not do what you want it to do even if @names had more than one entry:

    use Data::Dumper;
    @names=('de lowen','doe','de la mancha');
    for $name (@names) {
        @comps = $name =~ m{(?:von|de la|de|van|der|le|el|la).*|\w+}g;
    }
    print Dumper(@comps);

    prints out

    $VAR1 = 'de la mancha';

    If you want to add values to an array in a loop, use push. If you have many values in a scalar variable and want to split them into an array, use split.
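
    As a rough illustration of both suggestions, the sketch below reuses the regex and the sample names from above. Splitting the name on whitespace is an assumption about what the original code intended:

```perl
use strict;
use warnings;

# split turns one scalar holding a full name into a list of its parts.
my $new_gecos = 'Josephine C Lowen';
my @names     = split ' ', $new_gecos;    # ('Josephine', 'C', 'Lowen')

# push appends on every iteration instead of overwriting the array.
my @comps;
for my $name ('de lowen', 'doe', 'de la mancha') {
    push @comps, $name =~ m{(?:von|de la|de|van|der|le|el|la).*|\w+}g;
}

print scalar(@names), " name parts: @names\n";
print scalar(@comps), " components collected\n";
```

    With push, @comps ends up holding one component per input name here rather than only the result of the last iteration.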

    And the most important thing: Test your code. Use Data::Dumper or simple print statements to show you what values are in your variables at different places in your program. You will be surprised how easy it is to find the bugs in your program.
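
    One way such a check might look, as a sketch: the GECOS string is the example from the question, and dumping to STDERR with $Data::Dumper::Indent set to 1 is just one possible setup:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;    # more compact nesting in the dump

my @gecos_split = split /,/, 'Josephine C Lowen,WHEEL,0000090978,ZZ40241';
my @names       = $gecos_split[0];    # the assignment the replies flagged

# Passing references keeps each array together in the dump; Dumper returns
# a string, so it can be printed to STDERR away from normal output.
print STDERR Dumper(\@gecos_split, \@names);
```

    The dump of @names shows a single element holding the whole name, which is the problem pointed out in the replies.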