htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I'm trying to create a set of whitelist words for the Organization field in a sign-up form. I have a field in the database with acceptable words on each line, then I do this:
    $found="0";
    @orgs = split /\n/,$good_orgs;
    foreach $good_orgs (@orgs) {
        if ($organization =~ /$good_orgs/i) {
            $found = "1";
        }
    }
    if ($found eq "0") {
        &error_suspect_org;
    }
This triggers the error on every sign-up. I'm missing something.

I should also mention that an organization will likely contain two words from the list (e.g. Company, LLC).

Re: Creating a whitelist
by haukex (Archbishop) on Jun 15, 2016 at 21:19 UTC

    Hi htmanning,

    This triggers the error on every sign up.

    I can't reproduce that. If I set $good_orgs="abc\ndef"; and $organization="abc"; then the error is not triggered (Update: same goes for $good_orgs="First\nNational\nBank"; $organization="National Bank"; from your node here - do your variables contain what you think they contain? For example, is $good_orgs a list separated by newlines, like your code indicates by using split /\n/, or is it separated by commas?). There is probably something else going on, so have a look at the Basic debugging checklist to try and figure out what it might be. It also looks like you're coding without strict and warnings, and while those don't directly influence your current code, it's a very good habit to get into, see Use strict and warnings.
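    One common culprit worth ruling out (an assumption; nothing in the thread confirms it) is Windows line endings in the stored field. If each line ends in "\r\n", split /\n/ leaves a trailing "\r" on every word except possibly the last, so a pattern such as /National\r/i can never match "National Bank". A minimal sketch, with made-up field contents, of how Data::Dumper's Useqq option makes the hidden character visible and how splitting on /\r?\n/ avoids it:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical field contents saved with CRLF line endings
my $good_orgs = "First\r\nNational\r\nBank";

# Useqq prints control characters as escapes such as \r; Indent 0 keeps it on one line
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Indent = 0;

my @naive = split /\n/,    $good_orgs;   # leaves "First\r" and "National\r"
my @clean = split /\r?\n/, $good_orgs;   # also consumes the optional CR

print Dumper(\@naive), "\n";
print Dumper(\@clean), "\n";
```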

    It would be very helpful if you could provide more examples: what $good_orgs might look like, some sample values for $organization, and whether your function should accept each of them. The reason I ask is that your requirements are unclear: If the list of allowed words contains "legal", then your current implementation will also accept an organization named "Illegal Exports".
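    To make the "Illegal Exports" problem concrete, here is a small sketch comparing a plain substring match with one anchored on word boundaries (\b). The helper names substring_match and word_match are made up for illustration, and \b has its own quirks (it only counts letters, digits and underscore as word characters, so entries ending in punctuation such as "Co." behave differently), so treat this as one possible tightening rather than a complete fix.

```perl
use strict;
use warnings;

# Plain pattern: "legal" matches anywhere, including inside "Illegal".
# \Q...\E makes the word match literally rather than as a regex.
sub substring_match {
    my ($word, $organization) = @_;
    return $organization =~ /\Q$word\E/i ? 1 : 0;
}

# Word-boundary pattern: "legal" must stand on its own as a word
sub word_match {
    my ($word, $organization) = @_;
    return $organization =~ /\b\Q$word\E\b/i ? 1 : 0;
}

for my $org ("Illegal Exports", "Legal Aid") {
    printf "%-16s substring=%d word=%d\n",
        $org, substring_match("legal", $org), word_match("legal", $org);
}
```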

    Note that if $good_orgs contains strings that should be matched literally, and not regular expressions, you should be using /\Q$good_orgs\E/ as explained in, for example, Mind the meta!.
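    A small illustration of why \Q matters, assuming a hypothetical whitelist entry "Co." (short for Company): unquoted, the dot is a regex metacharacter that matches any character, so "Cox Ltd" passes; with \Q...\E (the same escaping that quotemeta performs) the dot is literal.

```perl
use strict;
use warnings;

my $entry = "Co.";   # hypothetical whitelist entry containing a metacharacter

my $unquoted = ("Cox Ltd" =~ /$entry/i)     ? 1 : 0;   # "." matches the "x"
my $quoted   = ("Cox Ltd" =~ /\Q$entry\E/i) ? 1 : 0;   # "." must be a real dot

print "unquoted=$unquoted quoted=$quoted\n";
print "quotemeta gives: ", quotemeta($entry), "\n";
```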

    I'm going to wager a guess that $organization may only contain "allowed" words. In that case, here's one of several ways you could do that. Note that I'm also making some more assumptions, like that $organization is a comma- and/or whitespace-separated list, as you seem to be hinting.

    use warnings;
    use strict;

    my $good_orgs = "First\nNational\nBank\nCompany\nLLC\nlegal";
    my %good_orgs = map {lc()=>1} split /\n/, $good_orgs;

    sub is_good_org {
        my $organization = shift;
        my @parts = grep {length} split /[\s;,]+/, $organization;
        for my $org (@parts) {
            return unless exists $good_orgs{lc $org};
        }
        return 1;
    }

    print is_good_org("LLC")            ?"yes":"no", "\n"; # "yes"
    print is_good_org("Company, LLC")   ?"yes":"no", "\n"; # "yes"
    print is_good_org("National Bank")  ?"yes":"no", "\n"; # "yes"
    print is_good_org("legal company")  ?"yes":"no", "\n"; # "yes"
    print is_good_org("National Bunk")  ?"yes":"no", "\n"; # "no"
    print is_good_org("Illegal Exports")?"yes":"no", "\n"; # "no"
    print is_good_org("Test LLC")       ?"yes":"no", "\n"; # "no"

    Note that despite my use of lc, there are some strange cases in Unicode where lowercasing isn't actually a case-insensitive comparison. Perl 5.16 and newer offer the fc function (enabled with use feature 'fc') if that's a concern to you.
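    A minimal sketch of the difference, assuming Perl 5.16 or newer: the German "ß" lowercases to itself but case-folds to "ss", so only fc treats "Straße" and "STRASSE" as equal. The two sample words are chosen purely for illustration.

```perl
use strict;
use warnings;
use utf8;           # the source below contains a non-ASCII literal
use feature 'fc';   # fc() is available from Perl 5.16

my ($word1, $word2) = ("Straße", "STRASSE");

my $lc_equal = lc($word1) eq lc($word2) ? 1 : 0;   # "straße" vs "strasse"
my $fc_equal = fc($word1) eq fc($word2) ? 1 : 0;   # both fold to "strasse"

print "lc equal: $lc_equal, fc equal: $fc_equal\n";
```

    In the hash-based code above, that would mean using fc in place of lc both when building %good_orgs and in the exists lookup.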

    Updated my code with your example input from here. Update 2: Added the grep {length} in case split leaves some empty elements behind, as would be the case in is_good_org(" LLC").
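    The reason for the grep {length}: split keeps a leading empty field when the string starts with a separator (only trailing empty fields are dropped automatically). A quick sketch with the same pattern as above:

```perl
use strict;
use warnings;

my @raw   = split /[\s;,]+/, " LLC";   # leading space yields an empty first field
my @parts = grep { length } @raw;      # filter out empty strings

printf "raw: %d fields, filtered: %d field(s)\n", scalar @raw, scalar @parts;
```

    The special form split ' ', $string also skips leading whitespace, but it only splits on whitespace, not on commas or semicolons.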

    Hope this helps,
    -- Hauke D

Re: Creating a whitelist
by stevieb (Canon) on Jun 15, 2016 at 20:56 UTC

    First, you should be using use strict; and use warnings;. Second, you shouldn't be quoting integers:

    my $found = 0;
    ...
    if ($found == 0){...

    # or better:
    if (! $found){...

    # or for truth:
    if ($found){...

    Now, the way you have things set up won't do an exact match, so if your whitelist contains "hi", an organization entered as "chia", "chitchat" etc. would match.
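    A sketch of the difference, keeping the original argument order (whitelist word as the pattern, organization as the target). The helper names are hypothetical; anchoring with \A and \z turns the regex into a whole-string, case-insensitive comparison, much like lc($organization) eq lc($word).

```perl
use strict;
use warnings;

# Unanchored, as in the original loop: the word may appear anywhere
sub loose_match {
    my ($word, $organization) = @_;
    return $organization =~ /\Q$word\E/i ? 1 : 0;
}

# Anchored with \A and \z: the entire organization must be the word
sub exact_match {
    my ($word, $organization) = @_;
    return $organization =~ /\A\Q$word\E\z/i ? 1 : 0;
}

for my $org ("chia", "chitchat", "Hi") {
    printf "%-8s loose=%d exact=%d\n",
        $org, loose_match("hi", $org), exact_match("hi", $org);
}
```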

    It's very hard to tell what's failing without you specifying some examples of what people are sending in, and the contents of your @orgs array.

    In fact, after re-reading your post, we'll absolutely need to know what the contents of both are. Do the words in the dictionary contain special characters such as commas or periods? Does the incoming $organization? If the dictionary has one word per line and you're receiving things like Company, LLC, you'll have to split the $organization up, clean it if necessary, then change the logic. Depending on the above, the grep example further down may not (and probably won't) work as-is.

    Please provide us with some more details.
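    A minimal sketch of that split, clean, then check approach, under some loud assumptions: the dictionary is one word per line, commas and whitespace separate words in the input, and trailing periods (as in "Inc.") are the only other punctuation worth stripping. %allowed, org_words and all_words_allowed are hypothetical names, and the sample dictionary is made up.

```perl
use strict;
use warnings;

# Hypothetical dictionary, one word per line, stored lower-cased
my %allowed = map { lc($_) => 1 } split /\r?\n/, "Acme\nCompany\nInc\nLLC";

# Split an incoming organization into cleaned, lower-cased words
sub org_words {
    my ($organization) = @_;
    my @words = grep { length } split /[\s,]+/, $organization;
    s/\.+\z// for @words;   # assumption: drop trailing periods, "Inc." -> "Inc"
    return map { lc($_) } @words;
}

# Accept only if there is at least one word and every word is in the dictionary
sub all_words_allowed {
    my ($organization) = @_;
    my @words = org_words($organization);
    return 0 unless @words;
    my $bad = grep { !exists $allowed{$_} } @words;   # count of unknown words
    return $bad ? 0 : 1;
}

for my $org ("Acme Inc.", "Company, LLC", "Test LLC") {
    print "$org: ", (all_words_allowed($org) ? "ok" : "suspect"), "\n";
}
```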

    ...

    I'd recommend grep for this job. That way, nothing gets split into pieces like "Company," and "LLC"; the whole "Company, LLC" has to match an entry exactly:

    use warnings;
    use strict;

    my $organization = "thre";
    my @orgs = qw (one two three four);

    error_suspect_org() if ! grep {$organization eq $_} @orgs;

    sub error_suspect_org {
        die "not a valid org!\n";
    }
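    A variant of the same exact-match idea, assuming a case-insensitive comparison is wanted: List::Util's any (available from List::Util 1.33) stops at the first matching entry instead of scanning the whole list as grep does. The name is_valid_org is hypothetical, and the sample list is the one from the example above.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() needs List::Util 1.33 or newer

my @orgs = qw(one two three four);

sub is_valid_org {
    my ($organization) = @_;
    # Stops at the first entry that compares equal, ignoring case
    my $found = any { lc($organization) eq lc($_) } @orgs;
    return $found ? 1 : 0;
}

print is_valid_org("Three") ? "valid\n" : "not valid\n";
print is_valid_org("thre")  ? "valid\n" : "not valid\n";
```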
      Thanks for the great response. The database has a list of words on each line like, First, National, Bank. If someone enters "National Bank" it triggers the error even though both words are on the list. Point taken on quoting integers. I get the whitelist from the database like this:
      $pointer7 = $sth7->fetchrow_hashref;
      $good_orgs = $pointer7->{'good_orgs'};
      @orgs = split /\n/,$good_orgs;
      I tried your code and it still triggers the error if the org has 2 words from the list in it. Hmm...
        The database has a list of words on each line like, First, National, Bank. If someone enters "National Bank" it triggers the error even though both words are on the list.
        "National Bank" does not match "First, National, Bank" because of the comma. Try stripping out the commas:
        $_ =~ s/,//g for @orgs;
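    A sketch of what that line does, assuming a hypothetical stored value with a comma at the end of each line: the for loop aliases $_ to each array element, so the substitution edits @orgs in place. An alternative is to treat commas and newlines (plus surrounding spaces) as separators in the split itself.

```perl
use strict;
use warnings;

# Hypothetical stored value mixing commas and newlines
my $good_orgs = "First,\nNational,\nBank";

my @orgs = split /\n/, $good_orgs;   # ("First,", "National,", "Bank")
s/,//g for @orgs;                    # $_ aliases each element, so @orgs changes in place

# Alternative: split on commas or newlines, swallowing nearby whitespace
my @orgs2 = grep { length } split /\s*[,\n]\s*/, $good_orgs;

print "@orgs\n@orgs2\n";
```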
Re: Creating a whitelist
by duyet (Friar) on Jun 16, 2016 at 13:31 UTC
    You can also use a hash to hold the list and then check the hash key, i.e.:
    my %hash;
    @hash{ @orgs } = (1) x @orgs;
    error_suspect_org() unless defined $hash{ $organization };
    See the Slices documentation for how the hash slice works.
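    A sketch of the hash-slice setup with keys lower-cased, so the lookup ignores case. Note that it checks a single word; a multi-word organization would still need splitting first, as in haukex's reply. The whitelist entries and the helper name in_whitelist are hypothetical.

```perl
use strict;
use warnings;

my @orgs = qw(First National Bank);   # hypothetical whitelist entries

my %hash;
# Hash slice: one key per entry in a single statement.
# On the right of x, @orgs is in scalar context, so (1) is repeated 3 times.
@hash{ map { lc($_) } @orgs } = (1) x @orgs;

# Look up the lower-cased word so "NATIONAL" and "national" both hit
sub in_whitelist {
    my ($word) = @_;
    return $hash{ lc $word } ? 1 : 0;
}

print in_whitelist("NATIONAL") ? "yes\n" : "no\n";
```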
      Using defined here isn't needed. If you're sure you set the values to 1, just check that the value is true; or create the keys with no values and check for existence:
      # Truth
      @hash{@orgs} = (1) x @orgs;
      error_suspect_org() unless $hash{$organization};

      # Existence
      undef @hash{@orgs};
      error_suspect_org() unless exists $hash{$organization};
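      To see why the check has to match how the hash was filled, here is a small sketch with keys created without values (using the assignment form @hash{@orgs} = (), which leaves every value undef). Only exists accepts a listed word; defined and a plain truth test would reject every organization. The sample list is made up.

```perl
use strict;
use warnings;

my @orgs = qw(First National Bank);   # hypothetical whitelist
my %hash;
@hash{@orgs} = ();                    # keys created, every value undef

my $exists  = exists  $hash{National} ? 1 : 0;   # key is present
my $defined = defined $hash{National} ? 1 : 0;   # value is undef
my $true    =         $hash{National} ? 1 : 0;   # undef is false

print "exists=$exists defined=$defined true=$true\n";
```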
