I'm designing a portal that allows a user to configure a URL and some metadata associated with it in their user account, such as depth, maximum colors, follow-offsite links, and so on.

This portal stores URLs with this metadata in several tables. One of these contains "keywords" for each URL in the system (roughly 886 URLs thus far). As the user enters a new URL not in the system, they can enter keywords in that record, so others searching, can find it. The list of keywords is comma-separated in the URL entry form.

My question is, how do I determine that a list of keywords contains duplicates, and remove them, or prompt the user to remove them? For example:

my @keywords = ('this','that','other','foo','this', 'bar'); ^dupe

Obviously this contains a dupe 'this', which should be noted and removed, or pointed out to the user.

Another example:

my @keywords = ('this','that','other','foo','THIS', 'bar'); ^uc(dupe)

Also a dupe, 'THIS', though uppercase.

I initially thought lowercasing each word parsed from the list passed, and walking the list, comparing each to the word prior to it, but I don't think that will work for longer lists of keywords.

Has anyone done this before? Any insight as to how this should be designed?

Update:

This code now appears to do what I want, thanks to all who have helped arrive at a solution.

use strict; my (@keywords, %keywords); @keywords = ('this', 'that', 'other', 'foo', 'THIS', 'bar'); @keywords{map lc,@keywords}=(); @keywords = sort keys %keywords; foreach my $word (@keywords) { print $word . "\n"; }

Also, thanks to castaway on CB, this is also in 'perldoc -q duplicate', How can I remove duplicate elements from a list or array?


In reply to Detecting duplicate keywords passed in a form by hacker

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.