repudi8or has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks,

In need of some help please.

Scenario: I have a CSV file with many lines of URL filter strings (which may include asterisks, e.g. "http://*.google.com*/*"). I read each line and want to add its filter to a list of some form (array, hash, whatever) to be used in a loop later. I only want a string added to this list if it isn't already in there. Of the thousands of filters I read in, there may be only 5 or 10 unique ones. I want to build this list as efficiently as possible. I am using Text::CSV::Slurp to read in the CSV.

Input file sample:

parm1,http://*.google.com*/*,parm3
parm1,http://*.google.com*/*,parm3
parm1,http://*.yahoo.com*/*,parm3
parm1,http://*.google.com*/*,parm3
parm1,http://*.gmail.com*/*,parm3
parm1,http://*.yahoo.com*/*,parm3
parm1,http://*.google.com*/*,parm3

Desired output list, in whatever format lets me easily loop through it later (array, hash, whatever):

"http://*.google.com*/*", "http://*.yahoo.com*/*", "http://*.gmail.com*/*"

Type of code I have tried:

if (grep {$time_url_hits->[$i]->{url}} @url_list){
    #print "found \n";
} else {
    print "adding url " . $time_url_hits->[$i]->{'url'} . " to url array \n";
    push (@url_list,$time_url_hits->[$i]->{'url'});
...
<other processing>
...
<then later>
foreach (@url_list) {
    <do stuff for each unique filter string>

For some reason it only adds the very first filter string to the array. I'm guessing this has to do with the asterisks and grep somehow. Anyway, any assistance on achieving this elegantly would be appreciated.

Regards Rep
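
A note on the symptom, offered as a sketch rather than a diagnosis of the full script: the block passed to grep in the snippet above never looks at $_, so it simply returns the candidate URL, which is a non-empty (true) string. grep therefore "matches" every element already in @url_list. The list is empty on the first pass, so the first URL gets pushed; from then on grep always returns a positive count and nothing else is added. The asterisks are not involved. The example below reproduces this with plain strings and shows a comparison against $_ using eq. The names @candidates, @broken and @fixed are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the URLs pulled from the CSV rows.
my @candidates = (
    'http://*.google.com*/*',
    'http://*.google.com*/*',
    'http://*.yahoo.com*/*',
    'http://*.gmail.com*/*',
    'http://*.yahoo.com*/*',
);

# Same shape as the snippet in the question: the block ignores $_ and just
# returns $url, which is always true, so grep counts every stored element.
my @broken;
for my $url (@candidates) {
    push @broken, $url unless grep { $url } @broken;
}

# Compare each stored element ($_) with the candidate instead. Using eq
# (not a regex) means the asterisks are treated as ordinary characters.
my @fixed;
for my $url (@candidates) {
    push @fixed, $url unless grep { $_ eq $url } @fixed;
}

print "broken keeps ", scalar(@broken), " element(s)\n";
print "fixed keeps  ", scalar(@fixed), " element(s)\n";
print "  $_\n" for @fixed;
```

Here @broken ends up with one element and @fixed with three. The linear grep is fine for a handful of unique filters, but it rescans the list for every row read.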

Re: ensuring unique elements in a list
by BrowserUk (Patriarch) on Mar 05, 2010 at 06:27 UTC

    Strip out the bit you want and put it in a hash. Then put the decoration back when you use it.

    #! perl -slw
    use strict;

    my %filters;
    @filters{ ( split ',' )[1] =~ m[//\*\.([^*]+)\*] } = 1 while <DATA>;

    printf "http://*.%s*/*\n", $_ for keys %filters;

    =output
    C:\test>826863
    http://*.google.com*/*
    http://*.gmail.com*/*
    http://*.yahoo.com*/*
    =cut

    __DATA__
    parm1,http://*.google.com*/*,parm3
    parm1,http://*.google.com*/*,parm3
    parm1,http://*.yahoo.com*/*,parm3
    parm1,http://*.google.com*/*,parm3
    parm1,http://*.gmail.com*/*,parm3
    parm1,http://*.yahoo.com*/*,parm3
    parm1,http://*.google.com*/*,parm3

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
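
A variation on the hash approach above, as a minimal sketch: if the filters do not all follow the http://*.host*/* pattern, the whole string can serve as the hash key, with an array alongside to keep first-seen order. It assumes the URL is always the second comma-separated field and that no field contains an embedded comma (otherwise a real CSV parser would be needed). The names %seen, @unique and $csv are illustrative, and the sample rows are copied from the question.

```perl
use strict;
use warnings;

# Sample rows from the question, held in a string for the demo.
my $csv = <<'END_CSV';
parm1,http://*.google.com*/*,parm3
parm1,http://*.google.com*/*,parm3
parm1,http://*.yahoo.com*/*,parm3
parm1,http://*.google.com*/*,parm3
parm1,http://*.gmail.com*/*,parm3
parm1,http://*.yahoo.com*/*,parm3
parm1,http://*.google.com*/*,parm3
END_CSV

my ( %seen, @unique );

for my $line ( split /\n/, $csv ) {
    # Assumption: plain comma-separated fields, URL in the second column.
    my $url = ( split /,/, $line )[1];
    next unless defined $url && length $url;

    # $seen{$url}++ is false only the first time a string turns up, so each
    # filter is pushed once, in the order it was first read.
    push @unique, $url unless $seen{$url}++;
}

# Later processing can loop over @unique directly.
print "$_\n" for @unique;
```

Because the key is compared as a plain string, the asterisks need no escaping, and each row costs one hash lookup regardless of how many filters have been collected.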