Update (archaeological note): the OP's previous questions are "Removing duplicates in large files" and "End of the Time Out Tireny".
If, as you've said before, your code is crawling or running out of memory while sorting only 120,000 email addresses, the problem is in your code. Please show what you've tried instead of just asking again; you will get better answers. (Update: my hypothesis that memory shouldn't be an issue is predicated on the knowledge that this is running on a web server.)
Also, some evidence that you've tried some of the solutions already proposed would be appreciated. Why not give the solutions in Re: Removing duplicates in large files (a hash, or divide-and-conquer) a try, starting with the first and working down?
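For instance, a minimal sketch of the hash approach, assuming one address per line in a file (emails.txt is a hypothetical name):

    # Print each address the first time it is seen. Memory use is
    # proportional to the number of *unique* addresses, which should
    # be no problem at all for 120,000 of them.
    my %seen;
    open my $in, '<', 'emails.txt' or die "Cannot open emails.txt: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        print "$line\n" unless $seen{$line}++;
    }
    close $in;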
Does this node refer back to a previous one? If so, please provide an Update pointing to it.
Normally folks post their code when asking how to do something better, since without it we can't tell how you did it. If the code is longish, use the readmore tags or put it on your scratchpad.
-Theo-
(so many nodes and so little time ... )
I think I've made the scratchpad suggestion before, with the understanding that if it were actually taken up, I would be obliged to extract the useful part of what was there and put it back into the question thread.
Didn't you read the replies to your last thread? Create a hash, and add a key for each item in your list. Then use keys %hash to get your list of unique items. If you need to sort it anyway, you can also remove duplicate items very easily from the sorted list with:

    my ( @res, $last );
    foreach my $item (@sorted) {
        # skip any item equal to the one before it
        next if defined $last and $item eq $last;
        push @res, $item;
        $last = $item;
    }
This has the side effect of preserving the order of the list.
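By contrast, keys %hash returns items in no particular order, so the hash route needs an explicit sort if order matters. A minimal sketch (@list and @unique are illustrative names):

    my %seen;
    $seen{$_} = 1 for @list;        # record each item once
    my @unique = sort keys %seen;   # duplicate-free, sorted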
The method I have devised to search for duplicates does indeed remove all of them, but it is slow and probably not the best way to do it.
No one (including you) will ever know if there is a better way to do it until you show us how you are doing it now. Post something that looks like code and indicates what you are doing.
Is there such a thing as a "sort list without duplicates" function usable on a Windows system? I'm sure this can be done in UNIX. Thanks again; you guys are most skilled in programming balance.
On any Unix system, you would use a command line like this:
sort -u file.txt > sorted-uniq-file.txt
There are at least a few good sources (Cygwin, GNU, AT&T Research Labs) where you can get comprehensive kits that port all the basic Unix command-line utilities -- not just sort, but also ls, find, cut, paste, grep, awk, tar ... and, most important, the bash shell -- for use on any MS-Windows system (including source code and a gcc compiler, if you're into that sort of thing).
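And since Perl itself runs fine on MS-Windows, here is a rough pure-Perl stand-in for sort -u (a sketch; the filenames are the same placeholders as above, and the whole file is read into memory, which is fine at this scale):

    perl -e "print sort grep { !$seen{$_}++ } <>" file.txt > sorted-uniq-file.txt

The grep keeps only the first occurrence of each line, and the sort puts the survivors in order.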