TIURIC has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality.

Replies are listed 'Best First'.
Re: Thanks again no Time Out No Duplicates
by ysth (Canon) on Feb 06, 2004 at 18:11 UTC
    Update: archaeological note: The OP's previous questions are Removing duplicates in large files and End of the Time Out Tireny.

    If (as you've said before) your code is crawling or running out of memory sorting only 120,000 email addresses, you've got a problem with your code. If you show what you've tried instead of just asking again, you will get better answers. (Update: my hypothesis that memory shouldn't be an issue is predicated on the knowledge that this is running on a web server.)

    Also, some evidence that you've tried any of the proposed solutions would be appreciated. Why not give the solutions in Re: Removing duplicates in large files (a hash, or divide-and-conquer) a try, starting with the first and moving down?
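
    For what it's worth, the divide-and-conquer idea might look roughly like this -- a sketch only, with made-up filenames, not a drop-in solution. It sorts and dedups the file in chunks small enough to fit in memory:
    use strict;
    use warnings;

    my $chunk_size = 10_000;      # tune to whatever fits in memory
    my ( @chunk, @chunk_files, $n );

    # 'emails.txt' is a made-up input filename
    open my $in, '<', 'emails.txt' or die "emails.txt: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        push @chunk, $line;
        if ( @chunk >= $chunk_size ) {
            push @chunk_files, write_chunk( \@chunk, ++$n );
            @chunk = ();
        }
    }
    close $in;
    push @chunk_files, write_chunk( \@chunk, ++$n ) if @chunk;

    # each chunk file is now sorted and duplicate-free; a final
    # merge pass (omitted here) walks the chunk files in parallel,
    # emitting the smallest line each time and skipping repeats

    sub write_chunk {
        my ( $items, $n ) = @_;
        my %seen;
        my $file = "chunk$n.tmp";
        open my $out, '>', $file or die "$file: $!";
        print {$out} "$_\n" for sort grep { !$seen{$_}++ } @$items;
        close $out;
        return $file;
    }
    The point is that each chunk fits comfortably in memory no matter how big the whole file is.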

Re: Thanks again no Time Out No Duplicates
by Theo (Priest) on Feb 06, 2004 at 18:14 UTC
    Does this node refer back to a previous one? If so, please provide an Update pointing to it.

    Normally folks post their code when asking how to do it better since, without it, we can't tell how you did it. If the code is longish, use the readmore tags or put it on your scratchpad.

    -Theo-
    (so many nodes and so little time ... )

      ...or put it on your scratchpad.

      Please don't suggest using scratchpads. By the time the node is discovered by a user with a similar question, chances are the scratchpad will no longer contain the information. Anything relating to the question should remain in the thread itself.

        I think I've made the scratchpad suggestion before, but with the intent that if it was actually taken up, I would be obliged to extract the useful part of what was there and put it back in the question thread.
Re: Thanks again no Time Out No Duplicates
by Anonymous Monk on Feb 06, 2004 at 18:15 UTC
    Didn't you read the replies to your last thread? Create a hash and add a key for each item in your list, then use keys %hash to get your list of unique items. If you need the list sorted, you can also remove duplicates from a sorted list very easily with:
    my ( @res, $last );
    foreach my $item ( @sorted ) {
        next if ( defined $last and $item eq $last );    # skip a repeat of the previous item
        push @res, $item;
        $last = $item;
    }
    This has the side effect of preserving the order of the list (unlike keys %hash, which returns keys in no particular order).
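
    For completeness, the hash approach mentioned above might look something like this (a minimal sketch; emails.txt is a made-up filename):
    use strict;
    use warnings;

    my %seen;
    open my $in, '<', 'emails.txt' or die "Can't open emails.txt: $!";
    while ( my $addr = <$in> ) {
        chomp $addr;
        $seen{$addr} = 1;    # a duplicate just overwrites the same key
    }
    close $in;

    # keys %seen is the unique list; sort it if order matters
    print "$_\n" for sort keys %seen;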
      To be fair, some of the answers to his original question were somewhat off, or were presented as "the one true solution" when it was far from certain they applied.

      But there was enough there that he could have tried and then asked further saying what had/hadn't worked.

Re: Thanks again no Time Out No Duplicates
by graff (Chancellor) on Feb 09, 2004 at 01:37 UTC
    Although the method I have devised to search for duplicates does indeed remove all duplicates, it is slow and is probably not the best way to do it.

    No one (including you) will ever know if there is a better way to do it until you show us how you are doing it now. Post something that looks like code and indicates what you are doing.

    Is there such a thing as a "sort list without duplicates" function usable on a Windows system? I'm sure this can be done in UNIX. Thanks again; you guys are most skilled in programming balance.
    On any unix system, you would use a command line like this:
    sort -u file.txt > sorted-uniq-file.txt
    There are at least a few good sources (Cygwin, GNU, AT&T Research Labs) where you can get comprehensive kits that port all the basic unix command-line utilities -- not just sort, but also ls, find, cut, paste, grep, awk, tar ... and most important, the bash shell -- for use on any MS-Windows system (including source code and gcc compiler, if you're into that sort of thing).
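
    And since Perl itself runs fine on MS-Windows, a rough Perl stand-in for sort -u is possible too. A sketch only -- it slurps the whole file into memory, so it only suits files that fit, and the filenames are placeholders:
    perl -e "print sort grep { !$seen{$_}++ } <>" file.txt > sorted-uniq-file.txt
    The grep keeps the first occurrence of each line, and sort orders what's left.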