The problem is that I have a big file (40,000 lines); that's why I didn't include it in my post.
Everyone is glad you didn't. "Sample data" means just that: pick maybe 6 lines, enough to be illustrative and to cover the bases. My script above is only a test. It shows that the algorithm and the code work, given the sample data.
I just want to check, for every line, whether there is another line which has the same first word; if there is, I delete one of them.
This is precisely what my test shows. Would you not agree? To turn the test into a working script, just replace @in with the code you already have that reads the input data from the file, and similarly write @have to your file at the end.
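For readers without the earlier script, here is a minimal sketch of the single-pass idea. It is not the script referred to above, and it rests on a few assumptions. The sub name `dedup_by_first_word` and the sample lines are hypothetical. "First word" is taken to mean the first whitespace-separated field. The first line seen for each word is the one kept. Blank lines are dropped. `@in` and `@have` follow the names used in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line for each distinct first word.
# One pass over the data and one hash lookup per line, so O(n) overall.
sub dedup_by_first_word {
    my @lines = @_;
    my %seen;    # first word => count of lines seen starting with it
    my @have;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and ignores leading blanks
        my ($first) = split ' ', $line;
        next unless defined $first;    # assumption: blank lines are dropped
        # $seen{$first}++ is 0 (false) the first time, so that line is kept
        push @have, $line unless $seen{$first}++;
    }
    return @have;
}

# Hypothetical sample data standing in for the 40,000-line file.
my @in = (
    "apple red fruit",
    "banana yellow fruit",
    "apple green fruit",
    "cherry red fruit",
    "banana ripe",
    "date brown fruit",
);

my @have = dedup_by_first_word(@in);
print "$_\n" for @have;
```

To process the real file, fill @in from it instead. For example: `open my $fh, '<', $path or die "Cannot open $path: $!"; chomp(my @in = <$fh>);`, where `$path` is a placeholder. Then print @have to an output file in the same way. A hash fits here because `$seen{$first}++` checks and records a key in one step of average constant time. That single step replaces the whole inner loop.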
That's why I used two loops, but the algorithm is very slow.
Your algorithm is O(n²), whereas mine is O(n). Mine should therefore be thousands of times faster on a 40,000-line dataset.
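Here is a rough operation count for n = 40,000. It assumes the two-loop version compares each unordered pair of lines once and the hash version does one lookup per line. Constant factors are ignored.

```latex
\[
\frac{n(n-1)}{2} = \frac{40\,000 \times 39\,999}{2} = 799\,980\,000 \text{ comparisons}
\qquad \text{vs.} \qquad
n = 40\,000 \text{ hash lookups}
\]
\[
\frac{799\,980\,000}{40\,000} = 19\,999.5 \approx 2 \times 10^{4}
\]
```

If the nested loops visit every ordered pair instead, the count is n² = 1.6 × 10⁹ and the ratio becomes 40,000. A hash lookup costs more than a single string comparison, so the measured speedup will be smaller than these ratios. It should still be in the thousands.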
See also: Big O notation, SSCCE and Basic Testing Tutorial. HTH.