Re^4: How to check lines that start with the same word then delete one of them

Thank you for your answer. The problem is that I have a big file (40,000 lines), which is why I didn't include it in my post. Also, I can't specify all the wanted lines. I just want to check, for every line, whether there is another line which has the same first word, and if so delete one of them. That's why I used two loops, but the algorithm is very slow.


Re^5: How to check lines that start with the same word then delete one of them by hippo (Archbishop) on Apr 10, 2020 at 13:15 UTC
"The problem is that I have a big file (40,000 lines), which is why I didn't include it in my post."

Everyone is glad you didn't. "Sample data" means just that. Pick maybe 6 lines: enough to be illustrative and to cover the bases. My script above is a test only. It illustrates that the algorithm and the code work, given the sample data.

"I just want to check, for every line, whether there is another line which has the same first word, and if so delete one of them."

This is precisely what my test shows. Would you not agree? To turn the test into a working script, just replace `@in` with the code you already have which reads the input data from the file, and similarly write `@have` to your file at the end.

"That's why I used two loops, but the algorithm is very slow."

Your algorithm is O(n²) whereas mine is O(n). Mine should therefore be thousands of times faster for a 40,000-line dataset.

See also: Big O notation, SSCCE and Basic Testing Tutorial. HTH.
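For readers without the earlier script to hand, here is a minimal sketch of the single-pass idea. It is not the script referred to above, only an illustration under some assumptions: the "first word" is the first run of non-whitespace characters, the first line seen for each word is the one kept, and blank lines are dropped. The names `@in` and `@have` follow the post; `dedupe_by_first_word` and the sample data are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep only the first line
# seen for each distinct first word, preserving input order.
sub dedupe_by_first_word {
    my @in = @_;
    my %seen;    # first word => number of lines seen that start with it
    my @have;    # lines kept so far, in input order
    for my $line (@in) {
        # Assumption: the first word is the first run of non-whitespace
        # characters. Lines with no word at all (blank lines) are dropped.
        my ($first) = $line =~ /^\s*(\S+)/ or next;
        # One hash lookup per line, so the whole pass is O(n) on average.
        push @have, $line unless $seen{$first}++;
    }
    return @have;
}

# Made-up sample data, small enough to check by eye.
my @in = (
    'apple red',
    'banana yellow',
    'apple green',
    'cherry dark',
    'banana ripe',
    'date brown',
);
print "$_\n" for dedupe_by_first_word(@in);
```

With this sample the lines kept are `apple red`, `banana yellow`, `cherry dark` and `date brown`. If the last line for each word should win instead, the same hash can store the line itself rather than a counter.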
Re^6: How to check lines that start with the same word then delete one of them by rsFalse (Chaplain) on Apr 10, 2020 at 15:13 UTC
I think your solution runs in O(n log n), because looking up an item in a hash takes O(log n). Am I right? Still much, much faster than O(n²) :)
Re^7: How to check lines that start with the same word then delete one of them by Laurent_R (Canon) on Apr 10, 2020 at 15:42 UTC
No, a hash lookup is usually O(1). It does take some time, sure, but the time it takes does not (generally) depend on the hash size.
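To put rough numbers on the difference, here is a small counting sketch. It assumes the worst case for the two-loop approach (every first word is distinct, so no pair check ends early) and simply counts operations rather than timing them. Both helper names and the generated words are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the comparisons made by a two-loop approach
# that checks each word against every later word.
sub count_nested_comparisons {
    my @words = @_;
    my $count = 0;
    for my $i (0 .. $#words) {
        for my $j ($i + 1 .. $#words) {
            $count++;    # one string comparison per pair
            my $same = $words[$i] eq $words[$j];
        }
    }
    return $count;
}

# Hypothetical helper: count the hash lookups made by a single pass.
sub count_hash_lookups {
    my @words = @_;
    my %seen;
    my $count = 0;
    for my $word (@words) {
        $count++;    # one lookup per word, whatever the hash size
        $seen{$word}++;
    }
    return $count;
}

# 1,000 distinct made-up words: the worst case for the nested loops.
my @words = map { "w$_" } 1 .. 1000;
printf "nested loops: %d comparisons\n", count_nested_comparisons(@words);
printf "hash pass:    %d lookups\n",     count_hash_lookups(@words);
```

For 1,000 distinct words this gives 499,500 comparisons against 1,000 lookups. By the same n(n-1)/2 formula, 40,000 distinct first words would need 799,980,000 comparisons against 40,000 lookups.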