The situation is the following: memory is low and the data is large. What I need is an algorithm to locate repeated entries with a prespecified cutoff (let's say 5 repeats) in a file that contains 70,000,000 string entries. File example:

ent1 up to 100 characters..
ent2 up to 100 characters..
ent3 up to 100 characters..
ent2 up to 100 characters..
ent7 up to 100 characters..
ent3 up to 100 characters..
ent5 up to 100 characters..
ent5 up to 100 characters..
ent2 up to 100 characters..
..

I only have approx. 500 MB of RAM at my disposal (other things are running and I cannot afford to swap). Does anyone have a suggestion?
PS: I don't care if it is slow, as long as the file can be processed within 6 hours.
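For illustration only, here is a minimal sketch of one approach that might fit these constraints: split the file on disk into buckets by hash, so that identical entries always land in the same bucket, then count each bucket in memory on its own. This is not taken from the thread. The function name `find_repeats`, the bucket count of 128, and the reading of "5 repeats" as "at least 5 occurrences" are all assumptions, and actual memory use depends on how many distinct entries end up in each bucket.

```python
import os
import tempfile
import zlib
from collections import Counter


def find_repeats(path, cutoff=5, n_buckets=128, workdir=None):
    """Yield (entry, count) for entries seen at least `cutoff` times.

    Hypothetical helper: assumes one entry per line and enough free
    disk space for a second copy of the data.
    """
    own_dir = workdir is None
    if own_dir:
        workdir = tempfile.mkdtemp(prefix="repeats_")
    names = [os.path.join(workdir, f"bucket_{i}.txt") for i in range(n_buckets)]

    # Pass 1: route every entry to a bucket file chosen by its hash.
    # n_buckets must stay below the per-process open-file limit.
    buckets = [open(name, "wb") for name in names]
    try:
        with open(path, "rb") as src:
            for line in src:
                entry = line.rstrip(b"\r\n")
                if entry:
                    buckets[zlib.crc32(entry) % n_buckets].write(entry + b"\n")
    finally:
        for fh in buckets:
            fh.close()

    # Pass 2: count one bucket at a time, so only that bucket's
    # distinct entries are held in memory at once.
    for name in names:
        with open(name, "rb") as fh:
            counts = Counter(line.rstrip(b"\n") for line in fh)
        os.remove(name)
        for entry, count in counts.items():
            if count >= cutoff:
                yield entry.decode("utf-8", "replace"), count

    if own_dir:
        os.rmdir(workdir)
```

Usage would look like `for entry, count in find_repeats("entries.txt", cutoff=5): print(count, entry)`, where `entries.txt` is a placeholder file name. Raising `n_buckets` trades more open files for less memory per bucket. An external sort of the file followed by a single pass counting runs of equal lines would be another option with similar memory behaviour.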
In reply to [SOLVED]finding repetitions by baxy77bax