in reply to Removing repeated lines from file
    my %read = ();
    while (defined($_ = <FILE>)) {
        # print each line only the first time it is seen
        if (!exists $read{$_}) {
            print OUTFILE $_;
            $read{$_} = 1;
        }
    }

If that's too big for your memory to hold (is it, really?), you could try to get a unique signature for each line and save that instead of the line itself. What about an MD5 hash of it? It's 32 bytes per input line as a hex digest (16 if you store the raw binary digest), so it could be a good starting point. Beware, MD5 is not that fast if you have billions of lines!
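The MD5 idea could look roughly like the sketch below. It is only an illustration: the helper name `dedupe_by_md5` is made up for this example, and it assumes the core `Digest::MD5` module and filehandles passed in by the caller. It stores the 16-byte binary digest rather than the line itself, so two different lines with the same digest would be treated as duplicates (very unlikely, but not impossible). Perl's own per-key hash overhead also comes on top of those 16 bytes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper (not from the original post): copy lines from $in
# to $out, skipping any line whose MD5 digest has already been seen.
# Returns the number of distinct digests stored.
sub dedupe_by_md5 {
    my ($in, $out) = @_;
    my %seen;                              # keys are 16-byte binary digests
    while (defined(my $line = <$in>)) {
        my $key = md5($line);              # raw digest, not the hex form
        next if $seen{$key}++;             # already printed a line with this digest
        print {$out} $line;
    }
    return scalar keys %seen;
}
```

Something like `dedupe_by_md5(\*STDIN, \*STDOUT)` would then filter standard input to standard output. The saving only pays off when the lines are, on average, noticeably longer than the digest.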
Re: Re: Removing repeated lines from file
by zby (Vicar) on Jun 24, 2003 at 12:08 UTC
by jmcnamara (Monsignor) on Jun 24, 2003 at 12:25 UTC
by ant9000 (Monk) on Jun 24, 2003 at 13:06 UTC

Re: Removing repeated lines from file
by Abigail-II (Bishop) on Jun 24, 2003 at 12:48 UTC

Re: Re: Removing repeated lines from file
by husker (Chaplain) on Jun 24, 2003 at 13:57 UTC