in reply to Fetch Mysql huge database
For bulk processing like this, I'd sidestep DBI and use command-line tools.
Create a file called getnames.sql containing:
select UserName from table where UserName <> '' and UserName not like 'deleted%'
And two perl scripts. 1) partNames.pl:
#! perl -w
use strict;

my %fhs;    ## one output filehandle per leading character

while( <> ) {
    my $char = substr $_, 0, 1;
    if( not exists $fhs{ $char } ) {
        ## first time this leading character is seen: open its bucket file
        open $fhs{ $char }, '>', "names.$char" or die $!;
    }
    print { $fhs{ $char } } $_;
}
And 2) uniqNames.pl:
#! perl -w
use strict;

for my $fname ( glob 'names.*' ) {
    my %uniq;    ## names already seen in this bucket file
    open my $fh, '<', $fname or die $!;
    while( <$fh> ) {
        ## print each name the first time it is seen
        print unless exists $uniq{ $_ };
        $uniq{ $_ } = 1;
    }
}
Then run the following pair of commands:
mysql --quick < getnames.sql | perl partNames.pl
perl uniqNames.pl > uniqedNames.txt
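(Depending on your setup, the mysql client will probably also want credentials and a database name, and -N/--skip-column-names keeps the "UserName" header row out of the data. A minimal sketch, where someuser and somedb are placeholders rather than anything from the original question:
mysql --quick -N -u someuser -p somedb < getnames.sql | perl partNames.pl
)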
The first command dumps the names and pipes them to the first Perl script, which writes them out to a set of files (names.?, where ? is the first character of the name).
The second script reads those files one at a time and writes the unique names to stdout, where they are redirected into a file called uniqedNames.txt.
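If you want to sanity-check the pipeline on a handful of names before pointing it at the full dump, something like this (the sample names are made up) should leave names.A, names.B and names.a behind and put three unique names in uniqedNames.txt; just remember to delete the names.* files before the real run:
printf 'Alice\nBob\nAlice\nalan\n' | perl partNames.pl
perl uniqNames.pl > uniqedNames.txt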
The whole (untested) process should take less time than sorting the 34 million names -- inside the DB or with your system sort utility. I'd hazard a guess that it would take less than 20 minutes.