in reply to Better way to work with large TSV files?
(update: for that matter, it might be faster/easier to put the 50 GB on a USB or FireWire disk and FedEx it to the remote host...)
If you really do need to do all those row inserts over the WAN link, and if you are confident about knowing the difference between success and failure for each insert, then my first notion would be:
insertion_script < TSV.file > log.file
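If that run gets interrupted, pick up again at the first row that has not been logged yet:
tail +N TSV.file | insertion_script >> log.file
You can repeat this as many times as necessary, using the appropriate value for N each time, until the log file has the same line count as the TSV file. Since "tail +N" starts output at line N, N should be one more than the number of lines in the log at present (assuming the script logs exactly one line per inserted row); otherwise the last logged row gets inserted twice. On systems that no longer accept the old "+N" form, "tail -n +N" does the same thing.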
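The resume step can also be looped so that N is recomputed from the log each time. What follows is only a sketch under stated assumptions: `resume_inserts` is a hypothetical helper name, the insertion command is assumed to read rows on stdin and to print exactly one log line per row it actually inserted, and the start line is taken as the log's line count plus one so that the last logged row is not sent again.

```shell
#!/bin/sh
# resume_inserts TSV LOG CMD [ARGS...]
# Hypothetical helper: re-runs CMD on the not-yet-logged tail of TSV until
# LOG has as many lines as TSV.
# Assumption: CMD writes exactly one line to stdout per row it inserted.
resume_inserts() {
    tsv=$1
    log=$2
    shift 2
    [ -f "$log" ] || : > "$log"          # start with an empty log if none exists
    total=$(( $(wc -l < "$tsv") ))       # arithmetic strips any padding from wc
    while :; do
        done_rows=$(( $(wc -l < "$log") ))
        if [ "$done_rows" -ge "$total" ]; then
            break                        # every row has a log line
        fi
        # Row done_rows+1 is the first one without a log line.
        tail -n +"$((done_rows + 1))" "$tsv" | "$@" >> "$log"
        new_done=$(( $(wc -l < "$log") ))
        if [ "$new_done" -le "$done_rows" ]; then
            echo "no progress after row $done_rows; giving up" >&2
            return 1                     # avoid looping forever on a dead link
        fi
    done
}
```

With the names used in this thread, the call would be `resume_inserts TSV.file log.file insertion_script`. The "no progress" check is a design choice of the sketch, not something from the original suggestion: without it, a link that stays down would make the loop spin.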
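(In case you don't know "tail", it's a basic unix util, which could easily be emulated in perl as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $bgn = 1;                                  # default: print from the first line
if ( @ARGV and $ARGV[0] =~ /^\+(\d+)/ ) {     # a leading "+N" argument sets the start line
    $bgn = $1;
    shift;                                    # drop "+N" so <> only sees file names
}
while (<>) {
    print if $. >= $bgn;                      # $. is the current input line number
}
This just handles the "+N" usage of tail, which is all you need here; there's probably a one-liner form to do the same thing...)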
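As for a one-liner form, here is one possible sketch, wrapped in a shell function so it can sit in a pipeline. The name `from_line` is hypothetical, and it takes the start line as a plain number rather than as "+N"; it relies only on perl's standard `-n` and `-e` switches and the `$.` line counter.

```shell
#!/bin/sh
# from_line N [FILE...]
# Hypothetical helper: prints input starting at line N, like "tail +N".
# BEGIN runs once before the implicit -n read loop and takes N off @ARGV,
# so the remaining arguments (or stdin, if none) are read as input.
from_line() {
    perl -ne 'BEGIN { $n = shift } print if $. >= $n' "$@"
}
```

For example, `from_line 1001 TSV.file | insertion_script >> log.file` would resume at row 1001.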
Re^2: Better way to work with large TSV files?
by radiantmatrix (Parson) on Aug 23, 2004 at 18:27 UTC