in reply to Re^2: Reading a huge input line in parts
in thread Reading a huge input line in parts
It would be a short and easy program to write, especially as a stdin-stdout filter: it's just a while loop that reads a good-sized char buffer (say, a few MB at a time) and steps through the buffer one character at a time, accumulating consecutive digit characters and printing the accumulated string of digits every time it hits a non-digit character. It wouldn't be more than 20 lines of C code, if that, and you'd save a lot of run time.
I suppose there must be more to your overall process than just splitting into digit strings; you could still do that extra part in Perl, but have the Perl script read from the output of the C program. (But again, given the quantity of data, if the other stuff can be done in C without too much trouble, I'd do that.)
UPDATE: Okay, I admit I was wrong about how many lines of C it would take. This C program is 30 lines (not counting the 4 blank lines added for legibility):
    #include <stdio.h>
    #define BUFSIZE 5242880

    int main( void )
    {
        static char buffer[BUFSIZE];  /* static: 5 MB can be too big for the stack */
        char digitstr[64];            /* assumes no digit run exceeds 63 characters */
        char *bufptr, *numptr;
        size_t nread, i;

        numptr = digitstr;
        while (( nread = fread( buffer, 1, BUFSIZE, stdin )) > 0 ) {
            bufptr = buffer;
            i = 0;
            while ( i < nread ) {
                if ( *bufptr >= 0x30 && *bufptr <= 0x39 )  /* ASCII '0' .. '9' */
                    *numptr++ = *bufptr;
                else if ( numptr > digitstr ) {
                    *numptr = 0;
                    printf( "%s\n", digitstr );
                    numptr = digitstr;
                }
                bufptr++;
                i++;
            }
        }

        /* update: need this last bit in case last char in the stream is a digit */
        if ( numptr > digitstr ) {
            *numptr = 0;
            printf( "%s\n", digitstr );
        }
        return 0;
    }
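
One caveat with the program above: `digitstr` is a fixed 64-byte array, so a run of 64 or more digits would write past its end. If the input might contain runs that long, one way around it is to skip the accumulation step and copy each digit straight to the output, keeping only a flag that says whether we're inside a run. Here's a rough sketch of that idea, not a tested replacement; the function name `split_digits` and the `CHUNK` size are my own placeholders, and it takes `FILE *` arguments only so it can be tried on something other than stdin/stdout.

```c
#include <stdio.h>

#define CHUNK 65536  /* placeholder read size; a few MB works the same way */

/* Copy every run of ASCII digits from `in` to `out`, one run per line.
 * Returns the number of runs written. No per-number buffer is needed,
 * so the length of a digit run is not limited. */
unsigned long split_digits(FILE *in, FILE *out)
{
    static char buffer[CHUNK];
    unsigned long runs = 0;
    int in_run = 0;              /* nonzero while inside a digit run */
    size_t nread, i;

    while ((nread = fread(buffer, 1, sizeof buffer, in)) > 0) {
        for (i = 0; i < nread; i++) {
            char c = buffer[i];
            if (c >= '0' && c <= '9') {
                putc(c, out);    /* emit the digit immediately */
                in_run = 1;
            } else if (in_run) {
                putc('\n', out); /* a non-digit ends the current run */
                in_run = 0;
                runs++;
            }
        }
    }
    /* the stream may end in the middle of a run */
    if (in_run) {
        putc('\n', out);
        runs++;
    }
    return runs;
}
```

Because `in_run` lives outside the read loop, a number that straddles two buffer reads still comes out on a single line. To use it as a filter, `main` would just call `split_digits(stdin, stdout)`.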