in reply to Is this normal for File::Find or where did I go wrong.
Your best bet is to process the file one record at a time.
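For example, a minimal sketch of the record-at-a-time approach (the filename is just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $filename = 'input.txt';    # hypothetical name
    open my $in, '<', $filename or die "Can't open $filename: $!";
    while (my $line = <$in>) {
        # only the current record is in memory, not the whole file
        $line =~ s/\r\n$/\n/;      # e.g. strip a DOS line ending
        print $line;
    }
    close $in;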
This still leaves a potential problem: what if you process a 400 MB file with no newlines? Perl will treat the entire 400 MB file as one (big) record. This is probably not what you want to happen.
The solution to this would probably be to read an arbitrary chunk of data, say 256 bytes, from the top of the file and check for two things: a newline, and any characters not included in [ -~\r\n\t\f]. If you don't find a newline in the first 256 bytes (or whatever limit makes the most sense to you), or if you do find characters outside that class, chances are you're looking at a file that does not contain text; print a warning, close the file, and move on to the next one.
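Something along these lines would do it (a sketch under those assumptions; the sub name and the 256-byte limit are illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Return true if the first 256 bytes contain a newline and nothing
    # outside the printable-text class [ -~\r\n\t\f].
    sub looks_like_text {
        my ($filename) = @_;
        open my $fh, '<', $filename
            or do { warn "Can't open $filename: $!\n"; return 0 };
        my $got = read $fh, my $chunk, 256;
        close $fh;
        return 0 unless defined $got && $got > 0;
        return 0 unless $chunk =~ /\n/;          # no newline up front
        return 0 if $chunk =~ /[^ -~\r\n\t\f]/;  # non-text bytes found
        return 1;
    }

    unless (looks_like_text('input.txt')) {
        warn "input.txt: probably not a text file, skipping\n";
    }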
I recently wrote a DOS-to-UNIX newline conversion script that tries to address these issues. Rather than waste bandwidth here, you can see the code at http://dalnet-perl.org/crlflf.txt. It's actually a port of someone else's bash script; the original did no sanity checking and ran quite slowly as a result.