Why are you using readline to read a file that doesn't contain lines?
Well actually, readline can be configured to effectively become a read by setting $/ to a reference to a number. For example,
local $/ = \4096;
while (<$fh>) {
    # $_ contains one 4k chunk
    ...
}
| [reply] [d/l] [select] |
The file does contain lines. However, there is a chunk of chr(0) that has been appended (by a process outside my control) to the front of the first line, and I didn't realize that there was an INSANE quantity of chr(0) until I was able to demonstrate it just a bit ago. So, this has exposed the next layer of the problem. I now know why readline() was choking (and why I thought that read() was choking even though it really wasn't, though that's not relevant). So, what's the best way to quickly breeze past the chr(0) mess -- now at 385MB and counting -- to find the start of the real data?
| [reply] |
Use block mode to filter out the NULs first.
perl -pe'BEGIN { $/ = \(64*1024); } s/\0+//g' infile | line_reading_script.pl
| [reply] [d/l] |
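If only the leading run of NULs needs to go, another option is to find where the real data starts and seek there, then carry on reading line by line from the same handle. The following is a minimal sketch of that idea, not a tested solution for the file in question: skip_leading_nuls is a hypothetical helper name, the 64 KiB default block size is an arbitrary choice, and it assumes the handle is seekable (a regular file, not a pipe).

```perl
use strict;
use warnings;

# Hypothetical helper: advance $fh past any leading NUL bytes and return
# the byte offset of the first non-NUL byte (the file size if the file
# contains nothing but NULs). Assumes $fh is a seekable handle.
sub skip_leading_nuls {
    my ($fh, $block_size) = @_;
    $block_size ||= 64 * 1024;            # arbitrary default
    binmode $fh;                          # work in bytes, no layer translation

    my $offset = 0;
    while (1) {
        my $got = read($fh, my $buf, $block_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                # EOF reached without finding data

        if ($buf =~ /[^\0]/) {
            $offset += $-[0];             # index of first non-NUL byte in this block
            last;
        }
        $offset += $got;                  # whole block was NULs, keep going
    }

    # Reposition so the next read starts exactly at the real data.
    seek($fh, $offset, 0) or die "seek failed: $!";
    return $offset;
}
```

After open(my $fh, '<', $inputfile), calling skip_leading_nuls($fh) should leave the handle positioned at the first real byte, so an ordinary while (my $line = <$fh>) loop can take over from there. The returned offset also tells you how many NULs were at the front.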
You are probably trying to read the whole file into memory.
It's best if you show a small, self-contained program that replicates the problem.
| [reply] |
Sorry, I forgot to include a snippet because my wife was rushing me out the door. ;-) No, I'm not reading the whole file into an array; I'm using scalar context to read in one line at a time:
...
open (INPUTFILE, $inputfile) or die ("\nERROR: Unable to open file \"$inputfile\".\n");
while (!eof(INPUTFILE))
{
$line = readline(INPUTFILE);
...
And that's where it chokes. If I insert a print statement before and after the readline (as checkpoints), the first one will work, and the second one will not.
I don't have experience using read, and when I tried replacing the readline with a simple read earlier, it looked like the same hang was happening. However, when I tried it again just now, it worked fine:
read(INPUTFILE, $x, 1);
Since I knew (from MUCH smaller files) that the input files would have chunks of chr(0) at the front, I wrote a bit of code around the read statement above to see just how bad the situation is for this particularly large file. As I write this message, we're at 120MB worth of continuous chr(0) and counting...
| [reply] [d/l] [select] |
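A small experiment may make the apparent hang easier to see. With the default $/ of "\n", readline has to accumulate everything up to the first newline, so a long run of chr(0) in front of the first line all ends up in a single scalar. The sketch below builds a throwaway file with a much smaller NUL prefix (1,000,000 bytes, chosen purely for illustration) and inspects what the first readline returns. The temp file and the "real data" line are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a throwaway file: 1,000,000 NUL bytes followed by one real line.
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} ("\0" x 1_000_000) . "real data\n";
close $tmp or die "close failed: $!";

open(my $fh, '<', $name) or die "open failed: $!";
binmode $fh;

# With $/ left at "\n", the first "line" is every byte up to the first newline.
my $first = readline($fh);
my $nuls  = ($first =~ tr/\0//);          # tr in counting mode: number of NUL bytes

printf "first 'line' is %d bytes, %d of them NUL\n", length($first), $nuls;
close $fh;
```

Scaled up to hundreds of megabytes, that one readline call has to pull the entire NUL run into memory before it can return, which would look like a hang from the outside.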