Line by line buffered read

muyprofesional has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need to parse and process a very big file. What I need is to use buffered reads (sysread) for speed. My problem is recovering the lines after each read. Obviously the buffer stops in the middle of a line, but see what happens:

#!/usr/bin/perl -w
open my( $fh ), '<', "/usr/local/ffpde/logs/pruebas3.log";
my $buffer;
while (sysread $fh, $buffer, 100) {
        my @lines = split(/"\n"/, $buffer);
        print @lines;
        sleep 1;
}

#!/usr/bin/perl -w
open my( $fh ), '<', "/usr/local/ffpde/logs/pruebas3.log";
my $buffer;
while (sysread $fh, $buffer, 100) {
        my @lines = split(/"\n"/, $buffer);

        for $l (@lines) {
                print "$l\n";
                sleep 1;
        }

}

Jul 26 10:45:25 - Sergio, 33 | Informático
Jul 26 11:45:25 - Angel, 23 | Encofrador
Jul 26 12:45:25
- Sergio, 52 | Repartidor
Jul 26 12:55:25 - Sergio, 18 | Repartidor
Jul 26 13:25:25 - Angel, 42 | P
anadero
Jul 26 13:35:25 - Dario, 34 | Informático
Jul 26 15:45:25 - Luis, 26 | Repartidor
Jul 26 16
:25:25 - Mabel, 41 | Azafata
Jul 26 17:29:25 - Laura, 19 | Investigadora
Jul 26 10:45:25 - Sergio, 3
3 | Informático
Jul 26 11:45:25 - Angel, 23 | Encofrador
Jul 26 12:45:25 - Sergio, 52 | Repartidor
Jul 26 12:55:25 - Sergio, 18 | Repartidor
Jul 26 13:25:25 - Angel, 42 | Panadero
Jul 26 13:35:25 - D
ario, 34 | Informático

It splits the line where the buffer stopped. (oops)

The buffer size doesn't matter. I used 100 bytes only so that the example shows 2-3 lines per read; it's the same with 4096 bytes.

So: is there any way to avoid these cut lines when reading with buffers? What is the best method to load an array with the correct line-by-line content of the file? I must use buffers for speed; a simple open is not fast enough for me.

Thanks in advance monks!


Replies are listed 'Best First'.
Re: Line by line buffered read by JavaFan (Canon) on Aug 20, 2010 at 16:09 UTC
Why are you reading using sysread? Why can't you just read line by line? If it's the sleeps you need, just count the characters you've read, and if the count exceeds 4096 (or some other number), reset your counter and sleep. I don't understand the "I must use buffers for speed".
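A minimal sketch of the counting approach described above, assuming the goal is only to pause after each chunk of output. The helper name `read_with_pauses`, its parameters, and the temporary demo file are illustrative and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (name and parameters are illustrative).
# Reads $path line by line, passes each line to $on_line, and calls $pause
# whenever more than $limit characters have been read since the last pause.
# Returns the number of pauses taken.
sub read_with_pauses {
    my ($path, $limit, $on_line, $pause) = @_;
    open my $fh, '<', $path or die "Unable to open '$path': $!\n";
    my ($count, $pauses) = (0, 0);
    while (defined(my $line = <$fh>)) {
        $on_line->($line);
        $count += length $line;
        if ($count > $limit) {
            $count = 0;
            $pauses++;
            $pause->();
        }
    }
    close $fh;
    return $pauses;
}

# Demo on a small temporary file standing in for the real log.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "Jul 26 11:45:25 - Angel, 23 | Encofrador\n" for 1 .. 10;
close $tmp;

read_with_pauses(
    $tmp_name,
    100,                           # 4096 or similar for a real file
    sub { print $_[0] },           # always a whole line, never a cut one
    sub { print "-- pause --\n" }  # replace with sleep 1 to watch the output
);
```

Because the lines come from the ordinary `<$fh>` operator, the pause can only fall between lines, so no line is ever split.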
Re: Line by line buffered read by BrimBorium (Friar) on Aug 20, 2010 at 16:34 UTC
I usually use something like:

use strict;
use warnings;
open (IN,"<file");
my $line;
while($line=<IN>){
        do_something_with_line();
}
close(IN);

Probably nobody advised you to read "How do I post a question effectively?", so I do, especially about "Use strict and warnings".
Re^2: Line by line buffered read by dasgar (Priest) on Aug 20, 2010 at 16:47 UTC
That's kind of what I was thinking of, but I would prefer to modify the open statement to include a die statement, such as below:

open (IN,"<",$file) || die "Unable to open file '$file': $!\n";

Also, I would think that the sleep statements in the OP are actually "slowing" it down by making it run longer. Based on the code provided, it doesn't look like the sleep statements are needed. Of course, since I have never used sysread, I may be completely wrong about this.
Re^3: Line by line buffered read by muyprofesional (Initiate) on Aug 20, 2010 at 16:55 UTC
Thanks for the "or die". The sleep is only there to make the output of each print easier to see; with a big file you can't follow it otherwise.
Re^2: Line by line buffered read by muyprofesional (Initiate) on Aug 20, 2010 at 16:58 UTC
Thanks. I need buffered read for speed.
Re^3: Line by line buffered read by ikegami (Patriarch) on Aug 20, 2010 at 17:06 UTC
You're not making any sense. Line-by-line reading (`while (<>)`) is buffered. `sysread`, on the other hand, provides no buffering. If you want to provide your own buffering instead of using Perl's, you could do

my $buf = '';
for (;;) {
        my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
        die("sysread: $!") if !defined($rv);
        last if !$rv;
        # Hand off every complete line; a trailing partial line stays in $buf.
        process_line($1) while $buf =~ s/^([^\n]*\n)//;
}
process_line($buf) if length($buf);

Update: Fixed problem mentioned by ibm1620 in comment.
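For the original question of loading an array with whole lines, the same carry-over idea can be wrapped in a function. This is only a sketch under stated assumptions: `lines_via_sysread`, the block size of 16, and the temporary demo file are made up for illustration, and the sample lines are taken from the question without accented characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $path in blocks of $block_size bytes with sysread
# and return a reference to an array holding one element per line.
sub lines_via_sysread {
    my ($path, $block_size) = @_;
    open my $fh, '<', $path or die "Unable to open '$path': $!\n";
    my $buf = '';
    my @lines;
    for (;;) {
        # Append at the end of $buf, so a partial line left over from the
        # previous block is completed by this one.
        my $rv = sysread($fh, $buf, $block_size, length $buf);
        die "sysread: $!\n" if !defined $rv;
        last if !$rv;    # 0 bytes read means end of file
        push @lines, $1 while $buf =~ s/^([^\n]*\n)//;
    }
    push @lines, $buf if length $buf;    # last line without a trailing newline
    close $fh;
    return \@lines;
}

# Demo: a deliberately tiny block size, so every line spans several reads.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "Jul 26 11:45:25 - Angel, 23 | Encofrador\n",
             "Jul 26 12:45:25 - Sergio, 52 | Repartidor\n",
             "Jul 26 12:55:25 - Sergio, 18 | Repartidor";
close $tmp;

for my $line (@{ lines_via_sysread($tmp_name, 16) }) {
    chomp(my $copy = $line);
    print "[$copy]\n";
}
```

The block size affects only how many reads are needed, not which lines come back, which is the property the question was missing.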
Re^4: Line by line buffered read by ibm1620 (Hermit) on Feb 25, 2014 at 00:37 UTC
Re: Line by line buffered read by roboticus (Chancellor) on Aug 20, 2010 at 23:32 UTC
muyprofesional: You're optimizing too soon! First make your code work, then if it's not fast enough, profile it to find out what's too slow. Then, and only then, make it work fast. If you did this, you would've simply used normal line-by-line entry. Then you wouldn't have started down this trail, since the normal line-by-line file reading is already buffered and fast. Until you know what's "slow", making something faster is a waste of time. For example if you have a program that's too slow, and the file reading is taking 5% of your time, then improving the file reading will get you a 5% speed increase at best! You'd profit more by speeding up whatever is consuming the other 95% of your time... ...roboticus
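One way to check the claim before optimizing is to time both reading styles with the core Benchmark module. This is timing rather than true profiling, and everything in the sketch is an assumption made for illustration: the function names, the 64 KB block size, the generated test file, and the iteration count. Results will vary by machine and file, so no figures are given.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Count lines with Perl's own buffered line-by-line reading.
sub count_readline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Unable to open '$path': $!\n";
    my $n = 0;
    while (defined(my $line = <$fh>)) { $n++ }
    close $fh;
    return $n;
}

# Count lines with sysread and a hand-rolled carry-over buffer.
sub count_sysread {
    my ($path, $block_size) = @_;
    open my $fh, '<', $path or die "Unable to open '$path': $!\n";
    my ($buf, $n) = ('', 0);
    for (;;) {
        my $rv = sysread($fh, $buf, $block_size, length $buf);
        die "sysread: $!\n" if !defined $rv;
        last if !$rv;
        $n++ while $buf =~ s/^[^\n]*\n//;
    }
    $n++ if length $buf;    # final line without a trailing newline
    close $fh;
    return $n;
}

# Generated test data standing in for the real log file.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "Jul 26 11:45:25 - Angel, 23 | Encofrador\n" for 1 .. 20_000;
close $tmp;

# Run each reader 20 times and print a comparison table.
cmpthese(20, {
    readline => sub { count_readline($tmp_name) },
    sysread  => sub { count_sysread($tmp_name, 65536) },
});
```

If the two rates are close, or reading is a small share of the total run time, the extra buffering code buys little, which is the point made above.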
Re: Line by line buffered read by MajingaZ (Beadle) on Aug 20, 2010 at 18:28 UTC
But it appears you aren't actually doing anything with the file? Why not just copy it? Or, if you just want to work with the text and don't need to parse the lines of data, there is no need to split the `$buffer`. You're splitting on \n, then basically putting the newlines back in with the print. Solutions depend on what you are actually doing with the `$buffer`. You could, for example, do a `join "\n",@lines`, though I think you'll have to check whether `$buffer` ended in `/\n\z/`. Or you could just do a `$buffer.=<$fh>`, keeping in mind that if you combine this with the previous suggestion you'll want to output a terminating `\n`. Or was this just to try to focus on the command you think is the problem? I have killed a couple of servers when using while(<>) because of all the calls to the server, and got yelled at by some IT peeps because I was making so many calls to the server for lines of data.
Re: Line by line buffered read by rowdog (Curate) on Aug 22, 2010 at 20:56 UTC
As ikegami pointed out, sysread is unbuffered I/O and I see no reason to use it here. It's much easier to use the buffered I/O functions, and here's a simple revision of your example code that does just that.

#!/usr/bin/perl
use strict;
use warnings;

my $fname = '/var/log/Xorg.0.log';
open my $fh, '<', $fname or die $!;
while ( my $line = <$fh> ) {
        print $line;
        sleep 1;
}