Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I seem to be making a mess of using sysread. I haven't been using Perl long, so I thought I'd write a simple routine that manipulates a file. The file is tab-delimited with 8 columns:
field1 field2 field3 field4 field5 field6 field7 field8
field6 and field7 are dates in this format, e.g.:
Aug 11 2010 1:40PM
All I want to do is get this file into an array so I can manipulate it. I tried this:
    local *IN;
    open(IN, "/var/tmp/myfile.dat") or die "Can't read /var/tmp/myfile.dat: $! \n";
    binmode(IN);
    my $buf_e = '';
    my $buf_d = '';
    my $BLOCK_SIZE = 8192;
    my @recs;

    # Shove the contents of file into an array
    while (sysread(IN, $buf_e, $BLOCK_SIZE, length($buf_e))) {
        push(@recs, [ $1, $2, $3, $4, $5, $6, $7, $8 ])
            while ($buf_d =~ s/^(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\n//s);
    }
    close(IN);
    print @recs;
But it's not working. No output is produced and I can't figure out why. Any help would be gratefully received.

Replies are listed 'Best First'.
Re: sysread failure
by Fletch (Bishop) on Aug 11, 2010 at 13:26 UTC

    I'm going to go out on a limb and guess it's because you read into $buf_e then try and parse things from $buf_d.
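    A minimal sketch of that fix, keeping the sysread approach but using one buffer for both reading and parsing. The name read_records_sysread is hypothetical, and the sketch assumes a single tab between fields and "\n" line endings. It splits on tabs rather than matching \S+, because the date fields in the sample contain spaces.

```perl
use strict;
use warnings;

# Read tab-delimited records with sysread, using ONE buffer throughout.
# read_records_sysread() is a hypothetical name, not from the original post.
sub read_records_sysread {
    my ($path, $block_size) = @_;
    $block_size ||= 8192;

    open(my $in, '<', $path) or die "Can't read $path: $!\n";
    binmode($in);

    my $buf = '';
    my @recs;
    while (1) {
        # Append the next block to whatever partial line is left in $buf.
        my $n = sysread($in, $buf, $block_size, length($buf));
        die "sysread failed on $path: $!\n" unless defined $n;
        last if $n == 0;    # end of file

        # Peel complete lines off the front of the SAME buffer.
        while ($buf =~ s/^([^\n]*)\n//) {
            my $line = $1;
            next unless length $line;    # skip blank lines
            push @recs, [ split /\t/, $line, -1 ];
        }
    }
    # A final line without a trailing newline is still a record.
    push @recs, [ split /\t/, $buf, -1 ] if length $buf;

    close($in);
    return \@recs;
}
```

    Each element of the returned array ref is itself an array ref holding one record's fields.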

    That aside, if the file's columns are separated by tabs then reading line by line and using Text::CSV_XS or even split would make more sense. Using sysread for line-oriented input isn't the most obvious solution.
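    A rough sketch of the line-by-line approach using split. The helper names read_records and format_records are hypothetical, and the sketch assumes exactly one tab between fields. Note that printing an array of array refs directly, as in the original print @recs, would only show strings like ARRAY(0x...), so the fields are joined explicitly here.

```perl
use strict;
use warnings;

# read_records() is a hypothetical helper name.
sub read_records {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can't read $path: $!\n";

    my @recs;
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        # A limit of -1 keeps trailing empty fields.
        push @recs, [ split /\t/, $line, -1 ];
    }
    close($in);
    return \@recs;
}

# Turn each record into one printable line, fields separated by '|'.
sub format_records {
    my ($recs) = @_;
    return map { join('|', @$_) . "\n" } @$recs;
}
```

    Usage would look like: print format_records(read_records("/var/tmp/myfile.dat"));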

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: sysread failure
by talexb (Chancellor) on Aug 11, 2010 at 13:57 UTC

    To follow up on what brother Fletch said, sysread is really only something you'll need to use when munging a binary file. For any line-oriented file, just open it with open and read it line by line with the diamond operator, while (<$fh>) { ... }. And then close it, of course.

    And don't worry that you have to read 8K blocks to get decent throughput -- the part of Perl that reads lines from files has had (from what I've heard) a great deal of attention paid to it over the last twenty or so years, and is probably about as fast as it could possibly be.

    Also, seeing a repeated pattern in a regular expression should be a sign that there's probably a better way. Again, brother Fletch has suggested split. Try it -- I think you'll like it.
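    To illustrate the point about the repeated pattern, here is a small comparison on one sample line. The field values f1 to f8 and the second date are made up for illustration. With data like the dates shown in the question, the repeated \S+ pattern cannot match at all, because \S+ stops at the first space inside a date, while split only cares about the delimiter.

```perl
use strict;
use warnings;

# One sample record; values other than the first date are invented.
my $line = join "\t", 'f1', 'f2', 'f3', 'f4', 'f5',
                      'Aug 11 2010 1:40PM', 'Aug 12 2010 9:05AM', 'f8';

# The repeated-\S+ pattern fails here: the date fields contain spaces.
my $matched = $line =~ /^(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)\t+(\S+)$/;

# split on the delimiter accepts any field content except a tab.
my @fields = split /\t/, $line, -1;

printf "regex matched: %s\n", $matched ? 'yes' : 'no';
printf "split found %d fields; field6 is '%s'\n", scalar @fields, $fields[5];
```

    One design difference to keep in mind: the original pattern used \t+ and so tolerated runs of tabs, whereas split /\t/ treats consecutive tabs as empty fields.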

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      "sysread is really only something you'll need to use when munging a binary file."

      read is really only something you'll need to use when munging a binary file.

      sysread is only needed when you need unbuffered IO (e.g. when you use select) or when you want partial reads from pipes and sockets.
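      A small sketch of that case, assuming a Unix-like system where select works on pipes. The helper name read_available is hypothetical. It uses IO::Select to wait for readiness and sysread to take whatever bytes have arrived, even though no newline or end of file has been seen yet. Buffered reads such as readline should not be mixed with select, because data can sit in Perl's own buffer where select cannot see it.

```perl
use strict;
use warnings;
use IO::Select;

# Collect whatever arrives on a handle until it goes quiet for $timeout
# seconds or the writer closes. read_available() is a hypothetical name.
sub read_available {
    my ($fh, $timeout) = @_;
    my $sel  = IO::Select->new($fh);
    my $data = '';
    while (my @ready = $sel->can_read($timeout)) {
        my $n = sysread($fh, $data, 4096, length $data);
        die "sysread failed: $!\n" unless defined $n;
        last if $n == 0;    # writer closed its end
    }
    return $data;
}

pipe(my $reader, my $writer) or die "pipe failed: $!\n";

# Write a fragment with no newline and leave the pipe open.
defined syswrite($writer, "partial") or die "syswrite failed: $!\n";

my $got = read_available($reader, 0.1);
print "got: $got\n";
```

      A line-oriented read on the same pipe would block here until a newline or end of file arrived, which is the situation where a partial read is wanted.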

      "the part of Perl that reads lines from files has had (from what I've heard) a great deal of attention paid to it over the last twenty or so years"

      Actually, I heard it's quite slow, in part due to the minuscule 4k buffer. I'm not saying that reading 8k chunks and breaking them down into lines on the user side is any faster.

        sysread ne sysopen