dbrock has asked for the wisdom of the Perl Monks concerning the following question:

Can I place a FILE into an @array and parse it for information? Would it be faster than just parsing the file line by line? I am starting with something like the code sample listed below...


Thanks, Darrick...

$logfile = 'c:\\path\\file.name'; # ASCII txt file

open(FILE, $logfile) or die "Can't open $logfile: $!"; # open logfile for reading
@txtfile = <FILE>;
close(FILE);

foreach $txtline (@txtfile) {
    ##### Get Server Variable #####
    if ($txtline =~ /\bJob server:/) {
        chomp($server = $txtline);
        $server =~ s/Job server: //;
    }
    ##### Get Client Variable #####
    if ($txtline =~ /\bJob name:/) {
        chomp($client = $txtline);
        $client =~ s/Job name: //;
        $client =~ s/ -.*//;
    }
}
print "$client is backed up on $server"; # just an example but you get the point

Listed below is a small example of the txt file I am parsing:

======================================================================
Job server: Some_Server
Job name: Some_Client - INC
Job started: Monday, March 10, 2003 at 9:30:02 PM
Job type: Backup
Log file: BEX78.txt
======================================================================

Replies are listed 'Best First'.
Re: File to @array then parse like a file
by runrig (Abbot) on Mar 14, 2003 at 22:47 UTC
    Reading from an array would be faster, but getting the data into the array is not going to save you any time (just the opposite). It would be simpler just to do something like this:
    while (<FILE>) {
        next unless /^Job (name|server): (.*)/;
        $server = $2, next if $1 eq 'server';
        print "$2 is on $server";
    }
Re: File to @array then parse like a file
by Desdinova (Friar) on Mar 14, 2003 at 23:04 UTC
    As I understand it, an I/O read is an I/O read and takes the same time whether the data goes into an array or is processed line by line. The only advantages I can see to using the array approach are:
    1) You have to pass over the file more than once (which doesn't seem to be your case).
    2) You are locking the file and want to reduce the amount of time that it is locked for; the script would take the same time, but the file would be available to another process sooner (see the sketch below).
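
    A minimal sketch of case 2, assuming an advisory flock is what guards the file (the Fcntl constants are standard; the path is the one from the question):

    use Fcntl qw(:flock);

    open FILE, 'c:\\path\\file.name' or die "Can't open file: $!";
    flock FILE, LOCK_SH or die "Can't lock file: $!";  # shared (read) lock
    my @lines = <FILE>;   # slurp while the lock is held
    close FILE;           # closing releases the lock
    # ... parse @lines at leisure; the file is free for other processes ...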

    On another note, if you are looking to squeeze every second out, your second and subsequent if statements should be elsif() statements; that way, once a match is made for a given line, the rest of the tests are skipped, as in the sketch below. This of course assumes that the tests are mutually exclusive.
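
    For example, a sketch of the original loop rewritten with elsif (same tests, assuming a line never matches both patterns):

    foreach $txtline (@txtfile) {
        if ($txtline =~ /\bJob server:/) {
            chomp($server = $txtline);
            $server =~ s/Job server: //;
        }
        elsif ($txtline =~ /\bJob name:/) {  # skipped whenever the first test matched
            chomp($client = $txtline);
            $client =~ s/Job name: //;
            $client =~ s/ -.*//;
        }
    }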

    PS: the scary part to me is that I recognize the file format you are trying to parse.
      I could be wrong, but I disagree. Slurping a file into an array or a scalar can certainly be faster than parsing the file line by line, because it can be done with fewer, larger reads. As a happy medium, changing $/ can increase speed without slurping the whole file into memory. For instance, reading a 100 MB file that has only 10 characters per line one line at a time would certainly be slower than reading it in 64K chunks.
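
      A minimal sketch of that happy medium, reading 64K records instead of lines (the filename is hypothetical):

      open my $fh, '<', 'big.log' or die "Can't open: $!";
      {
          local $/ = \65536;  # a scalar ref makes readline return fixed-size records
          while (my $chunk = <$fh>) {
              # each $chunk is up to 64K, regardless of newlines
          }
      }
      close $fh;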

      In dbrock's example, it certainly does seem like a waste to slurp the file. The point is not to slurp the file as a premature optimization, but only to do so when there is a valid reason.

      Cheers - L~R

      UPDATE: If the file is extremely large, but you only need the top part of it, you can use last to end the loop once you have all the data that you need. I am guessing that this may be the reason for wanting to speed things up.
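
      For instance, a sketch against the sample log above (assuming both fields appear near the top of the file):

      while (<FILE>) {
          $server = $1 if /^Job server: (.*)/;
          $client = $1 if /^Job name: (\S+)/;
          last if defined $server and defined $client;  # stop reading the rest of the file
      }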

      UPDATE 2: Setting $/ = \65536; does indeed change how much of the file is read per buffer. The other factor that slows down iterating by newlines is the work in between (data munging), which has to be performed more times than when you are working with a larger data block. Thanks to chromatic for keeping me on my toes and runrig for clearing up some confusion in the CB.

        I'm not aware of any operating system that deals with "lines" on a file level. Unix-wise, it's all just a stream of characters. Perl-wise, unless you use sysread, you'll get buffered reads, so you'll only hit the disk when you've exhausted the buffer, not every time you want a new line.
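
        For illustration, a minimal sketch of the two styles (the filename is hypothetical; only sysread issues a system call on every call):

        use Fcntl qw(O_RDONLY);

        # Buffered: Perl fills an internal buffer with one large read,
        # then hands back "lines" from memory on each call.
        open my $fh, '<', 'file.name' or die "open: $!";
        my $line = <$fh>;

        # Unbuffered: each sysread maps straight to a read(2) system call.
        sysopen my $raw, 'file.name', O_RDONLY or die "sysopen: $!";
        sysread $raw, my $buf, 4096;  # one syscall, one 4K block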

        There may be a filesystem out there that does work with lines, not characters. To my knowledge, Perl doesn't do anything different there.

        Update: I forgot to mention block device buffering, or buffering in the disk controller.

Re: File to @array then parse like a file
by blaze (Friar) on Mar 14, 2003 at 23:39 UTC
    You could do something like this:
    #!/usr/bin/perl -w
    use strict;

    my $file = 'c:\\path\\file.name';
    open FILE, $file or die "Couldn't open file: $!\n";
    my @info = <FILE>;
    close FILE;

    my ($server, $client);
    for (@info) {
        chomp;
        if (/Job server:+[\W]+(.*)/) {
            $server = $1;
        }
        if (/Job name:+[\W]+(.*)/) {
            $client = $1;
            $client =~ s/-.*//;
            print "$client is on $server\n";
        }
    }
    -Robert