Re^2: Extracting data from a messy file (slow performance)

Thank you ALL! - Especially to Allan!

Here is my final working code, in case anyone is interested:

 
#!/usr/bin/perl -w
# Description: Custom script to strip stats from a messy file.
# HP VERSION
use strict;

my $int;
my $vmstat;
my $total;

my $sfile = pop or die "Usage: strip.pl [file]\n";

my $ofile = "$sfile.out";
open ODATA,">$ofile" or die "Can't open $ofile for writing: $!\n";

open DATA,"<$sfile" or die "Can't open $sfile: $!\n";
while ( <DATA> ) {
    # 1. The script needs to scan through the file until it finds the date line
    # containing either a GMT or SAST entry. The time (hh:mm:ss) is stored in
    # a variable, e.g. $int = 17:00:00
    $int = $1 if ( /(\d{2}:\d{2}:\d{2})\s+(GMT|SAST)/ );
    #$int = $1 if ( /(\d{2}:\d{2})\s+(GMT|SAST)/ );
    
    # 2. Scan further down the file until the text "vmstat 2 60" is found.
    # This line shows that data will follow.
    if ( /^\s*vmstat 2 60/ ) {
      
      # There will be 60 lines of actual stats, which need to be added together
      # and divided by 60 to provide an average.
      $total = 0;
      for my $i (1..3) {   # 3 x 20
    
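         # 3. Ignore two heading lines, which contain the text "procs" and "avm"
         # respectively. ($_ still holds the "vmstat 2 60" line here, so it is
         # left out of the message; die appends the input line number itself.)
         <DATA> =~ /procs/ or die "Input error: expected 'procs' heading";
         <DATA> =~ /avm/ or die "Input error: expected 'avm' heading";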
         # 4. Hereafter 20 lines of data follow.
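         # I need to extract columns 16 and 17 and add them together.
         for my $i (1..20) {
            # Stop if a line does not match; otherwise $2 and $3 would
            # silently keep the values from the previous line.
            <DATA> =~ / (\d+\s+){15}(\d+)\s+(\d+) /
               or die "Input error: could not parse stats line";
            my ($us, $sy) = ($2, $3);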
            my $sum = $us + $sy;
            $total += $sum;
            #print "$i:\t$us, $sy, $sum, $total, ", $total/60, "\n";
        
         }
         # 5. I would now like to write this as a record to a file.
         #my ($h, $m, $s) = split /:/, $int;
         #$s += $total / 60;
         #print "***$int ", join (':', $h,$m,$s);
      }
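      # The 60-sample average is added to the seconds field of the
      # timestamp, and the record is written as hh:mm,value.
      #print "$i:\t$us, $sy, $sum, $total, ", $total/60, "\n";
      my ($h, $m, $s) = split /:/, $int;
      $s += $total / 60;
      #print ODATA "\n***$int -> ", join (':', $h,$m,$s);
      print ODATA "$h:$m,$s\n";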
   }
}

close DATA;
close ODATA;
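For checking step 4 in isolation, a split-based version of the column extraction might look like the sketch below. It is not what the script above does (that uses a regex on each line); `cpu_sum` is a hypothetical helper name, and the sample line in the usage note is made up purely for illustration. It assumes the data lines are whitespace-separated and purely numeric, with at least 17 fields.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return the sum of
# columns 16 and 17 of one whitespace-separated data line, or undef if the
# line has fewer than 17 fields or contains a non-numeric field.
sub cpu_sum {
    my ($line) = @_;
    my @f = split ' ', $line;            # ' ' skips leading whitespace
    return undef if @f < 17;             # heading or truncated line
    return undef if grep { /\D/ } @f;    # any non-digit field
    return $f[15] + $f[16];              # columns 16 and 17, zero-based
}
```

Calling `cpu_sum(" 1 0 0 1000 2000 0 0 0 0 0 0 0 100 200 50 3 4 93\n")` on that invented line adds fields 16 and 17 (3 and 4). A heading line such as the "procs" line yields undef, so the caller can decide whether to skip it or die.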

Great to know all of you!

Kind regards,

Acidblood

Re^3: Extracting data from a messy file (slow performance) by Limbic~Region (Chancellor) on Aug 08, 2008 at 15:43 UTC
acidblood,
I have not read this thread. I just happened to see this node and something caught my eye: <DATA>. You probably should avoid using <DATA> for a number of reasons:

* It is the same filehandle as __DATA__ - see SelfLoader
* You should use a lexical filehandle - see open
* You should be using 3-arg open - see open

Cheers - L~R
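The last two points could be sketched roughly as follows. This is only an illustration, not a drop-in patch for the script above: `read_lines` and `write_lines` are hypothetical helper names, and the only things taken from the points above are the lexical handles (`my $in`, `my $out`) and the 3-arg form of open.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file through a lexical handle.
# The mode ('<') is a separate argument, so odd characters in $path
# cannot be misread as part of the open mode.
sub read_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

# Hypothetical helper: write lines through a lexical handle and return
# how many lines were written.
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Can't open $path for writing: $!";
    print {$out} @lines;
    close $out or die "Can't close $path: $!";
    return scalar @lines;
}
```

Because `$in` and `$out` are ordinary lexical variables, they cannot clash with the built-in DATA handle, and they go out of scope when the sub returns.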
Re^4: Extracting data from a messy file (slow performance) by acidblood (Novice) on Aug 09, 2008 at 20:35 UTC
Thanks! I'll change it! ;-)

