http://qs1969.pair.com?node_id=702817


in reply to Extracting data from a messy file (slow performance)

Here's a quick code skeleton to get you started.
You've still gotta add proper file I/O, rounding and error handling to make it robust.
Best regards / allan
#!/usr/bin/perl -w
use strict;

my $int;
my $vmstat;
my $total;

while ( <DATA> ) {
    # 1. The script needs to scan through the file until it finds the date line
    #    containing either GMT or SAST entries. The hour and minute need to be
    #    stored in a variable, i.e. int=17:00
    $int = $1 if ( /(\d{2}:\d{2}:\d{2})\s+(GMT|SAST)/ );

    # 2. Scan further down the file until the text "vmstat 2 60" is found.
    #    This line shows that data will follow.
    if ( /^\s*vmstat/ ) {
        # There will be 60 lines of actual stats, which need to be added together
        # and divided by 60 to provide an average.
        $total = 0;
        for my $i (1..3) { # 3 x 20
            # 3. Ignore two heading lines, which contain the text "procs" and "avm" respectively
            <DATA> =~ /procs/ or die "Input error $_\n";
            <DATA> =~ /avm/ or die "Input error $_\n";

            # 4. Hereafter 20 lines of data follow.
            #    I need to extract columns 16 and 17 and add them together.
            for my $i (1..20) {
                (<DATA> =~ / (\d+\s+){15}(\d+)\s+(\d+) /);
                my ($us, $sy) = ($2, $3);
                my $sum = $us + $sy;
                $total += $sum;
                print "$i:\t$us, $sy, $sum, $total, ", $total/60, "\n";
            }

            # 5. I would now like to write this as a record to a file.
            my ($h, $m, $s) = split /:/, $int;
            $s += $total / 60;
            print "***$int ", join (':', $h,$m,$s);
        }
    }
}

__DATA__
----other lines to be ignored----
Wed Jul 23 17:00:00 GMT 2008 (to extract only hour & minute. 17:00)
----other lines to be ignored----
----other lines to be ignored----
----other lines to be ignored----
vmstat 2 60: (to ignore, but states starting point of data)
    procs      memory            page                faults       cpu
  r  b  w    avm free  re at  pi  po fr de    sr   in   sy  cs us sy  id
  2  0  0 517725 4545  15  1   4   3  1  0   138 2389 1783 213  3  2  95
  2  0  0 517725 5675 111  3 292 374 73  0 12900 2946 7497 327 11 10  78
  2  0  0 517725 5669  71  1 188 239 46  0  8256 2035 4983 246 10  0  89
  1  0  0 544051 5478  58  0 130 152 28  0  5283 1502 3696 202  0  2  98
  1  0  0 544051 5477  36  0  84  96 17  0  3380 1116 2453 155  0  0 100
  1  0  0 544051 5515  28  0  55  60 10  0  2163  884 1785 132  0  1  99
  1  0  0 544051 5477  33  2  36  38  6  0  1384  741 2877 131  2  4  94
  1  0  0 544051 5539  22  0  23  24  3  0   885  625 1965 110  0  0 100
  1  1  0 522972 5539  13  0  15  15  1  0   566  551 1318  96  0  0 100
  1  1  0 522972 5539   8  0   9   9  0  0   361  500  966  89 12  0  88
  1  1  0 522972 5535  20  1  11   5  0  0   230  487 1059 103  0  1  98
  1  1  0 522972 5535  20  1   8   3  0  0   147  473 2430  99  2  3  95
  1  1  0 522972 5514  21  0  14   1  0  0    93  467 2225  99  1  0  99
  1  1  0 385532 5023  82  1  28   0  0  0    59  480 1760 147  5  7  88
  1  1  0 385532 3745  70  0  54   0  0  0    37 1734 2142 282 21  5  74
  1  1  0 385532 5479 112  0  87   0  0  0    23 1503 2859 331  4  8  88
  1  1  0 385532 5407  86  1  58   0  0  0    14 1557 3889 302  3  6  91
  1  1  0 385532 5407  55  0  37   0  0  0     8 1153 2650 220  0  0 100
  1  1  0 434602 5407  35  0  23   0  0  0     4  894 1795 167  0  0 100
  1  1  0 434602 5407  22  0  14   0  0  0     2  725 1208 131  0  0 100
    procs      memory            page                faults       cpu
  r  b  w    avm free  re at  pi  po fr de    sr   in   sy  cs us sy  id
  1  1  0 434602 5390  84  0  74   0  0  0     0 1321 1672 178  7 10  83
  1  1  0 434602 5389  63  1  48   0  0  0     0 1245 2951 172  2  4  95
  1  1  0 434602 5389  40  0  31   0  0  0     0  951 1982 135  0  0 100
  1  1  0 370995 5389  25  0  19   0  0  0     0  766 1361 112  0  0 100
  1  1  0 370995 4561 109  0  70   0  0  0     0 1125 1626 138 10 13  76
  1  1  0 370995 5381 140  0  84   0  0  0     0 1906 4289 197  5  5  90
  1  1  0 370995 5381  99  1  54   0  0  0     0 1468 4622 168  3  2  95
  1  1  0 370995 5381  64  0  35   0  0  0     0 1105 3187 142  2  0  98
  1  1  0 460130 5377  40  0  23   0  0  0     0  866 2177 127  0  0 100
  1  1  0 460130 5378 117  0  65   0  0  0     0  819 2229 139  6  9  85
  1  1  0 460130 5377  74  0  42   0  0  0     0  964 1564 145  0  0 100
  1  1  0 460130 5377  47  0  26   0  0  0     0  776 1049 120  2  3  95
  1  1  0 460130 5377  38  0  17   0  0  0     0  666 2198 111  0  0 100
  1  1  0 491926 5377  24  0  11   0  0  0     0  580 1510  97  4  2  95
  1  1  0 491926 5377  89  0  48   0  0  0     0  989 1686 150  1  7  91
  1  1  0 491926 5377  56  0  31   0  0  0     0  789 1162 122  0  0 100
  1  1  0 491926 5377  35  0  20   0  0  0     0  660  842 106  3  2  94
  1  1  0 491926 5377  30  0  14   0  0  0     0  579 2037 100  2  1  96
  2  0  0 327196 5378  93  0  50   0  0  0     0  973 2086 156  2  8  89
  2  0  0 327196 5377  59  0  32   0  0  0     0  776 1426 126  0  0 100
    procs      memory            page                faults       cpu
  r  b  w    avm free  re at  pi  po fr de    sr   in   sy  cs us sy  id
  2  0  0 327196 5377  37  0  20   0  0  0     0  650  965 106  0  0 100
  2  0  0 327196 5377  23  0  13   0  0  0     0  566  693  92  4  4  92
  2  0  0 327196 5377  97  1  50   0  0  0     0  978 2673 159  3 10  87
  1  1  0 251674 5377  62  0  32   0  0  0     0  783 1801 136  0  0 100
  1  1  0 251674 5377  39  0  21   0  0  0     0  655 1259 112  0  0 100
  1  1  0 251674 5369  24  0  15   0  0  0     0  580  894 100  1  0  98
  1  1  0 251674 5168 186  0 103   0  0  0     0  909 1955 152  9 13  78
  1  1  0 251674 5420 130  1  67   0  0  0     0  776 3148 142  2  3  95
  1  1  0 370259 5420  83  0  43   0  0  0     0  654 2105 119  0  0 100
  1  1  0 370259 5382  57  0  27   0  0  0     0  602 1550 108  0  2  98
  1  1  0 370259 5428  39  1  17   0  0  0     0  552 1183 102  1  1  98
  1  1  0 370259 5428  29  1  11   0  0  0     0  507 1013  96  0  0 100
  1  1  0 370259 5383  33  1   6   0  0  0     0  483 2661 102  3  4  93
  1  1  0 466781 5428  24  1   4   0  0  0     0  581 1944 130  0  0 100
  1  1  0 466781 5428  16  0   2   0  0  0     0  523 1337 107  0  0 100
  1  1  0 466781 5423   9  0   3   0  0  0     0  487  909  93  0  1  99
  1  1  0 466781 5397  11  0   1   0  0  0     0  505  823  91  0  0 100
  1  1  0 466781 5395  30  3   2   0  0  0     0  515 2958 118  4  5  90
  1  1  0 514735 5394  19  1   2   0  0  0     0  482 2044 116  0  0 100
  1  1  0 514735 5394  12  0   1   0  0  0     0  466 1406 100  0  0 100
END
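
The rounding and error handling mentioned above might be sketched roughly as follows. This is only one possible shape, not the code from this post: the subroutine name `average_cpu` is hypothetical, and it assumes every data row has exactly 18 whitespace-separated integer columns with `us` and `sy` in columns 16 and 17, as in the sample. It averages over however many rows it finds rather than a fixed 60.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads vmstat rows from
# an already-open filehandle and returns the mean of (us + sy), rounded to
# two decimals. Assumes 18 integer columns per data row.
sub average_cpu {
    my ($fh) = @_;
    my ( $total, $rows ) = ( 0, 0 );
    while ( my $line = <$fh> ) {
        my @cols = split ' ', $line;
        # Skip headings and anything that is not a full row of integers.
        next unless @cols == 18;
        next if grep { !/^\d+$/ } @cols;
        $total += $cols[15] + $cols[16];    # columns 16 and 17, zero-based
        $rows++;
    }
    die "No vmstat data rows found\n" unless $rows;
    return sprintf '%.2f', $total / $rows;
}

# Usage with an in-memory filehandle holding a heading and two sample rows.
my $sample = join "\n",
    'r b w avm free re at pi po fr de sr in sy cs us sy id',
    '1 1 0 434602 5390 84 0 74 0 0 0 0 1321 1672 178 7 10 83',
    '1 1 0 434602 5389 63 1 48 0 0 0 0 1245 2951 172 2 4 95',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!\n";
print average_cpu($fh), "\n";    # prints 11.50
close $fh;
```

Splitting on whitespace and counting columns avoids depending on a leading space in each row, and `sprintf` takes care of the rounding.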

Replies are listed 'Best First'.
Re^2: Extracting data from a messy file (slow performance)
by acidblood (Novice) on Aug 07, 2008 at 09:00 UTC
    Hi All!

    Thank you very much for all your assistance! I'll be working on a Perl script to get my processing done.

    This is only possible thanks to all of you!

    I'll post the script when it's complete and working.

    Kind Regards,

    Acidblood

Re^2: Extracting data from a messy file (slow performance)
by acidblood (Novice) on Aug 08, 2008 at 12:33 UTC
    Thank you ALL! - Especially to Allan!

    Here is my final working code, in case anyone is interested...

    #!/usr/bin/perl -w
    # Description: Custom script to strip stats from a messy file.
    # HP VERSION
    use strict;

    my $int;
    my $vmstat;
    my $total;
    my $sfile = pop or die "Usage: strip.pl [file]\n";
    my $ofile = "$sfile.out";

    open ODATA,">$ofile" or die "Can't open $ofile for writing: $!\n";
    open DATA,"<$sfile" or die "Can't open $sfile: $!\n";

    while ( <DATA> ) {
        # 1. The script needs to scan through the file until it finds the date line
        #    containing either GMT or SAST entries. The hour and minute need to be
        #    stored in a variable, i.e. int=17:00
        $int = $1 if ( /(\d{2}:\d{2}:\d{2})\s+(GMT|SAST)/ );
        #$int = $1 if ( /(\d{2}:\d{2})\s+(GMT|SAST)/ );

        # 2. Scan further down the file until the text "vmstat 2 60" is found.
        #    This line shows that data will follow.
        if ( /^\s*vmstat 2 60/ ) {
            # There will be 60 lines of actual stats, which need to be added together
            # and divided by 60 to provide an average.
            $total = 0;
            for my $i (1..3) { # 3 x 20
                # 3. Ignore two heading lines, which contain the text "procs" and "avm" respectively
                <DATA> =~ /procs/ or die "Input error $_\n";
                <DATA> =~ /avm/ or die "Input error $_\n";

                # 4. Hereafter 20 lines of data follow.
                #    I need to extract columns 16 and 17 and add them together.
                for my $i (1..20) {
                    (<DATA> =~ / (\d+\s+){15}(\d+)\s+(\d+) /);
                    my ($us, $sy) = ($2, $3);
                    my $sum = $us + $sy;
                    $total += $sum;
                    #print "$i:\t$us, $sy, $sum, $total, ", $total/60, "\n";
                }

                # 5. I would now like to write this as a record to a file.
                #my ($h, $m, $s) = split /:/, $int;
                #$s += $total / 60;
                #print "***$int ", join (':', $h,$m,$s);
            }
            #print "$i:\t$us, $sy, $sum, $total, ", $total/60, "\n";
            my ($h, $m, $s) = split /:/, $int;
            $s += $total / 60;
            #print ODATA "\n***$int -> ", join (':', $h,$m,$s);
            print ODATA "$h:$m,$s\n";
        }
    }

    close DATA;
    close ODATA;

    Great to know all of you!

    Kind regards,

    Acidblood

      acidblood,
      I have not read this thread. I just happened to see this node and something caught my eye - <DATA>.

      You probably should avoid using <DATA> for a number of reasons:

      • Same filehandle as __DATA__ - see SelfLoader
      • Should use a lexical file handle - see open
      • Should be using 3 arg open - see open

      Cheers - L~R

        Thanks! I'll change it! ;-)
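
The advice above about lexical filehandles and three-argument open could be applied roughly like this. It is a minimal sketch using only core Perl, not code from this thread: the subroutine name `count_lines` and the temporary file are illustrative assumptions. The point is that the handle lives in a `my` variable, so it cannot collide with the special `DATA` handle that Perl associates with text after `__DATA__`, and that the mode is passed separately from the file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: counts the lines of a file using a lexical
# filehandle and three-argument open, leaving the DATA handle untouched.
sub count_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my $count = 0;
    while ( my $line = <$in> ) {
        $count++;
    }
    close $in;
    return $count;
}

# Demonstration on a temporary file, so the sketch runs on its own.
my ( $tmp, $tmpname ) = tempfile( UNLINK => 1 );
print {$tmp} "line one\nline two\n";
close $tmp or die "Can't close $tmpname: $!\n";
print count_lines($tmpname), "\n";    # two lines were written above
```

With the mode given as its own argument, a file name that happens to begin with `>` or `<` is treated as a plain name, and the lexical handle is closed automatically when it goes out of scope if `close` is forgotten.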