http://qs1969.pair.com?node_id=702721

acidblood has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow Monks!

I've searched PerlMonks and Google and couldn't find a solution. I'm still very new to Perl, so a lot of the commands are new to me.

I have a text file that basically contains stats, and I need to extract specific data from it. I have written a shell script that works, but it takes 2 minutes to extract 1 record.

I'm running Cygwin on top of Vista. I know it's not exactly what you'd hope for, but it's what I have right now.

I think the performance problem comes from grepping and testing every line, and from Cygwin itself.

The data:

----other lines to be ignored----

Wed Jul 23 17:00:00 GMT 2008 (extract only the hour and minute, i.e. 17:00)

----other lines to be ignored----

----other lines to be ignored----

----other lines to be ignored----

vmstat 2 60: (ignore this line, but it marks the start of the data)

----2 other lines to be ignored----

----20 lines of data----

----2 other lines to be ignored----

----20 lines of data----

----2 other lines to be ignored----

----20 lines of data----

END (the characters END show that collection was complete)

*The block above repeats 96 times a day, once every 15 minutes.

Explanation of the data I need:

1. The script needs to scan through the file until it finds the date line, which contains either GMT or SAST. The hour and minute need to be stored in a variable, e.g. int=17:00.

2. Scan further down the file until the text "vmstat 2 60" is found. This line shows that data will follow.

3. Ignore the two heading lines, which contain the text "procs" and "avm" respectively.

4. After these, 20 lines of data follow. I need to extract columns 16 and 17 and add them together.

If this was one of my data lines:

2 0 0 517725 4545 15 1 4 3 1 0 138 2389 1783 213 3 2 95

I would want to add 3 and 2 to give me a value of 5.

There are 60 lines of actual stats in total (three blocks of 20). Their per-line sums need to be added together and divided by 60 to give an average.

5. I would now like to write this as a record to a file.

Output should look like this:

17:00,5

My output file will contain 96 records per day.

Here is an example of the data from vmstat:

vmstat 2 60:
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
2 0 0 517725 4545 15 1 4 3 1 0 138 2389 1783 213 3 2 95
2 0 0 517725 5675 111 3 292 374 73 0 12900 2946 7497 327 11 10 78
2 0 0 517725 5669 71 1 188 239 46 0 8256 2035 4983 246 10 0 89
1 0 0 544051 5478 58 0 130 152 28 0 5283 1502 3696 202 0 2 98
1 0 0 544051 5477 36 0 84 96 17 0 3380 1116 2453 155 0 0 100
1 0 0 544051 5515 28 0 55 60 10 0 2163 884 1785 132 0 1 99
1 0 0 544051 5477 33 2 36 38 6 0 1384 741 2877 131 2 4 94
1 0 0 544051 5539 22 0 23 24 3 0 885 625 1965 110 0 0 100
1 1 0 522972 5539 13 0 15 15 1 0 566 551 1318 96 0 0 100
1 1 0 522972 5539 8 0 9 9 0 0 361 500 966 89 12 0 88
1 1 0 522972 5535 20 1 11 5 0 0 230 487 1059 103 0 1 98
1 1 0 522972 5535 20 1 8 3 0 0 147 473 2430 99 2 3 95
1 1 0 522972 5514 21 0 14 1 0 0 93 467 2225 99 1 0 99
1 1 0 385532 5023 82 1 28 0 0 0 59 480 1760 147 5 7 88
1 1 0 385532 3745 70 0 54 0 0 0 37 1734 2142 282 21 5 74
1 1 0 385532 5479 112 0 87 0 0 0 23 1503 2859 331 4 8 88
1 1 0 385532 5407 86 1 58 0 0 0 14 1557 3889 302 3 6 91
1 1 0 385532 5407 55 0 37 0 0 0 8 1153 2650 220 0 0 100
1 1 0 434602 5407 35 0 23 0 0 0 4 894 1795 167 0 0 100
1 1 0 434602 5407 22 0 14 0 0 0 2 725 1208 131 0 0 100
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
1 1 0 434602 5390 84 0 74 0 0 0 0 1321 1672 178 7 10 83
1 1 0 434602 5389 63 1 48 0 0 0 0 1245 2951 172 2 4 95
1 1 0 434602 5389 40 0 31 0 0 0 0 951 1982 135 0 0 100
1 1 0 370995 5389 25 0 19 0 0 0 0 766 1361 112 0 0 100
1 1 0 370995 4561 109 0 70 0 0 0 0 1125 1626 138 10 13 76
1 1 0 370995 5381 140 0 84 0 0 0 0 1906 4289 197 5 5 90
1 1 0 370995 5381 99 1 54 0 0 0 0 1468 4622 168 3 2 95
1 1 0 370995 5381 64 0 35 0 0 0 0 1105 3187 142 2 0 98
1 1 0 460130 5377 40 0 23 0 0 0 0 866 2177 127 0 0 100
1 1 0 460130 5378 117 0 65 0 0 0 0 819 2229 139 6 9 85
1 1 0 460130 5377 74 0 42 0 0 0 0 964 1564 145 0 0 100
1 1 0 460130 5377 47 0 26 0 0 0 0 776 1049 120 2 3 95
1 1 0 460130 5377 38 0 17 0 0 0 0 666 2198 111 0 0 100
1 1 0 491926 5377 24 0 11 0 0 0 0 580 1510 97 4 2 95
1 1 0 491926 5377 89 0 48 0 0 0 0 989 1686 150 1 7 91
1 1 0 491926 5377 56 0 31 0 0 0 0 789 1162 122 0 0 100
1 1 0 491926 5377 35 0 20 0 0 0 0 660 842 106 3 2 94
1 1 0 491926 5377 30 0 14 0 0 0 0 579 2037 100 2 1 96
2 0 0 327196 5378 93 0 50 0 0 0 0 973 2086 156 2 8 89
2 0 0 327196 5377 59 0 32 0 0 0 0 776 1426 126 0 0 100
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
2 0 0 327196 5377 37 0 20 0 0 0 0 650 965 106 0 0 100
2 0 0 327196 5377 23 0 13 0 0 0 0 566 693 92 4 4 92
2 0 0 327196 5377 97 1 50 0 0 0 0 978 2673 159 3 10 87
1 1 0 251674 5377 62 0 32 0 0 0 0 783 1801 136 0 0 100
1 1 0 251674 5377 39 0 21 0 0 0 0 655 1259 112 0 0 100
1 1 0 251674 5369 24 0 15 0 0 0 0 580 894 100 1 0 98
1 1 0 251674 5168 186 0 103 0 0 0 0 909 1955 152 9 13 78
1 1 0 251674 5420 130 1 67 0 0 0 0 776 3148 142 2 3 95
1 1 0 370259 5420 83 0 43 0 0 0 0 654 2105 119 0 0 100
1 1 0 370259 5382 57 0 27 0 0 0 0 602 1550 108 0 2 98
1 1 0 370259 5428 39 1 17 0 0 0 0 552 1183 102 1 1 98
1 1 0 370259 5428 29 1 11 0 0 0 0 507 1013 96 0 0 100
1 1 0 370259 5383 33 1 6 0 0 0 0 483 2661 102 3 4 93
1 1 0 466781 5428 24 1 4 0 0 0 0 581 1944 130 0 0 100
1 1 0 466781 5428 16 0 2 0 0 0 0 523 1337 107 0 0 100
1 1 0 466781 5423 9 0 3 0 0 0 0 487 909 93 0 1 99
1 1 0 466781 5397 11 0 1 0 0 0 0 505 823 91 0 0 100
1 1 0 466781 5395 30 3 2 0 0 0 0 515 2958 118 4 5 90
1 1 0 514735 5394 19 1 2 0 0 0 0 482 2044 116 0 0 100
1 1 0 514735 5394 12 0 1 0 0 0 0 466 1406 100 0 0 100
END

6. The text "END" marks the end of the data. After that, the search starts again for the next block of data.
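Steps 1 to 6 map naturally onto a single awk pass over the file, which starts one process per file instead of several per line. This is a minimal sketch, not a tested drop-in replacement; it assumes the date line holds HH:MM:SS in its fourth word, that data lines sit between "vmstat 2 60" and END, and that us and sy are whitespace-separated columns 16 and 17 as in the example line. `vmstat_avg` is a hypothetical function name. One design choice differs from the description: it divides by the number of data lines actually seen (60 for a complete section) rather than a hard-coded 60, and keeps only the integer part, as `expr` does.

```shell
#!/bin/sh
# One-pass sketch of steps 1-6: awk reads the stats file once, so no new
# process is started per input line. Assumed layout: the date line holds
# HH:MM:SS in its 4th word, data lines sit between "vmstat 2 60" and END,
# and us/sy are whitespace-separated columns 16 and 17.
# vmstat_avg is a hypothetical name; it reads the files given as arguments
# (or standard input) and prints one "HH:MM,avg" record per stats section.
vmstat_avg() {
  awk '
    /GMT|SAST/ {                      # step 1: remember HH:MM
      split($4, t, ":")
      hhmm = t[1] ":" t[2]
      next
    }
    /vmstat 2 60/ {                   # step 2: a stats section starts
      in_data = 1; sum = 0; n = 0
      next
    }
    /END/ {                           # steps 5-6: write record, reset
      if (in_data) {
        avg = (n > 0) ? int(sum / n) : 0   # integer part, like expr
        printf "%s,%d\n", hhmm, avg
      }
      in_data = 0
      next
    }
    in_data && /procs|avm/ { next }   # step 3: skip the heading lines
    in_data && NF >= 17 {             # step 4: add us + sy
      sum += $16 + $17
      n++
    }
  ' "$@"
}
```

For example, `vmstat_avg stats.txt > stats.txt.out` (with a hypothetical file name) would write all 96 records for a day's file in one run.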

Here is my shell script:

# This file is used to break up the original file into useful information
#
# File format:
# int,cpu%
>$1.out
stat=0
cpu=0
scpu=0
rec=0
echo "Starting rebuild of $1 into $1.out"
while read line
do
    # Get date
    if [ `echo $line | egrep 'GMT|SAST' | wc -l` -eq 1 ]
    then
        int=`echo $line|cut -c12-16`
    fi
    # Get vmstat 2 60 data
    if [ `echo $line | grep "vmstat 2 60" | wc -l` -eq 1 ]
    then
        stat=1
    fi
    # If stat=1 - entered into stat data
    if [ $stat -eq 1 ] && [ `echo $line | egrep 'vmstat|procs|avm|END' | wc -l` -eq 0 ]
    then
        scpu=`echo $line | awk '{ print $16 "+" $17 }'|bc`
        cpu=`expr $cpu + $scpu`
    fi
    # END of data string
    if [ `echo $line | grep END | wc -l` -eq 1 ]
    then
        cpu=`expr $cpu / 60`
        # Write data line
        echo "$int,$cpu" >>$1.out
        stat=0
        int=0
        cpu=0
        scpu=0
        rec=`expr $rec + 1`
        echo "`date`:Wrote record: $rec"
    fi
done < $1
echo "Complete!"
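A likely reason for the two minutes per record: each test of the form `` [ `echo $line | egrep ... | wc -l` -eq 1 ] `` runs a command substitution and a pipeline, so the script starts several new processes for each of its four tests on every input line (plus awk, bc and expr on data lines), and process creation is known to be slow under Cygwin. The sketch below keeps the same logic but uses only shell builtins: `case` for the pattern tests, `set --` for field splitting and `$(( ))` for arithmetic. It assumes a POSIX shell such as bash or dash; `rebuild` is a hypothetical function name that reads the stats file on standard input and prints the records to standard output. Unlike the original, it writes a record only when END closes a "vmstat 2 60" section.

```shell
#!/bin/sh
# Sketch of a fork-free version of the loop above. Pattern tests use the
# shell's built-in `case`, fields come from `set --`, and sums use $(( ))
# arithmetic, so no external command runs per input line.
rebuild() {
  set -f                      # no filename globbing when splitting lines
  stat=0 cpu=0 int=
  while IFS= read -r line; do
    case $line in
      *GMT*|*SAST*)           # date line, e.g. "Wed Jul 23 17:00:00 GMT 2008"
        set -- $line
        int=${4%:*}           # "17:00:00" -> "17:00"
        ;;
      *"vmstat 2 60"*)        # start of the stats section
        stat=1
        ;;
      *END*)                  # end of one collection: write the record
        if [ "$stat" -eq 1 ]; then
          echo "$int,$(( cpu / 60 ))"
        fi
        stat=0 cpu=0 int=
        ;;
      *procs*|*avm*)          # the two heading lines: skip
        ;;
      *)
        if [ "$stat" -eq 1 ]; then
          set -- $line
          if [ $# -ge 17 ]; then           # skip blank or short lines
            cpu=$(( cpu + ${16} + ${17} )) # us + sy
          fi
        fi
        ;;
    esac
  done
}
```

Used as `rebuild < "$1" > "$1.out"`, it would take the place of the original loop; like `expr`, the division by 60 keeps only the integer part.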

Is there anyone that can help?

I have several files to process. By my calculations, it will take 41.6 hours to do all the files for one host. I have 9 hosts to do, which comes to about 15.6 days!
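For reference, the arithmetic behind these figures, taking the 2 minutes per record quoted above (the 13-day figure assumes 96 records per day of data):

$$41.6\ \text{h} \times 60 = 2496\ \text{min}, \qquad \frac{2496\ \text{min}}{2\ \text{min/record}} = 1248\ \text{records} = 13 \times 96$$

$$9 \times 41.6\ \text{h} = 374.4\ \text{h} = \frac{374.4}{24}\ \text{days} = 15.6\ \text{days}$$

So each host holds roughly 13 days of 15-minute records, and the cost scales linearly with the per-record time.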

Regards,

Acidblood