Greetings fellow Monks!
I've searched PerlMonks and Google and couldn't find a solution. I'm still very new to Perl, so a lot of the commands are new to me...
I have a text file that basically contains stats and I need to extract specific data from it. I have written a shell script, which works, however it takes 2 minutes to extract 1 record.
I'm running Cygwin on top of Vista. I know that's not exactly what you'd hoped for, but it's what I have right now.
I think the performance problem is partly due to me grepping and checking each line individually, and partly due to Cygwin.
The data:
----other lines to be ignored----
Wed Jul 23 17:00:00 GMT 2008 (to extract only hour & minute. 17:00)
----other lines to be ignored----
----other lines to be ignored----
----other lines to be ignored----
vmstat 2 60: (to ignore, but states starting point of data)
----2 other lines to be ignored----
----20 lines of data----
----2 other lines to be ignored----
----20 lines of data----
----2 other lines to be ignored----
----20 lines of data----
END (the characters END shows that collection was complete)
*The above block repeats 96 times, once for every 15-minute interval in a day.
Explanation of the data I need:
1. The script needs to scan through the file until it finds the date line, which contains either GMT or SAST. The hour and minute need to be stored in a variable, e.g. int=17:00
2. Scan further down the file until the text "vmstat 2 60" is found. This line shows that data will follow.
3. Ignore the two heading lines, which contain the text "procs" and "avm" respectively.
4. After this, 20 lines of data follow. I need to extract columns 16 and 17 and add them together.
If this was one of my data lines:
2 0 0 517725 4545 15 1 4 3 1 0 138 2389 1783 213 3 2 95
I would want to add 3 and 2 to give me a value of 5.
There will be 60 lines of actual stats per block. Their values need to be added together and divided by 60 to give an average.
5. I would now like to write this as a record to a file.
Output should look like this:
17:00,5
My file will contain 96 records per day
Here follows an example of data from vmstat:
vmstat 2 60:
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
2 0 0 517725 4545 15 1 4 3 1 0 138 2389 1783 213 3 2 95
2 0 0 517725 5675 111 3 292 374 73 0 12900 2946 7497 327 11 10 78
2 0 0 517725 5669 71 1 188 239 46 0 8256 2035 4983 246 10 0 89
1 0 0 544051 5478 58 0 130 152 28 0 5283 1502 3696 202 0 2 98
1 0 0 544051 5477 36 0 84 96 17 0 3380 1116 2453 155 0 0 100
1 0 0 544051 5515 28 0 55 60 10 0 2163 884 1785 132 0 1 99
1 0 0 544051 5477 33 2 36 38 6 0 1384 741 2877 131 2 4 94
1 0 0 544051 5539 22 0 23 24 3 0 885 625 1965 110 0 0 100
1 1 0 522972 5539 13 0 15 15 1 0 566 551 1318 96 0 0 100
1 1 0 522972 5539 8 0 9 9 0 0 361 500 966 89 12 0 88
1 1 0 522972 5535 20 1 11 5 0 0 230 487 1059 103 0 1 98
1 1 0 522972 5535 20 1 8 3 0 0 147 473 2430 99 2 3 95
1 1 0 522972 5514 21 0 14 1 0 0 93 467 2225 99 1 0 99
1 1 0 385532 5023 82 1 28 0 0 0 59 480 1760 147 5 7 88
1 1 0 385532 3745 70 0 54 0 0 0 37 1734 2142 282 21 5 74
1 1 0 385532 5479 112 0 87 0 0 0 23 1503 2859 331 4 8 88
1 1 0 385532 5407 86 1 58 0 0 0 14 1557 3889 302 3 6 91
1 1 0 385532 5407 55 0 37 0 0 0 8 1153 2650 220 0 0 100
1 1 0 434602 5407 35 0 23 0 0 0 4 894 1795 167 0 0 100
1 1 0 434602 5407 22 0 14 0 0 0 2 725 1208 131 0 0 100
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
1 1 0 434602 5390 84 0 74 0 0 0 0 1321 1672 178 7 10 83
1 1 0 434602 5389 63 1 48 0 0 0 0 1245 2951 172 2 4 95
1 1 0 434602 5389 40 0 31 0 0 0 0 951 1982 135 0 0 100
1 1 0 370995 5389 25 0 19 0 0 0 0 766 1361 112 0 0 100
1 1 0 370995 4561 109 0 70 0 0 0 0 1125 1626 138 10 13 76
1 1 0 370995 5381 140 0 84 0 0 0 0 1906 4289 197 5 5 90
1 1 0 370995 5381 99 1 54 0 0 0 0 1468 4622 168 3 2 95
1 1 0 370995 5381 64 0 35 0 0 0 0 1105 3187 142 2 0 98
1 1 0 460130 5377 40 0 23 0 0 0 0 866 2177 127 0 0 100
1 1 0 460130 5378 117 0 65 0 0 0 0 819 2229 139 6 9 85
1 1 0 460130 5377 74 0 42 0 0 0 0 964 1564 145 0 0 100
1 1 0 460130 5377 47 0 26 0 0 0 0 776 1049 120 2 3 95
1 1 0 460130 5377 38 0 17 0 0 0 0 666 2198 111 0 0 100
1 1 0 491926 5377 24 0 11 0 0 0 0 580 1510 97 4 2 95
1 1 0 491926 5377 89 0 48 0 0 0 0 989 1686 150 1 7 91
1 1 0 491926 5377 56 0 31 0 0 0 0 789 1162 122 0 0 100
1 1 0 491926 5377 35 0 20 0 0 0 0 660 842 106 3 2 94
1 1 0 491926 5377 30 0 14 0 0 0 0 579 2037 100 2 1 96
2 0 0 327196 5378 93 0 50 0 0 0 0 973 2086 156 2 8 89
2 0 0 327196 5377 59 0 32 0 0 0 0 776 1426 126 0 0 100
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
2 0 0 327196 5377 37 0 20 0 0 0 0 650 965 106 0 0 100
2 0 0 327196 5377 23 0 13 0 0 0 0 566 693 92 4 4 92
2 0 0 327196 5377 97 1 50 0 0 0 0 978 2673 159 3 10 87
1 1 0 251674 5377 62 0 32 0 0 0 0 783 1801 136 0 0 100
1 1 0 251674 5377 39 0 21 0 0 0 0 655 1259 112 0 0 100
1 1 0 251674 5369 24 0 15 0 0 0 0 580 894 100 1 0 98
1 1 0 251674 5168 186 0 103 0 0 0 0 909 1955 152 9 13 78
1 1 0 251674 5420 130 1 67 0 0 0 0 776 3148 142 2 3 95
1 1 0 370259 5420 83 0 43 0 0 0 0 654 2105 119 0 0 100
1 1 0 370259 5382 57 0 27 0 0 0 0 602 1550 108 0 2 98
1 1 0 370259 5428 39 1 17 0 0 0 0 552 1183 102 1 1 98
1 1 0 370259 5428 29 1 11 0 0 0 0 507 1013 96 0 0 100
1 1 0 370259 5383 33 1 6 0 0 0 0 483 2661 102 3 4 93
1 1 0 466781 5428 24 1 4 0 0 0 0 581 1944 130 0 0 100
1 1 0 466781 5428 16 0 2 0 0 0 0 523 1337 107 0 0 100
1 1 0 466781 5423 9 0 3 0 0 0 0 487 909 93 0 1 99
1 1 0 466781 5397 11 0 1 0 0 0 0 505 823 91 0 0 100
1 1 0 466781 5395 30 3 2 0 0 0 0 515 2958 118 4 5 90
1 1 0 514735 5394 19 1 2 0 0 0 0 482 2044 116 0 0 100
1 1 0 514735 5394 12 0 1 0 0 0 0 466 1406 100 0 0 100
END
6. The text "END" marks the end of the data. After this we can search for the next block of data.
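To make steps 1 to 6 concrete, here is a rough sketch of them as a single awk pass wrapped in a shell function. It is only a sketch under a few assumptions: the function name vmstat_summary is made up for illustration, each data line in the real file is one physical line with 18 fields, and the average divides by the number of data lines actually read rather than a fixed 60.

```shell
# Hypothetical helper (name is an assumption): print one "HH:MM,avg"
# record per vmstat block found in the file given as $1.
vmstat_summary() {
    awk '
        # Step 1: date line such as "Wed Jul 23 17:00:00 GMT 2008".
        # Field 4 is HH:MM:SS, so keep its first five characters.
        /GMT|SAST/ && !in_data { hhmm = substr($4, 1, 5); next }

        # Step 2: marker line, data follows from here on.
        /vmstat 2 60/ { in_data = 1; sum = 0; n = 0; next }

        # Step 6: END closes the block. Step 5: write the record.
        # Assumption: divide by the lines read (n), truncating like expr.
        /^END/ && in_data {
            if (n > 0) printf "%s,%d\n", hhmm, int(sum / n)
            in_data = 0
            next
        }

        # Step 3: skip the two heading lines inside the block.
        in_data && /procs|avm/ { next }

        # Step 4: add column 16 (us) and column 17 (sy) of each data line.
        in_data && NF >= 17 { sum += $16 + $17; n++ }
    ' "$1"
}
```

It would be called as vmstat_summary stats.txt > stats.txt.out, where stats.txt stands in for the real file name. The whole file is read once by one process, instead of starting several processes for every line.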
Here is my shell script:
# This file is used to break up the original file into useful information
#
# File format:
# int,cpu%
>$1.out
stat=0
cpu=0
scpu=0
rec=0
echo "Starting rebuild of $1 into $1.out"
while read line
do
# Get date
if [ `echo $line | egrep 'GMT|SAST' | wc -l` -eq 1 ]
then
int=`echo $line|cut -c12-16`
fi
# Get vmstat 2 60 data
if [ `echo $line | grep "vmstat 2 60" | wc -l` -eq 1 ]
then
stat=1
fi
# If stat=1 - entered into stat data
if [ $stat -eq 1 ] && [ `echo $line | egrep 'vmstat|procs|avm|END' | wc -l` -eq 0 ]
then
scpu=`echo $line | awk '{ print $16 "+" $17 }'|bc`
cpu=`expr $cpu + $scpu`
fi
# END of data string
if [ `echo $line | grep END | wc -l` -eq 1 ]
then
cpu=`expr $cpu / 60`
# Write data line
echo "$int,$cpu" >>$1.out
stat=0
int=0
cpu=0
scpu=0
rec=`expr $rec + 1`
echo "`date`:Wrote record: $rec"
fi
done < $1
echo "Complete!"
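If the time really is going into the echo | grep | wc pipelines that run for every line, the same tests could be done with the shell's own pattern matching instead. Below is a minimal sketch of that idea only. The function name classify_line and the variable kind are hypothetical, and it does not replace the awk, bc and expr calls in the script above.

```shell
# Hypothetical helper (names are assumptions): set $kind for one input
# line using only the shell's built-in case matching, so no grep, wc or
# pipeline is started for the test itself.
classify_line() {
    case $1 in
        *GMT*|*SAST*)    kind=date ;;     # date line
        *"vmstat 2 60"*) kind=start ;;    # start of a data block
        *procs*|*avm*)   kind=heading ;;  # one of the two heading lines
        *END*)           kind=end ;;      # end of a data block
        *)               kind=other ;;    # data line, or a line to ignore
    esac
}
```

Inside the while read loop this would be called as classify_line "$line", followed by a branch on $kind.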
Is there anyone that can help?
I have a couple of files to do. According to my calculations, it will take 41.6 hours to do all the files for one host. I have 9 hosts to do, which gives me about 15.6 days??!!
Regards,
Acidblood