Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

OK. I am a complete newbie and I have been tasked with parsing some data from a file. I have two versions of the script, and neither seems to work. I am trying to copy the last numeric value from every station that reports PM2.5 data, but only from the set that has 24 hourly averages, not the one with 13 averaging periods (there are always two sets for each variable). I have included the complete file that I have to parse, along with both scripts (code1 and code 2) below. Please help!!

This is what the output file should look like

PM2.5 21-9-2010 22:4:49 (dd-mm-yyyy hrs:min:sec) KA5 4 OV20 10 DH1 2 PA16 8 MV17 0 HL11 3 KN12 17 PC 4 KH19 0 SI2 8

This is what the input file looks like: BEGIN_FILE FORMAT_VERSION,2 AGENCY,HI1 FILENAME,090913.HI1 DATA_VERSION,201009091310 TZONE,HST,10 BEGIN_GROUP VARIABLE,CO DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,PPM STATIONS,2 BEGIN_DATA KA5,150030010,0.2,0.2,0.2,0.2,0.2,0.2,-999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.6,0.6,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0. 5,0.5,0.5 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,CO DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,PPM STATIONS,2 BEGIN_DATA KA5,150030010,0.2,0.2,0.2,0.2,0.2,0.2,-999,0.3,0.2,0.2,0.2,0.2,0.2 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G DH1,150031001,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,NO2 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,PPM STATIONS,2 BEGIN_DATA KA5,150030010,0.001,0,0,0,0.001,0.004,-999,0.004,0.005,0.003,0.003,0.002,0.002,0.002,0.002,0.002,0.001,0.002,0.001,0.002,0.002,0.002,0.001, 0 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,0,0,0,0,0,0,-999,0,0.002,0.002,0,0.001,0,0,0,0,0,0,0,0,0,0,0,0 WB6,150030011,G,G,G,G,G,G,B,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,NO2 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,PPM STATIONS,2 BEGIN_DATA 
KA5,150030010,0,0,0.001,0.001,0.003,0.011,-999,0.009,0.004,0.002,0.002,0.002,0.003 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G WB6,150030011,0,0,0,0,0,0.002,-999,0.005,0.002,0,0.001,0,0 WB6,150030011,G,G,G,G,G,G,B,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,OZONE DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,PPM STATIONS,1 BEGIN_DATA SI2,150031004,0.013,0.014,0.013,0.013,0.013,0.01,0.009,0.007,0.011,-999,0.021,0.019,0.019,0.018,0.019,0.017,0.018,0.018,0.016,0.009,0.014,0.017,0.017,0.017 SI2,150031004,G,G,G,G,G,G,G,G,G,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,OZONE DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,PPM STATIONS,1 BEGIN_DATA SI2,150031004,0.016,0.017,0.016,0.01,0.014,0.011,0.006,0.009,0.017,0.018,0.02,0.022,0.02 SI2,150031004,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,PM10 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,UG/M3 STATIONS,4 BEGIN_DATA KA5,150030010,3,5,9,7,4,9,11,24,26,28,22,20,13,18,13,18,11,9,7,3,1,2,5,6 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,3,2,2,6,10,7,3,5,16,9,7,9,16,14,11,8,7,6,7,6,5,4,5,4 WB6,150030011,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,10,11,10,7,8,7,8,6,5,5,-999,-999,6,8,8,8,9,9,9,16,10,8,7,7 DH1,150031001,G,G,G,G,G,G,G,G,G,G,B,B,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,19,10,18,12,8,7,12,24,11,12,12,8,18,10,9,8,8,10,11,11,12,13,15,13 PC,150032004,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,PM10 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 
AVG_TIME,60 UNITS,UG/M3 STATIONS,4 BEGIN_DATA KA5,150030010,7,9,11,9,7,8,22,30,14,26,20,19,15 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,3,8,5,3,6,6,8,10,18,11,9,9,9 WB6,150030011,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,B,G PC,150032004,12,12,11,11,17,20,13,20,10,8,9,7,-999 PC,150032004,G,G,G,G,G,G,G,G,G,G,G,G,M END_DATA END_GROUP BEGIN_GROUP VARIABLE,PM2.5 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,UG/M3 STATIONS,10 BEGIN_DATA KA5,150030010,0,0,0,0,2,1,0,0,0,3,1,0,1,1,0,5,3,2,3,0,-999,0,0,4 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,B,G,G,G OV20,150012020,17,17,18,11,11,6,6,16,9,8,10,13,11,8,7,5,6,6,3,5,6,4,9,10 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,5,5,3,1,1,4,4,2,2,3,2,1,3,4,3,2,2,5,4,4,5,4,3,2 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PA16,150012016,8,11,7,6,10,8,6,4,5,6,6,5,3,6,6,3,4,4,6,8,6,6,8,8 PA16,150012016,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,1,5,3,0,2,1,1,1,4,6,6,4,2,2,4,3,2,1,1,2,3,2,0,0 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,4,2,-1,0,2,1,2,4,3,2,2,0,-1,0,4,4,2,1,1,3,5,5,2,3 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,8,10,7,7,9,11,11,7,4,7,8,6,5,5,6,5,7,10,18,15,13,19,20,17 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,2,3,2,0,1,0,0,2,3,4,4,1,1,2,0,0,0,0,2,3,4,3,5,4 PC,150032004,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G KH19,150090006,1,0,0,4,7,4,1,0,0,1,1,1,3,2,6,11,18,5,3,2,2,0,0,0 KH19,150090006,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G SI2,150031004,7,5,4,6,7,6,6,6,6,6,6,6,6,10,8,5,8,9,8,10,13,9,8,8 SI2,150031004,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,PM2.5 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE 
CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,UG/M3 STATIONS,10 BEGIN_DATA KA5,150030010,6,2,1,5,1,1,4,1,4,6,3,3,2 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G OV20,150012020,6,5,11,13,11,11,12,16,17,13,17,21,18 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,5,8,7,2,0,1,3,6,6,4,-999,3,2 DH1,150031001,G,G,G,G,G,G,G,G,G,G,B,G,G PA16,150012016,8,22,19,14,13,15,13,12,11,6,2,3,4 PA16,150012016,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,3,4,4,4,3,2,0,0,1,3,3,2,3 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,3,1,2,1,0,0,1,2,3,2,1,1,-1 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,12,11,11,10,10,9,10,11,8,8,9,10,11 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,1,0,2,3,2,1,4,7,6,2,3,6,-999 PC,150032004,G,G,G,G,G,G,G,G,G,G,G,G,M KH19,150090006,2,0,0,0,1,4,0,1,4,1,0,0,-999 KH19,150090006,G,G,G,G,G,G,G,G,G,G,G,G,M SI2,150031004,6,6,8,8,7,5,7,8,9,6,5,8,8 SI2,150031004,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,SO2 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,PPM STATIONS,9 BEGIN_DATA KA5,150030010,0,0,0,0,0,0,-999,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0,0,0,0,0,0 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,0,0,0,0,0,0,-999,0.001,0.001,0.002,0.001,0.001,0.001,0,0,0,0,0,0,0,0,0,0,0 WB6,150030011,G,G,G,G,G,G,M,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G OV20,150012020,0,0.001,0.001,-0.001,-0.001,-0.002,-0.001,0.002,0.006,0,0.003,0.008,0.001,-0.001,-0.001,-0.001,-0.001,-0.002,-0.002,-0.002,-0.001,0,-0.001,-0.001 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0. 
001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PA16,150012016,0.087,0.036,0.079,0.13,0.105,0.081,0.102,0.069,0.087,-999,0.007,0.004,0.002,0.001,0.001,0.001,0.001,0.003,0.006,-999,0.013,0.011,0.016,0.053 PA16,150012016,G,G,G,G,G,G,G,G,G,K,G,G,G,G,G,G,G,G,G,B,G,G,G,G MV17,150012017,0.002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-999,0.004,0.001 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,B,G,G HL11,150011006,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0 .001,0.001,0.001,0.001,0.002,-999,0.005,0.002,0.002,0.002 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,B,G,G,G,G KN12,150011012,0,0,0,0,0,0,0.001,0.001,0.001,0.001,0.002,0.001,0.001,0.001,0.001,0.001,0.003,0.003,0 .002,0.003,0.004,0.004,0.003,0.002 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PE10,150012010,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0 .001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,SO2 DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,PPM STATIONS,9 BEGIN_DATA KA5,150030010,0,0,0,0,0,0,-999,0.002,0.001,0.001,0.001,0.001,0.001 KA5,150030010,G,G,G,G,G,G,B,G,G,G,G,G,G WB6,150030011,0,0,0,0,0,0,-999,0.002,0.001,0.001,0.001,0.001,0.001 WB6,150030011,G,G,G,G,G,G,M,G,G,G,G,G,G OV20,150012020,-0.001,0.055,0.007,0,-0.001,-0.001,-0.001,-0.001,-0.001,0.001,0.013,0.007,0.004 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001 DH1,150031001,G,G,G,G,G,G,G,G,G,G,G,G,G PA16,150012016,0.16,0.352,0.37,0.328,0.308,0.265,0.224,0.175,0.051,0.008,0.006,0.003,0.002 PA16,150012016,G,G,G,G,G,G,G,G,G,G,G,G,G 
MV17,150012017,0,0,0,0,0,0,0,0,0,0,0,0,0 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,0.002,0.001,0.001,0.001,0.001,0.001,0.001,0.002,0.002,0.002,0.002,0.002,0.002 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,0.001,0.001,0.001,0.001,0,0,0,0.001,0.001,0.001,0.002,0.003,0.003 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G PE10,150012010,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,WD DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 NUMSTEPS,24 AVG_TIME,60 UNITS,DEGREES STATIONS,11 BEGIN_DATA KA5,150030010,58,66,66,47,43,46,59,44,64,66,66,69,71,78,71,64,62,61,71,66,85,53,49,45 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,63,69,67,65,59,56,63,65,75,81,84,81,70,68,65,68,66,71,56,63,62,66,64,52 WB6,150030011,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G OV20,150012020,336,8,343,26,19,28,22,267,229,224,249,256,264,288,239,187,125,207,338,350,34,65,19,36 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,93,-999,-999,-999,-999,-999,62,84,56,48,57,54,54,54,56,54,56,57,61,76,64,68,66,74 DH1,150031001,G,K,K,K,K,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PA16,150012016,325,331,323,303,300,308,294,229,173,-999,106,106,117,113,128,133,180,257,306,267,287,296,328,311 PA16,150012016,G,G,G,G,G,G,G,G,G,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,246,249,257,289,263,249,264,275,357,29,37,39,49,36,51,52,52,32,12,274,268,262,259,256 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,264,260,264,250,271,271,257,279,351,76,68,67,73,70,68,78,60,36,311,284,282,280,274,26 5 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,78,95,93,107,104,96,104,122,153,168,189,198,205,232,256,260,239,99,71,64,50,46,78,103 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G 
PE10,150012010,322,318,314,314,319,316,315,319,332,358,4,10,22,26,25,32,24,17,358,344,335,329,331,32 8 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,37,0,338,-999,338,355,335,334,63,55,58,50,50,56,47,54,56,49,51,49,51,51,42,41 PC,150032004,G,G,G,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G SI2,150031004,100,100,23,18,36,38,23,22,16,21,29,32,32,27,28,25,20,26,35,45,42,36,21,24 SI2,150031004,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,WD DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,DEGREES STATIONS,11 BEGIN_DATA KA5,150030010,37,62,65,61,-999,-999,40,49,62,73,37,65,47 KA5,150030010,G,G,G,G,K,K,G,G,G,G,G,G,G WB6,150030011,57,54,57,50,-999,-999,65,59,83,81,79,78,79 WB6,150030011,G,G,G,G,K,K,G,G,G,G,G,G,G OV20,150012020,26,19,27,24,15,23,15,314,273,228,232,269,288 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,64,79,109,-999,112,114,83,94,117,99,84,66,67 DH1,150031001,G,G,G,K,G,G,G,G,G,G,G,G,G PA16,150012016,311,315,316,316,318,319,306,316,40,99,97,89,93 PA16,150012016,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,256,248,278,273,255,240,277,20,26,348,96,26,67 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,261,264,265,258,269,268,271,335,83,265,222,76,49 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,-999,100,107,-999,77,89,82,-999,225,226,227,215,239 KN12,150011012,K,G,G,K,G,G,G,K,G,G,G,G,G PE10,150012010,324,329,328,315,314,315,316,330,329,356,38,81,97 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,46,67,74,60,-999,-999,52,55,76,69,64,57,-999 PC,150032004,G,G,G,G,K,K,G,G,G,G,G,G,M SI2,150031004,20,51,92,-999,94,107,76,73,77,75,52,34,48 SI2,150031004,G,G,G,K,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,WS DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009080000 END_DTG,201009082359 INTERVAL,60 START_REF,0 
NUMSTEPS,24 AVG_TIME,60 UNITS,M/H STATIONS,11 BEGIN_DATA KA5,150030010,3.4,3.4,3,3.3,3.9,3.4,2.9,4,5.4,6.2,6.3,6.2,6.5,7.3,6.4,6.5,6.3,5.7,5.5,3.4,2,2.5,3.4, 3.9 KA5,150030010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G WB6,150030011,7,6.6,6.3,5.2,4.9,2.8,4.4,5.3,9.3,10.7,10.5,10.6,9.6,9.7,8.2,8.9,9.3,8.8,6.6,5.1,5,5.9 ,7.4,5.2 WB6,150030011,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G OV20,150012020,1.6,1.9,1.2,3,4.7,4.3,3,1.4,2.8,2.7,3.6,7.2,5.7,5.7,3.4,1.7,4.7,0.7,1.9,1.9,0.8,1.5,4 .3,3.3 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,1.2,-999,-999,-999,-999,-999,1,1.2,2.1,2.3,3.7,4.9,6.1,6.8,7.1,6.8,5.8,5.7,4.5,2.2,4.2,2.9,3.5,2.2 DH1,150031001,G,K,K,K,K,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PA16,150012016,4.9,4.4,3.1,4.3,4.7,4.6,3,1.4,0.9,-999,10.2,9.2,7.3,7,6.6,4.8,2.8,3.5,2.7,1.2,3.8,4.3,5.7,4.1 PA16,150012016,G,G,G,G,G,G,G,G,G,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,2.7,1.7,3.4,1.4,3.7,2,3.6,3,3.2,3.2,3.6,3.9,4.5,4.5,4.8,5.3,4.1,3,1.6,0.9,1.3,1.3,2,2 .9 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,3.7,3.5,3.2,2.7,4.7,3.4,3.2,2.7,2,3.3,3.5,4.3,6,4.1,5,5.5,2.7,1.8,1.7,2.9,3.9,3.9,3.8 ,3.4 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,2.9,4.2,5,4.8,4.7,5.9,5.7,4.2,5.1,5.3,6,6.1,5.6,5.4,4.5,2.9,0.9,1.5,2.6,2.2,2.2,2.2,2 .4,1.7 KN12,150011012,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PE10,150012010,2.9,3.1,3.4,3.8,3.5,4,4,4.8,4.8,4.5,4.6,4.9,4.9,4.9,4.8,4.2,4.8,4.1,2.5,2.3,3.1,4.6,4 .5,5.5 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,2.1,2,2.1,-999,2.6,2.4,2.1,2,4,6.3,6,6.3,7.9,6.5,7.4,7.6,7,7,5.4,5.3,5.3,3.4,4.1,3.2 PC,150032004,G,G,G,K,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G SI2,150031004,2.2,2,1.5,2.4,1.5,1.5,2.4,2.3,2.9,3.4,4.8,5.4,6.6,6.9,7.4,7.6,8,7.2,5.6,2.7,5.4,4.9,3. 
9,3.5 SI2,150031004,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G,G END_DATA END_GROUP BEGIN_GROUP VARIABLE,WS DATA_TYPE,POINT MEASUREMENT_TYPE,SAMPLE CHARACTERISTIC,OBSERVED START_DTG,201009090000 END_DTG,201009091309 INTERVAL,60 START_REF,0 NUMSTEPS,13 AVG_TIME,60 UNITS,M/H STATIONS,11 BEGIN_DATA KA5,150030010,3.9,2.3,1.7,1.6,-999,-999,1.4,2.7,4.2,5,5.5,5.5,6.1 KA5,150030010,G,G,G,G,K,K,G,G,G,G,G,G,G WB6,150030011,5.3,3.4,2.7,2.3,-999,-999,1.2,3.2,7,8.4,8.3,7.7,8.4 WB6,150030011,G,G,G,G,K,K,G,G,G,G,G,G,G OV20,150012020,5.5,3.1,3.4,4.6,4.5,3.9,4.8,3.1,3.4,3.2,4.9,5,4.9 OV20,150012020,G,G,G,G,G,G,G,G,G,G,G,G,G DH1,150031001,4.4,2.4,2,-999,1.8,2.5,2.4,2.1,3,2.9,3,4.4,4.2 DH1,150031001,G,G,G,K,G,G,G,G,G,G,G,G,G PA16,150012016,4.5,5.4,5.3,5.1,4.7,5.2,3.6,1.9,1.3,6.8,5,3.9,3.7 PA16,150012016,G,G,G,G,G,G,G,G,G,G,G,G,G MV17,150012017,2.6,2.8,1.8,1.7,2.7,1.9,1.2,1.6,2.3,2.9,2.7,1.9,2.6 MV17,150012017,G,G,G,G,G,G,G,G,G,G,G,G,G HL11,150011006,3.7,3.7,3.7,3.8,2.8,2.6,2.5,1.5,2.6,1.5,1,1.6,1.9 HL11,150011006,G,G,G,G,G,G,G,G,G,G,G,G,G KN12,150011012,-999,2.1,2.6,-999,3.4,3.7,3.8,-999,2,3.2,3.6,4.1,3.9 KN12,150011012,K,G,G,K,G,G,G,K,G,G,G,G,G PE10,150012010,5,3.7,3.6,3.8,4.6,5.1,4,3.5,2.9,2.5,2.4,3.3,3.1 PE10,150012010,G,G,G,G,G,G,G,G,G,G,G,G,G PC,150032004,3.6,3.5,3.3,2,-999,-999,2.6,2.8,4.1,4.9,5.2,6.6,-999 PC,150032004,G,G,G,G,K,K,G,G,G,G,G,G,M SI2,150031004,4.9,1.9,2,-999,2.1,3,4,4.8,5.8,7.3,5.9,5.6,5.2 SI2,150031004,G,G,G,K,G,G,G,G,G,G,G,G,G END_DATA END_GROUP END_FILE

code1
    #!/usr/local/bin/perl -w
    # getting the source code from the file
    #####################################################################
    open(IN,"/home/uila3/rhuff/doh/090809.txt") or die "cannot open file1 for reading\n";
    open(OUT,">/home/uila3/rhuff/doh/test.txt") or die "cannot open file2 for writing\n";
    my $start = 0;
    my $count=0;
    while(<IN>)
    {
        chomp;
        $count++;
        next if( /^\s*$/ );    #skip empty lines
        if( /^END_DATA\s*$/ )  #end if this word found
        {
            $start = 0;
            next;
        }
        if( /VARIABLE,\s*(.*)$/ )
        {
            my ($sec,$min,$hour,$day,$month,$yr19,@rest) = localtime(time);
            print OUT $_ ;
            next;
        }
        if( ($start==0) && ( /^BEGIN_DATA\s*$/ ) ) #starts with this word only
        {
            $start = 1;
            next;
        }
        print OUT $1," ",$2,"\n" if( (($start==1)) && ( /^([^,]*),.*,\s*([0-9.-]*)$/ ) );
    }
    close(OUT);
    close(IN);
    print "No. of lines parsed $count";

In this case, I get a command line that says "No. of lines parsed 463uila%" and an output file that looks like this:

    VARIABLE,CO VARIABLE,CO VARIABLE,NO2 VARIABLE,NO2 VARIABLE,OZONE VARIABLE,OZONE VARIABLE,PM10 VARIABLE,PM10 VARIABLE,PM2.5 VARIABLE,PM2.5 VARIABLE,SO2 VARIABLE,SO2 VARIABLE,WD VARIABLE,WD VARIABLE,WS VARIABLE,WS

code 2
    # getting the source code from the file
    my $target_data;
    {
        local $/ = "VARIABLE,PM2.5\n";
        open my $INFILE, '<', '/home/uila3/rhuff/doh/2010090913.txt'
            or die "Couldn't open /home/uila3/rhuff/doh/2010090913.txt: $!";
        my $discard = <$INFILE>;
        $target_data = <$INFILE>;
        close $INFILE;
    }
    print $target_data;
    print '*' x 20;
    for my $line (split /\n/, $target_data) {
        if ($line =~ m{ \A ( \p{Uppercase}{2} \d+ ) , .* , (\d+) }xms ) {
            print "$1 $2";
        }
    }

This was the response: "********************uila% ". Nothing else seemed to happen.

20100928 Janitored by Corion: Added readmore tag

Replies are listed 'Best First'.
Re: Data Parsing help for newbie HELP ME!!
by wfsp (Abbot) on Sep 26, 2010 at 11:14 UTC
    Assuming your data looks something like what you posted (cut down, lines shortened), it may be worth extracting both PM2.5 records and then determining which one you need (I'm not sure which that is).
        #! /usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my $filename = q{monk.txt};
        open my $fh, q{<}, $filename
          or die qq{cant open *$filename* to read: $!\n};

        my (@db, @records);
        while (my $line = <$fh>){
          chomp $line;
          if ($line =~ /BEGIN_GROUP VARIABLE,PM2\.5/ .. $line =~ /END_GROUP/){
            push @records, $line;
            if ($line =~ /END_GROUP/){
              push @db, [@records];
              #warn Dumper \@db;
              @records = ();
            }
          }
        }

        for my $group (@db){
          for my $record (@{$group}){
            print qq{$record\n};
            # split on comma to get the field
            # you need?
          }
          print q{*} x 10, qq{\n};
        }

    Why do one pass when you can do two? :-)
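
The "split on comma to get the field you need?" step in the script above might look roughly like the sketch below. It assumes one record per line, assumes the wanted group is the one whose header says NUMSTEPS,24, and assumes "the last numeric term" means the final field of each value row. The sub name last_values and the shortened sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the lines of one group, return "STATION value"
# pairs, but only when the group header says NUMSTEPS,24.
sub last_values {
    my @lines = @_;
    return unless grep { /\bNUMSTEPS,24\b/ } @lines;

    my @pairs;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                  # drop stray trailing whitespace or CR
        my @fields = split /,/, $line;
        next unless @fields > 2;             # group header lines have at most 2 fields
        next unless $fields[-1] =~ /\A-?\d+(?:\.\d+)?\z/;   # skip the G/B/K/M flag rows
        push @pairs, "$fields[0] $fields[-1]";
    }
    return @pairs;
}

# Cut-down sample group, shortened to 3 values per row for readability.
my @group = (
    'VARIABLE,PM2.5',
    'NUMSTEPS,24',
    'BEGIN_DATA',
    'KA5,150030010,0,0,4',
    'KA5,150030010,G,G,G',
    'OV20,150012020,4,9,10',
    'OV20,150012020,G,G,G',
    'END_DATA',
);

print join(' ', last_values(@group)), "\n";
```

A missing reading (-999) in the last position would be reported as-is by this sketch; whether that is wanted is not clear from the question.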
Re: Data Parsing help for newbie HELP ME!!
by Anonymous Monk on Sep 23, 2010 at 22:48 UTC

    You need to reformat your question so the data is readable.

    You need to use strict.

    Note that you're printing when you encounter "VARIABLE", but (judging by the unformatted source of your post) those lines have no numbers or variable data on them.

Re: Data Parsing help for newbie HELP ME!!
by Marshall (Canon) on Sep 26, 2010 at 08:39 UTC
    You definitely should reformat this post. It is not necessary to show so much data; just a couple of records would have been enough. Also, I was not able to ascertain exactly what you wanted in the way of output. Explaining things like what the 10 in "OV20 10" is supposed to count, or which of the many dates/times in the data you wanted, would be helpful.

    When parsing any data, the first step is to think about the format and what separates the data. Here the format is space separated tokens. Each token has some identifier followed by an optional comma and then comma separated values.

    Examples:
        UNITS,PPM
        TZONE,HST,10
        BEGIN_FILE
        DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5

    The data appears to be very regular and that makes it easy to parse. Don't overcomplicate things. The first-step tokenizer should just split each line into tokens based upon whitespace. Each token can then be split on ",". No fancy regex stuff appears to be required here. Use the easiest tool to get the job done.
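
As a small illustration of that two-step split, here is a minimal sketch. The helper name tokenize and the sample string are invented for the example; it assumes the tokens really are separated by whitespace and contain no embedded spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw text into whitespace-separated tokens,
# then split each token into its comma-separated fields.
sub tokenize {
    my ($text) = @_;
    # split ' ' ignores leading whitespace, so no empty first token appears
    return map { [ split /,/ ] } split ' ', $text;
}

my $sample = "BEGIN_FILE TZONE,HST,10 UNITS,PPM\nDH1,150031001,9,8,5";
for my $fields (tokenize($sample)) {
    printf "%-10s %d field(s)\n", $fields->[0], scalar @$fields;
}
```

Each element returned is an array reference, so the identifier is always field 0 and any values follow it.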

    When you see a new measurement variable like CO or NO2, just keep track of that change and print the data, if any. It appears that you are counting the number of 24-hour measurement days from reporting stations for particular types of measurements, in particular PM 2.5, whatever that means. I don't see any need to pay attention to the start or end of data flags, as all that appears to be necessary is to pay attention to the tokens with lots of commas in them, so just count commas!
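
Counting commas works on this data because a station row with 24 hourly values has 26 comma-separated fields (station code, site id, then the values), while a 13-value row has only 15. A small sketch of that check follows; value_count is a made-up name, and the assumption that the first two fields are always the station code and the site id comes from the posted sample only.

```perl
use strict;
use warnings;

# Hypothetical helper: how many values does a token carry, assuming the
# first two fields are the station code and the site id?
sub value_count {
    my ($token) = @_;
    my $commas = ($token =~ tr/,//);   # tr counts commas without changing the token
    return $commas >= 2 ? $commas - 1 : 0;
}

my $row24 = join ',', 'KA5', '150030010', (0) x 24;
my $row13 = join ',', 'KA5', '150030010', (0) x 13;
for my $token ($row24, $row13, 'UNITS,PPM') {
    my ($name) = split /,/, $token;
    printf "%-6s %2d value(s)\n", $name, value_count($token);
}
```

The flag rows (G, B, K, M) have the same field count as the value rows, so a count alone does not tell the two apart.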

    So the script below does that. I just cut-n-pasted your data into a __DATA__ segment to run my code, then chopped most of it off for posting here. Again, I have no idea what date you want; this data has lots and lots of dates and times! My count of stations reporting doesn't agree with your output line, but that is probably because there are extra conditions that you didn't explain.

        #!/usr/bin/perl -w
        use strict;

        my $data = <DATA>;
        my @data = split(/\s+/,$data);
        #print "$_\n" foreach @data; #run to see what data looks like

        my %stns;
        my $variable = undef;
        foreach my $token (@data)
        {
            my @tokens = split(/,/,$token);
            if ($tokens[0] eq 'VARIABLE')
            {
                print_line();
                $variable = $tokens[1];
            }
            if ( @tokens > 15) #stations with 24 hour data
            {
                $stns{$tokens[0]}++;
            }
        }
        print_line(); #for the last data set

        sub print_line
        {
            return if (!defined($variable)) ; #no data yet
            return if (!keys %stns);          #no 24 point data
            print "$variable DATE? ";
            print "$_ $stns{$_} " foreach (sort keys %stns);
            print "\n";
            %stns = ();
        }

        =prints
        CO DATE? DH1 2 KA5 2
        NO2 DATE? KA5 2 WB6 2
        OZONE DATE? SI2 2
        PM10 DATE? DH1 2 KA5 2 PC 2 WB6 2
        PM2.5 DATE? DH1 2 HL11 2 KA5 2 KH19 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 SI2 2
        SO2 DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PE10 2 WB6 2
        WD DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
        WS DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
        =cut

        __DATA__
        BEGIN_FILE FORMAT_VERSION,2 AGENCY,HI1 FILENAME,090913.HI1 MORE OF YOUR DATA
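
On the open date question: none of the START_DTG or END_DTG values in the posted data fall on 21-9-2010, and the first script in the question calls localtime(time), so one guess is that the stamp in the wanted output is simply the time the script was run. Under that assumption (and it is only an assumption), the stamp could be built as in the sketch below. The helper name format_stamp is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: format a localtime() list as "d-m-yyyy h:m:s"
# without zero padding, matching the look of the sample output line.
sub format_stamp {
    my ($sec, $min, $hour, $mday, $mon, $year) = @_;
    # localtime returns the month as 0..11 and the year as years since 1900
    return sprintf '%d-%d-%d %d:%d:%d',
        $mday, $mon + 1, $year + 1900, $hour, $min, $sec;
}

print 'PM2.5 ', format_stamp(localtime), " (dd-mm-yyyy hrs:min:sec)\n";
```

Forgetting the month and year offsets is a common slip with localtime, which is why they are applied in one place here.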