in reply to Data Parsing help for newbie HELP ME!!

You definitely should reformat this post. It is not necessary to show so much data; just a couple of records would have been enough. Also, I was not able to ascertain exactly what you wanted in the way of output. Explaining things like what the 10 in "OV20 10" is supposed to count, or which of the many date/times in the data you wanted, would be helpful.

When parsing any data, the first step is to think about the format and what separates the data. Here the format is space separated tokens. Each token has some identifier followed by an optional comma and then comma separated values.

Examples:
UNITS,PPM
TZONE,HST,10
BEGIN_FILE
DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5

The data appears to be very regular, and that makes it easy to parse. Don't overcomplicate things. The first-step tokenizer should just split each line into tokens based upon whitespace. Each token can then be split on ",". No fancy regex stuff appears to be required here. Use the easiest tool to get the job done.
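
To make the two-stage split concrete, here is a minimal sketch using only the example tokens shown above, glued into one line. The variable names and the @records structure are mine, just for illustration; they are not part of your data format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One sample line built from the example tokens above (illustration only)
my $line = 'UNITS,PPM TZONE,HST,10 BEGIN_FILE DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5';

# Stage 1: whitespace-separated tokens.
# split ' ' also ignores any leading whitespace on the line.
my @tokens = split ' ', $line;

# Stage 2: each token is an identifier, optionally followed by
# comma-separated values.
my @records;    # hypothetical structure, one hash per token
foreach my $token (@tokens) {
    my ($id, @values) = split /,/, $token;
    push @records, { id => $id, values => \@values };
    printf "%-10s has %d value(s)\n", $id, scalar @values;
}
```

A token with no commas, like BEGIN_FILE, simply ends up with an empty value list, so no special case is needed for it.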

When you see a new measurement variable like CO or NO2, just keep track of that change and print the data, if any. It appears that you are counting the number of 24-hour measurement days from reporting stations for particular types of measurements, in particular PM 2.5, whatever that means. I don't see any need to pay attention to the start or end of data flags, as all that appears to be necessary is to pay attention to the tokens with lots of commas in them - so just count commas!
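
If you only want the count, you do not even need to split the token. A small sketch of counting commas directly with tr follows. The comma_count helper and the cut-off of 10 are made up for illustration; the script in this post splits on "," and tests for more than 15 fields instead, which amounts to the same idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the commas in a token. tr/,// with an empty replacement list
# returns the number of matches and leaves the string unchanged.
sub comma_count {
    my ($token) = @_;
    return ($token =~ tr/,//);
}

my $min_commas = 10;    # hypothetical cut-off, pick one that fits your data

my @tokens = (
    'BEGIN_FILE',
    'UNITS,PPM',
    'TZONE,HST,10',
    'DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5',
);

# Keep only the tokens that carry "lots of commas"
my @data_tokens = grep { comma_count($_) > $min_commas } @tokens;

print "$_: ", comma_count($_), " commas\n" foreach @tokens;
print "data tokens: @data_tokens\n";
```

Note that the number of fields is always the number of commas plus one, so a test on the field count and a test on the comma count are interchangeable.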

So this does that. I just cut-n-pasted your data into a __DATA__ segment to run my code then chopped most of it off for posting here. Again, I have no idea what date you want - this data has lots and lots of dates and times! My count of stns reporting doesn't agree with your output line, but that is probably because there are extra conditions that you didn't explain.

#!/usr/bin/perl -w
use strict;

# Slurp everything after __DATA__, whether it is one long line or many
my $data = do { local $/; <DATA> };
my @data = split(/\s+/, $data);
#print "$_\n" foreach @data; #run to see what data looks like

my %stns;
my $variable = undef;
foreach my $token (@data)
{
    my @tokens = split(/,/, $token);
    if ($tokens[0] eq 'VARIABLE')
    {
        print_line();              #finish the previous variable, if any
        $variable = $tokens[1];
    }
    if (@tokens > 15)              #stations with 24 hour data
    {
        $stns{$tokens[0]}++;
    }
}
print_line();                      #for the last data set

sub print_line
{
    return if (!defined($variable));  #no data yet
    return if (!keys %stns);          #no 24 point data
    print "$variable DATE? ";
    print "$_ $stns{$_} " foreach (sort keys %stns);
    print "\n";
    %stns = ();
}

=prints
CO DATE? DH1 2 KA5 2
NO2 DATE? KA5 2 WB6 2
OZONE DATE? SI2 2
PM10 DATE? DH1 2 KA5 2 PC 2 WB6 2
PM2.5 DATE? DH1 2 HL11 2 KA5 2 KH19 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 SI2 2
SO2 DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PE10 2 WB6 2
WD DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
WS DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
=cut

__DATA__
BEGIN_FILE FORMAT_VERSION,2 AGENCY,HI1 FILENAME,090913.HI1
MORE OF YOUR DATA