You definitely should reformat this post. It is not necessary to show so much data; a couple of records would have been enough. Also, I was not able to ascertain exactly what you wanted in the way of output. It would help to explain things like what the 10 in "OV20 10" is supposed to count, or which of the many dates/times in the data you wanted.

When parsing any data, the first step is to think about the format and what separates the data. Here the format is space-separated tokens. Each token has an identifier, optionally followed by a comma and then comma-separated values.

Examples: UNITS,PPM TZONE,HST,10 BEGIN_FILE DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5
The data appears to be very regular, and that makes it easy to parse. Don't overcomplicate things. The first-step tokenizer should just split each line into tokens based upon whitespace. Each token can then be split on ",". No fancy regex stuff appears to be required here. Use the easiest tool to get the job done.
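As a minimal sketch of that two-stage split, here it is run on a sample line assembled from the example tokens above. The names $line and @records are only for illustration and are not part of the full program further down.

```perl
use strict;
use warnings;

# Sample line built from the example tokens shown in the post.
my $line = 'UNITS,PPM TZONE,HST,10 BEGIN_FILE DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5';

# Stage 1: split the line into whitespace-separated tokens.
# split ' ' (a single-space string) also ignores leading whitespace.
my @tokens = split ' ', $line;

# Stage 2: split each token on commas; the first field is the identifier.
my @records;
foreach my $token (@tokens) {
    my ($id, @values) = split /,/, $token;
    push @records, { id => $id, values => \@values };
    printf "%-10s %d value(s)\n", $id, scalar @values;
}
```

A token with no commas, like BEGIN_FILE, simply ends up with an identifier and an empty value list, so it needs no special case.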

When you see a new measurement variable like CO or NO2, just keep track of that change and print the data collected so far, if any. It appears that you are counting the number of 24-hour measurement days from reporting stations for particular types of measurements, in particular PM 2.5, whatever that means. I don't see any need to pay attention to the start or end of data flags, as all that appears to be necessary is to pay attention to the tokens with lots of commas in them - so just count commas!
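Counting commas literally is a one-liner with tr. This is a small sketch using the example station token from above; the variable names are only for illustration. It also shows the equivalent field count you get from split.

```perl
use strict;
use warnings;

my $token = 'DH1,150031001,9,8,5,6,5,5,8,9,8,7,4,-999,5';

# tr/,// with an empty replacement list counts the commas
# and leaves the string unchanged.
my $commas = ($token =~ tr/,//);

# Counting fields after a split is the other way to look at it.
my @fields = split /,/, $token;

print "commas: $commas, fields: ", scalar(@fields), "\n";
```

Normally the field count is the comma count plus one, but split drops trailing empty fields, so the two can differ if a token ends in commas.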

So this does that. I just cut-n-pasted your data into a __DATA__ segment to run my code then chopped most of it off for posting here. Again, I have no idea what date you want - this data has lots and lots of dates and times! My count of stns reporting doesn't agree with your output line, but that is probably because there are extra conditions that you didn't explain.

    #!/usr/bin/perl -w
    use strict;

    my $data = <DATA>;
    my @data = split(/\s+/,$data);
    #print "$_\n" foreach @data; #run to see what data looks like

    my %stns;
    my $variable = undef;
    foreach my $token (@data)
    {
       my @tokens = split(/,/,$token);
       if ($tokens[0] eq 'VARIABLE')
       {
          print_line();
          $variable = $tokens[1];
       }
       if ( @tokens > 15) #stations with 24 hour data
       {
          $stns{$tokens[0]}++;
       }
    }
    print_line(); #for the last data set

    sub print_line
    {
       return if (!defined($variable)) ; #no data yet
       return if (!keys %stns);          #no 24 point data
       print "$variable DATE? ";
       print "$_ $stns{$_} " foreach (sort keys %stns);
       print "\n";
       %stns = ();
    }

    =prints
    CO DATE? DH1 2 KA5 2
    NO2 DATE? KA5 2 WB6 2
    OZONE DATE? SI2 2
    PM10 DATE? DH1 2 KA5 2 PC 2 WB6 2
    PM2.5 DATE? DH1 2 HL11 2 KA5 2 KH19 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 SI2 2
    SO2 DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PE10 2 WB6 2
    WD DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
    WS DATE? DH1 2 HL11 2 KA5 2 KN12 2 MV17 2 OV20 2 PA16 2 PC 2 PE10 2 SI2 2 WB6 2
    =cut

    __DATA__
    BEGIN_FILE FORMAT_VERSION,2 AGENCY,HI1 FILENAME,090913.HI1 MORE OF YOUR DATA

In reply to Re: Data Parsing help for newbie HELP ME!! by Marshall
in thread Data Parsing help for newbie HELP ME!! by Anonymous Monk
