LordAvatar has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am writing an application which parses information
from National Weather Service warnings.
The top 3 lines of each warning message appear as follows:

WUUS52 KGSP 011455
SVRGSP
NCC045-071-109-SCC021-087-091-011545-
The third line contains a series of state/county codes. If the warning is issued for multiple counties in the same state, only the first county in that state has the state abbreviation prefixed to it. In the above example, counties 045, 071 and 109 in North Carolina are under a severe thunderstorm warning. Additional states and their counties follow the same pattern, as in the example above. The last 6 digits are the expiration time in DDHHMM format.
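The layout described above can be sketched roughly as follows. This is only an illustration: the sub name split_county_line is made up, and it assumes the last dash-separated field is always the six-digit DDHHMM expiration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original snippet): separates the
# trailing DDHHMM expiration from the county fields.
# Assumption: the last non-empty field is always exactly six digits.
sub split_county_line {
    my ($line) = @_;
    my @fields  = grep { length } split /-/, $line;   # drop empty fields
    my $expires = pop @fields;
    return unless defined $expires && $expires =~ /^(\d{2})(\d{2})(\d{2})$/;
    return {
        fields => \@fields,   # still unexpanded, e.g. NCC045, 071, 109, ...
        day    => $1,
        hour   => $2,
        minute => $3,
    };
}

my $info = split_county_line('NCC045-071-109-SCC021-087-091-011545-');
print "expires on day $info->{day} at $info->{hour}:$info->{minute}\n";
```

The county fields come back untouched here; attaching the state prefix to the bare numbers is a separate step.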

I am trying to parse the third line and match the intermediate county codes to their parent state, i.e., 071 and 109 to NC.

Here is a snippet of code that I've written which reads the first and third lines. I've tried a few regexes for the third line but I haven't found a way to reliably add the state abbreviation to the intermediate codes. Right now I just get a list of all the codes like this:

NCC045 071 109 SCC021 087..etc.
I am trying to create:
NCC045 NCC071 NCC109 SCC021...etc.

Thanks for any help!

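LINE: while (defined($line = <F>)) {
    chomp $line;
    $count++;
    # first line: capture the six-digit issue time
    if ($line =~ /\w{4}\d{2}\s+\w{4}\s+(\d{6})/) {
        $issueTime = $1;
        push @warningInfo, $issueTime;
    }
    # third line: state/county codes followed by the expiration time
    if ($count == 3) {
        push @warningInfo, split /-/, $line;
        $count = 0;
        next FILE;    # FILE is presumably the label of an enclosing loop not shown here
    }
    else { next; }
}
close F;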

Replies are listed 'Best First'.
Re: A string parsing question
by suaveant (Parson) on Apr 20, 2001 at 19:12 UTC
    Not too hard...
    # I just put it in $_ for example's sake.
    $_ = 'NCC045-071-109-SCC021-087-091-011545-';
    @foo = split '-', $_;
    until($exp = pop @foo) {};
    my $code;
    for(@foo) {
      if(s/^([A-Z]+)//) { #if there are letters at the front
        $code = $1;
      }
      push @{$data{$code}}, $_;
      # or do print $string_with_code = "$code$_\n";
    }
    Tested, it works... the expiration ends up in $exp (assuming it is ALWAYS the last item). The for loop looks at each item, strips off the state code and stores it as the current code; then you can do as you like with $_, which contains the county number.
                    - Ant
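The same loop can be wrapped in a sub that hands back the expanded codes the original question asked for. This is a sketch rather than a drop-in: the sub name expand_codes is invented for illustration, and it assumes the expiration is always the last field.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: returns the expiration
# followed by the fully prefixed codes.
# Assumption: the expiration is always the last non-empty field.
sub expand_codes {
    my ($line) = @_;
    my @parts = grep { length } split /-/, $line;
    my $exp   = pop @parts;
    my ($code, @full) = ('');
    for my $part (@parts) {
        # A leading run of letters starts a new state; remember it.
        $code = $1 if $part =~ s/^([A-Z]+)//;
        push @full, "$code$part";
    }
    return ($exp, @full);
}

my ($exp, @full) = expand_codes('NCC045-071-109-SCC021-087-091-011545-');
print "@full (expires $exp)\n";
```

Returning a list keeps the prefixed codes in their original order, which the hash-of-arrays variant does not.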
Re: A string parsing question
by Sifmole (Chaplain) on Apr 20, 2001 at 19:35 UTC
    I just realized I missed an aspect of your question... Here is one that actually works. :)
    my $data = 'NCC045-071-109-SCC021-087-091-011545-';
    my $state;
    while ($data =~ m/([A-Z]+)?(\d+)/g) {
        last if (length($2) == 6);
        $state = $1 unless (! defined $1);
        print $state, $2, ' ';
    }
    Finally that should be better.
Re: A string parsing question
by jeroenes (Priest) on Apr 20, 2001 at 19:13 UTC
    Try
    $line =~ s/([A-Z]{3})(\d+)-(\d+)-(\d+)-([A-Z])/$1$2 $1$3 $1$4 $5/g;

    "We are not alone"(FZ)
      Only problem with that is if there are 4 or 2 counties... which I assume can happen. It's not a great fit for a single regexp... better handled in code.
                      - Ant
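If a substitution is still preferred, one way around the fixed county count is to repeat a two-field substitution until nothing is left to expand. The sketch below is an assumption-laden illustration, not anyone's posted solution: the sub name expand_with_regex is invented, and it assumes three-letter prefixes, three-digit county numbers and a trailing six-digit expiration, as in the sample line.

```perl
use strict;
use warnings;

# Hypothetical sub: expands bare county numbers with a repeated s///.
# Assumptions: three-letter prefix (e.g. NCC), three-digit county numbers,
# and a six-digit expiration as the final field.
sub expand_with_regex {
    my ($line) = @_;
    (my $codes = $line) =~ s/-?\d{6}-?$//;    # drop the trailing expiration
    # Copy the prefix onto the next bare county number, one pair per pass,
    # until no "PREFIXnnn-nnn" pair is left. The lookahead keeps a longer
    # digit run from being mistaken for a county number.
    1 while $codes =~ s/([A-Z]{3})(\d{3})-(\d{3})(?!\d)/$1$2 $1$3/;
    $codes =~ tr/-/ /;                        # remaining dashes separate states
    return $codes;
}

print expand_with_regex('NCC045-071-109-SCC021-087-091-011545-'), "\n";
```

Each pass rewrites only one pair, so a state with two counties needs one pass and a state with four needs three; the loop stops as soon as a pass changes nothing.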
Re: A string parsing question
by LordAvatar (Acolyte) on Apr 20, 2001 at 22:27 UTC

    Hello fellow Monks,

    Thanks for your help!
    -LordAvatar

Re: A string parsing question
by Sifmole (Chaplain) on Apr 20, 2001 at 19:26 UTC
    ** IGNORE **

    Or..

    my $data = 'NCC045-071-109-SCC021-087-091-011545-';
    $data =~ s/([A-Z]+)//;
    my $state = $1;
    my @counties = split('-', $data);
    print join(' ', map { $state.$_; } @counties), "\n";
    Runs, and will handle any number of codes on the same line.
      Ahhh, but it only catches the first state code... the 4th item becomes NCCSCC021, does it not? And the expiry date has NCC prepended, too...
                      - Ant
        Thanks for pointing that out. I was never too good at those reading comprehension thingies.
      ** Ignore... this is wrong **

      Another alternative if you don't like the map and split.

      my $data = 'NCC045-071-109-SCC021-087-091-011545-';
      $data =~ s/([A-Z]+)//;
      my $state = $1;
      print $state, $1, ' ' while ($data =~ m/(\d+)/g);