MKevin has asked for the wisdom of the Perl Monks concerning the following question:

I know I often come here for help with parsing data, but please help me. I have come a long way and learned a lot. Here is the data I want to parse:
15550 08/27/1900 M=20 1 SNBR= 371 NOT NAMED XING=1 SSS=4
15555 08/27*150 421 35 0*152 434 35 0*153 447 35 0*154 456 35 0*
15560 08/28*156 466 35 0*158 479 35 0*160 491 35 0*161 503 35 0*
15565 08/29*163 514 35 0*164 524 35 0*165 537 40 0*166 551 40 0*
. . .
15645 09/14E514 462 55 0E521 430 50 0E530 400 45 0E541 372 45 0*
15650 09/15E553 346 45 0E567 322 40 0E582 300 40 0E600 280 35 0*
15655 HRCTX4
This is how I want it to look:
08/27 80 NONAME01-1900 1 4 15550
08/27 0000 150 421 35 0 T
08/27 0600 152 434 35 0 T
. . .
09/15 1800 600 280 35 0 E
This is my current code. However, it is not working; please help me:

#!/usr/bin/perl -w
my $pkgdoc = <<'EOD';
#/**-------------------------------------------------------------------
# @ file TC-parcer.pl
# This script parses the fetched TC data from Dr. landsea's TC
# reclassification project.
#
# date: June 25, 2008
#--------------------------------------------------------------------*/
EOD

#
## Pull arguments from the command line
#
use strict;
use warnings;
use Getopt::Long;

if (@ARGV <1) {   # if arguments is smaller than 1 then print the pacakage document
    print $pkgdoc;
    exit -1;
}
my $txtfile = shift;   # pulling arguments

#
## Open text file with Dr. Landsea's data if not stop the program
#
open (DATA, $txtfile)||die "cannot open $txtfile for reading";

#
## Gather data from the title line
#
my $format = "a5 x a x a5 x a x a4 x a4 x a x a3 x a9 x a x a13 x a5 x a x a4 x a";
my ($CardNo, $MMDD, $YY, $StormNo, $TotalNo, $Name, $XING, $SSS) = ($1, $3, $5, $7, $8, $10, $11, $13, $15);
while (<DATA>) {
    if (/^XING/) {
        ($CardNo, $MMDD, $YY, $StormNo, $TotalNo, $Name, $XING, $SSS)= unpack ($format, $_);
        my $TotalNo = $TotalNo * 4;
        if ($Name && $Name =~ /NOT NAMED/){$Name = "NONAME";}
    }
    last;
    print OUT "$MMDD $TotalNo $Name$StormNo-$YY $XING $SSS $CardNo\n";
}

#
## Gather important data of the center of each storm
#
my $format1 = "a5 x a x a5 x a x a3 x a4 x a x a3 x a x a4 x a x a3 x a4 x a x a3 x a x a4 x a x a3 x a4 x a x a3 x a x a4 x a x a3 x a4 x a x a3 x a x a4 x a1";
my (@MMDDdata, @Stages1, @LAT1, @LONG1, @MaxWind1, @MinP1, @Stages2, @LAT2, @LONG2, @MaxWind2, @MinP2, @Stages3, @LAT3, @LONG3, @MaxWind3, @MinP3, @Stages4, @LAT4, @LONG4, @MaxWind4, @MinP4) = ($3, $11, $5, $6, $8, $10, $18, $12, $13, $15, $17, $24, $19, $20, $22, $24, $32, $26, $27, $29, $31);
while (<DATA>) {
    my (@MMDDdata, @Stages1, @LAT1, @LONG1, @MaxWind1, @MinP1, @Stages2, @LAT2, @LONG2, @MaxWind2, @MinP2, @Stages3, @LAT3, @LONG3, @MaxWind3, @MinP3, @Stages4, @LAT4, @LONG4, @MaxWind4, @MinP4) = unpack ($format1, $_);
    last if (!defined($LAT2));
    print OUT "@MMDDdata 0000 @LAT1 @LONG1 @MaxWind1 @MinP1 @Stages1\n @MMDDdata 0600 @LAT2 @LONG2 @MaxWind2 @MinP2 @Stages2\n @MMDDdata 1200 @LAT3 @LONG3 @MaxWind3 @MinP3 @Stages3\n @MMDDdata 1800 @LAT4 @LONG4 @MaxWind4 @MinP4 @Stages4\n";
}
close (DATA);
close (OUT);

Re: Parser help
by pc88mxer (Vicar) on Jun 26, 2008 at 14:50 UTC
    In your second while loop, don't use array variables (@MMDDdata, @Stages1, etc.). Use scalars (same names but prefixed with a $):
    while (<DATA>) {
        my ($MMDDdata, $Stages1, $LAT1, ...) = unpack($format1, $_);
        last if (!defined($LAT2));
        print OUT "$MMDDdata 0000 $LAT1 $LONG1 ...\n";
    }
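The reason arrays fail here is that, in a list assignment, the first array on the left takes every value and the variables after it stay empty. A minimal sketch of that behaviour, using a made-up three-field record and template rather than the real TC format:

```perl
use strict;
use warnings;

# Hypothetical three-field record, for illustration only.
my $line = "08/27 150 421";

# With arrays on the left, the first array swallows every field.
my (@date, @lat, @lon) = unpack("A5 x A3 x A3", $line);
print scalar(@date), " ", scalar(@lat), " ", scalar(@lon), "\n";   # 3 0 0

# With scalars, each field lands in its own variable.
my ($date, $lat, $lon) = unpack("A5 x A3 x A3", $line);
print "$date $lat $lon\n";                                         # 08/27 150 421
```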
      I still get this error: x outside of string at TC-parcer.pl line 58, <DATA> line 2.
        unpack is meant for fixed-width and binary records. The 'x' in your $format does not match a space: in unpack it blindly skips one byte, whatever that byte is (it is in pack that 'x' writes a 0x00 byte). The error means the template tried to skip past the end of the line it was given, so your template is longer than the data line. I am not saying it is impossible to use unpack to parse text files, but it is like using a soldering iron to repair your wristwatch.
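The error message can be reproduced with a short input. The sketch below uses a made-up template (not the one from the question) and catches the failure with eval so that both cases can be shown side by side:

```perl
use strict;
use warnings;

# Made-up template: 5 chars, skip 1 byte, 5 chars, skip 1 byte, 3 chars.
my $template = "a5 x a5 x a3";

# Long enough: 'x' simply steps over the byte between the fields.
my @ok = unpack($template, "15555 08/27*150");
print join("|", @ok), "\n";    # 15555|08/27|150

# Too short: the second 'x' would step past the end of the string,
# so unpack dies with an "'x' outside of string" error.
my @bad = eval { unpack($template, "15555 08/2") };
print "error: $@" if $@;
```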

        You seem to have some other misconceptions about unpack. Your line 40 makes no sense. Assigning ($1, $3, $5, $7, ...) to the variables does not influence the later assignment from unpack; it will not make unpack pick only the first, third, and so on of the parsed items.
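For the record, $1, $3 and friends are regex capture variables and are undefined before any match has run. If the intent was to keep only some of the unpacked fields, a list slice on the result of unpack does that. A small sketch with a made-up record and template:

```perl
use strict;
use warnings;

# Made-up record: card number, date, latitude, longitude.
my $line = "15555 08/27 150 421";

# unpack returns every field, in template order ...
my @all = unpack("A5 x A5 x A3 x A3", $line);

# ... and a list slice picks the ones you want (indices start at 0).
my ($card, $lat) = (unpack("A5 x A5 x A3 x A3", $line))[0, 2];
print "$card $lat\n";    # 15555 150
```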

        Use regexes instead. For example, your title line can be parsed by:

        ($CardNo, $MMDD, $YY, $TotalNo, $StormNo, $Name, $XING, $SSS) =
            m{^(\d+)\s+
              (\d\d/\d\d)/(\d+)\s+   # the date
              M=(\d+)\s+             # the M= value, 20 in the sample
              (\d+)\s+               # storm number
              SNBR=\s*\d+\s+         # matched but not captured
              (.+?)\s*               # arbitrary name or NOT NAMED
              XING=(\d+)\s+
              SSS=(\d+)\s*$
            }x;
        (I assumed here that any arbitrary string could appear in place of NOT NAMED. SNBR is matched but not captured, because the desired output does not use it.)
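As a rough check, a pattern of this shape can be run against the sample title line and the captures turned into the requested header. The field meanings used here (the M= value multiplied by 4, the storm number zero-padded to two digits) are guesses read off the desired output, not taken from the format's documentation, and the variable names are only illustrative:

```perl
use strict;
use warnings;

# Title line copied from the sample data in the question.
my $title = "15550 08/27/1900 M=20 1 SNBR= 371 NOT NAMED XING=1 SSS=4";

my ($card, $mmdd, $year, $days, $storm, $name, $xing, $sss) = $title =~ m{
    ^(\d+)\s+ (\d\d/\d\d)/(\d+)\s+
    M=(\d+)\s+ (\d+)\s+
    SNBR=\s*\d+\s+
    (.+?)\s*
    XING=(\d+)\s+ SSS=(\d+)\s*$
}x or die "title line did not match";

# Assumptions from the desired output: NOT NAMED becomes NONAME,
# the M= value is multiplied by 4, the storm number is printed as two digits.
$name = "NONAME" if $name eq "NOT NAMED";
my $header = sprintf("%s %d %s%02d-%s %s %s %s",
    $mmdd, $days * 4, $name, $storm, $year, $xing, $sss, $card);
print "$header\n";    # 08/27 80 NONAME01-1900 1 4 15550
```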

        Note that I use \s+ to match a space. That makes the parsing more robust, because it doesn't matter whether there is a tab character instead of a space, or more than one space. I also use \s* in places where spaces are optional.
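To illustrate the difference, the following sketch runs a literal-space pattern and a \s+ pattern over the same two fields separated in three different ways. The test strings are invented for the example:

```perl
use strict;
use warnings;

# The same two fields separated by one space, several spaces, and a tab.
my @lines = ("15550 08/27", "15550    08/27", "15550\t08/27");

my ($literal, $flexible) = (0, 0);
for my $line (@lines) {
    $literal++  if $line =~ m{^\d+ \d\d/\d\d$};      # exactly one space
    $flexible++ if $line =~ m{^\d+\s+\d\d/\d\d$};    # any run of whitespace
}
print "literal space matched $literal of 3 lines\n";   # 1
print "\\s+ matched $flexible of 3 lines\n";           # 3
```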

        m{} is the same as //; it is just nicer when you have a '/' to match, because you don't need to escape it. The x modifier lets me use spaces and comments in the regex.
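A short side-by-side comparison of the two delimiter styles, matching a date that contains slashes (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $date = "08/27/1900";

# With // as delimiters every literal slash needs a backslash.
my ($m1, $d1, $y1) = $date =~ /^(\d\d)\/(\d\d)\/(\d{4})$/;

# With m{} the slashes can stay as they are, and /x allows layout and comments.
my ($m2, $d2, $y2) = $date =~ m{
    ^ (\d\d) / (\d\d) / (\d{4}) $   # month / day / year
}x;

print "$m1-$d1-$y1 $m2-$d2-$y2\n";    # 08-27-1900 08-27-1900
```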

        I kept the regex simple; you should be able to construct the regexes for the other lines from this. Just read some more about regexes in a good Perl book or the online documentation.
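One possible way to handle the data lines in the same spirit is sketched below. It assumes the fields are separated by whitespace as in the sample, that the stage character comes before each group of four numbers, and that '*' should be printed as T (inferred from the desired output). None of this is confirmed by the format's documentation.

```perl
use strict;
use warnings;

# One data line, copied from the sample in the question.
my $line = "15555 08/27*150 421 35 0*152 434 35 0*153 447 35 0*154 456 35 0*";

# Card number and date at the start of the line.
my ($card, $mmdd) = $line =~ m{^(\d+)\s+(\d\d/\d\d)}
    or die "not a data line: $line";

# Each observation: stage character, lat, long, max wind, min pressure.
my @obs = $line =~ m{([*A-Z])\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)}g;

my @hours = qw(0000 0600 1200 1800);
my @rows;
while (my ($stage, $lat, $lon, $wind, $pres) = splice(@obs, 0, 5)) {
    $stage = "T" if $stage eq "*";    # assumption taken from the desired output
    push @rows, join(" ", $mmdd, shift(@hours), $lat, $lon, $wind, $pres, $stage);
}
print "$_\n" for @rows;
```

With /g in list context the match returns all captures from every repetition as one flat list, which is why splice takes them off five at a time.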