MKevin has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Regex and keep getting this error... First the code:
#!/usr/bin/perl -w

my $pkgdoc = <<'EOD';
#/**-------------------------------------------------------------------
# @ file TC-parcer.pl
# This script parses the fetched TC data from Dr. landsea's TC
# reclassification project.
#
# @since 03/18/2008
# @usage TC-parcer.pl txtfile.txt
# date: June 25, 2008
#--------------------------------------------------------------------*/
EOD

#
## Pull arguments from the command line
#
use strict;
use warnings;
use Getopt::Long;

if (@ARGV <1) {                        # if arg < 1 then print the pckdoc
    print $pkgdoc;
    exit -1;
}
my $txtfile = shift;                   # pulling arguments

#
## Open text file with Dr. Landsea's data if not stop the program
#
open (DATA, $txtfile)||die "cannot open $txtfile for reading";

#
## Gather data from the title line which looks like:
#
### 00065 08/16/1851 M=12 4 SNBR= 4 NOT NAMED XING=1 SSS=3
#
while (<DATA>) {
    my ($CardNo, $MMDD, $YY, $TotalNo, $SNRB, $StormNo, $Name, $XING, $SSS)=
        m{^(\d+)\s+ (\d\d/\d\d) /(\d+)\s+ M=(\d+)\s+ (\d+)\s+ SNBR=\s*(\d+)\s+ (.+) \s* XING=(\d+)\s+ SSS=(d+) \s* $}x;
    $Name =~ s/NOT NAMED/NONAME/;      # replacing NOT NAMED with NONAME
    $TotalNo = $TotalNo * 4;
    print "$MMDD $TotalNo $Name$StormNo-$YY $XING $SSS $CardNo\n";
    last if $SSS !~ /[0-5]/;           # skip to data lines
}

#
## Gather important data of the center of each storm
#
### 00070 08/16*134 480 40 0*137 495 40 0*140 510 50 0*144 528 50 0*
#
while (<DATA>) {
    my ($CardNo, $MMDD, $Type,
        $LAT1, $LON1, $Vmax1, $Pmin1, $TET1,
        $LAT2, $LON2, $Vmax2, $Pmin2, $TET2,
        $LAT3, $LON3, $Vmax3, $Pmin3, $TET3,
        $LAT4, $LON4, $Vmax4, $Pmin4, $TET4) =
        m{^(\d+)\s+ (\d\d/\d\d)\s+ (.+)
          (\d\d\d)\s+ (\d\d\d)\s+ (\d+)\s+ (\d+)\s+ (.+)
          (\d\d\d)\s+ (\d\d\d)\s+ (\d+)\s+ (\d+)\s+ (.+)
          (\d\d\d)\s+ (\d\d\d)\s+ (\d+)\s+ (\d+)\s+ (.+)
          (\d\d\d)\s+ (\d\d\d)\s+ (\d+)\s+ (\d+)\s+ (.+) \s* $}x;
    if ($TET1 = '*') {$TET1 = 'T';}
    if ($TET2 = '*') { $TET2 = 'T';}
    if ($TET3 = '*') { $TET3 = 'T';}
    if ($TET4 = '*') { $TET4 = 'T';}
    print "$MMDD 0000 $LAT1 $LON1 $Vmax1 $Pmin1 $TET1\n $MMDD 0600 $LAT2 $LON2 $Vmax2 $Pmin2 $TET2\n $MMDD 1200 $LAT3 $LON3 $Vmax3 $Pmin3 $TET3\n $MMDD 1800 $LAT4 $LON4 $Vmax4 $Pmin4 $TET4\n";
    last if $TET4 !~ m/T|E$/;          # end this storm's track
}
close (DATA);
My error message shows:
Use of uninitialized value in concatenation (.) or string at TCparcer.pl line 78, <DATA> line 14.
and so on ... The data in question to parse is:
00065 08/16/1851 M=12 4 SNBR= 4 NOT NAMED XING=1 SSS=3
00070 08/16*134 480 40 0*137 495 40 0*140 510 50 0*144 528 50 0*
00075 08/17*149 546 60 0*154 565 60 0*159 585 70 0*161 604 70 0*
00080 08/18*166 625 80 0*169 641 80 0*172 660 90 0*176 676 90 0*
00085 08/19*180 693 90 0*184 711 70 0*189 726 60 0*194 743 60 0*
00090 08/20*199 759 70 0*205 776 70 0*212 790 70 0*219 804 70 0*
00095 08/21*226 814 60 0*232 825 60 0*239 836 70 0*244 843 70 0*
00100 08/22*250 849 80 0*256 855 80 0*262 860 90 0*268 863 90 0*
00105 08/23*274 865 100 0*280 866 100 0*285 866 100 0*296 861 100 0*
00110 08/24*307 851 90 0*316 841 70 0*325 830 60 0*334 814 50 0*
00115 08/25*340 800 40 0*348 786 40 0*358 770 40 0*368 751 40 0*
00120 08/26*378 736 40 0*389 718 40 0*400 700 40 0*413 668 40 0*
00125 08/27*428 633 40 0*445 602 40 0*464 572 40 0*485 542 40 0*
00130 HRAFL3IGA1
I would also like to run this program on a text file that contains more than one storm, in the same format as above (just more of them, one after another in a single file). I was thinking of using a while loop, but what condition should it test so that it goes back and repeats the title parsing and the data parsing? I also need to skip the last line of each storm, which can start with any 5 digits followed by any letters (last if?).

Replies are listed 'Best First'.
Re: New to Regex and need help
by pc88mxer (Vicar) on Jun 27, 2008 at 14:29 UTC
    You should always test to see if your match operations succeed, i.e.:
    if ( ($CardNo, $MMDD, $Type, ... ) = m{^...} ) {
        # do stuff
    } else {
        warn "unable to parse this line: $_";
    }
    In your case, data line 14 (from what I can tell) is the short line at the end of the file. It's probably not matching your regular expression, and so lots of variables (like $TET4) are being set to undef.
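
A minimal sketch of that check, assuming a hypothetical helper named parse_header (not part of the original script) and a shortened header regex that only captures the card number, date and year. It relies on a list-context match returning an empty list when the regex fails, so the assignment itself can serve as the condition:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref with a few
# header fields, or undef when the line does not look like a header line.
sub parse_header {
    my ($line) = @_;
    # In list context a failed match returns an empty list, so the list
    # assignment evaluates to 0 (false) and the if-branch is skipped.
    if ( my ($card_no, $mmdd, $year) = $line =~ m{^(\d+)\s+(\d\d/\d\d)/(\d+)\s} ) {
        return { card => $card_no, mmdd => $mmdd, year => $year };
    }
    return;
}

# The first and last lines of the posted sample data.
for my $line ( '00065 08/16/1851 M=12 4 SNBR= 4 NOT NAMED XING=1 SSS=3',
               '00130 HRAFL3IGA1' ) {
    if ( my $header = parse_header($line) ) {
        print "header: $header->{mmdd}/$header->{year} (card $header->{card})\n";
    }
    else {
        warn "unable to parse this line: $line\n";
    }
}
```

With this shape, the short trailer line produces a warning instead of a run of undefined variables further down.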
Re: New to Regex and need help
by johngg (Canon) on Jun 27, 2008 at 14:39 UTC
    I have had a quick glance at your regex and when looking for $MMDD you do (\d\d/\d\d)\s+ which says you want at least one space after the date. However, all the dates are followed by an asterisk. Bear in mind that asterisks are regex meta-characters so you need to escape them to get literals. There may be other problems with your regex but that was the first show stopper I saw.
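
As an illustration of the escaping point, here is a hedged sketch that matches the asterisk literally with \*. The helper name parse_track_line is hypothetical, and the sketch assumes, based only on the posted sample, that each six-hourly group is an asterisk followed by four whitespace-separated numbers:

```perl
use strict;
use warnings;

# Hypothetical helper: splits one track line into its date and a list of
# [lat, lon, wind, pressure] array refs. Returns an empty list on no match.
sub parse_track_line {
    my ($line) = @_;
    # The date is followed directly by an asterisk, so no \s+ after it.
    my ($mmdd, $rest) = $line =~ m{^\d+\s+(\d\d/\d\d)(\*.*)$}
        or return;
    my @entries;
    # \* is a literal asterisk; an unescaped * would be a quantifier.
    while ( $rest =~ m{\*\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)}g ) {
        push @entries, [ $1, $2, $3, $4 ];
    }
    return ( $mmdd, @entries );
}

# One data line from the posted sample.
my ($mmdd, @entries) = parse_track_line(
    '00070 08/16*134 480 40 0*137 495 40 0*140 510 50 0*144 528 50 0*');

printf "%s %02d00 lat=%s lon=%s wind=%s pmin=%s\n", $mmdd, 6 * $_, @{ $entries[$_] }
    for 0 .. $#entries;
```

Matching the groups in a loop also avoids repeating the same four captures four times in one long pattern.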

    Looking at the data you have posted, its columnar layout means you might be better off considering unpack. Other points:-

    • Avoid using DATA as your handle name as Perl already provides you with a handle of that name for reading data contained in the script after __END__ or __DATA__ tags.

    • It is recommended that you use lexical filehandles and the three argument form of open.

    • It is good that you check for the success of your open but it can be useful to include $! (OS error) in the output, see perlvar.
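
The three points above can be sketched together as follows. The helper name read_lines is hypothetical, and the temporary file exists only so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads every line of a file through a lexical handle.
sub read_lines {
    my ($path) = @_;
    # Lexical handle ($fh) instead of the bareword DATA, three-argument
    # open with an explicit read mode, and $! for the OS error message.
    open my $fh, '<', $path
        or die "cannot open '$path' for reading: $!\n";
    chomp( my @lines = <$fh> );
    close $fh
        or die "cannot close '$path': $!\n";
    return @lines;
}

# Demo: write two of the posted sample lines to a temporary file, then
# read them back.
my ($tmp_fh, $tmp_path) = tempfile( UNLINK => 1 );
print {$tmp_fh} "00065 08/16/1851 M=12 4 SNBR= 4 NOT NAMED XING=1 SSS=3\n",
                "00130 HRAFL3IGA1\n";
close $tmp_fh
    or die "cannot close '$tmp_path': $!\n";

print "$_\n" for read_lines($tmp_path);
```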

    I hope this is useful.

    Cheers,

    JohnGG

Re: New to Regex and need help
by Crackers2 (Parson) on Jun 27, 2008 at 15:16 UTC
    Unrelated to the regexes, this:
    if ($TET1 = '*') {$TET1 = 'T';}
    if ($TET2 = '*') { $TET2 = 'T';}
    if ($TET3 = '*') { $TET3 = 'T';}
    if ($TET4 = '*') { $TET4 = 'T';}
    probably isn't doing what you want it to do. Try
    if ($TET1 eq '*') {$TET1 = 'T';}
    if ($TET2 eq '*') { $TET2 = 'T';}
    if ($TET3 eq '*') { $TET3 = 'T';}
    if ($TET4 eq '*') { $TET4 = 'T';}
    instead. A single = is an assignment, not a comparison. For a numerical comparison you should use ==, and for a string comparison, as above, eq.
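
A small sketch of the string comparison in isolation, using a hypothetical helper named normalise_marker. The marker values 'E' and 'L' in the demo are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical helper: maps the '*' marker to 'T' and leaves anything
# else unchanged.
sub normalise_marker {
    my ($marker) = @_;
    # eq compares strings. Writing ($marker = '*') here would instead
    # assign '*' to $marker, and since '*' is a true value the test
    # would succeed for every input.
    return $marker eq '*' ? 'T' : $marker;
}

my @markers = ( '*', 'E', '*', 'L' );
my @fixed   = map { normalise_marker($_) } @markers;
print "@fixed\n";    # prints "T E T L"
```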
Re: New to Regex and need help
by Ciclamino (Sexton) on Jun 27, 2008 at 15:16 UTC
    I'm not sure the regex that you use to match a data line is exactly right, so that may be the problem. To answer the second part, and probably make the overall program easier to debug, I'd do it as ifs inside one while loop. I'd also break up the regexps into multiple lines. So you'd have something like this:
        if (/^(\d+)\s+                # $1 = CardNo
             (\d\d\/\d\d)\/           # $2 = MMDD
             (\d+)\s+                 # $3 = YY
             M=(\d+)\s+(\d+)\s+       # $4 = TotalNo $5 = StormNo
             SNBR=\s*(\d+)\s+         # $6 = SNBR
             (.+)\s+                  # $7 = Name
             XING=(\d+)\s+            # $8 = XING
             SSS=(\d+)                # $9 = SSS
            /x) {
            # matched the header line
            my ($CardNo, $MMDD, $YY, $TotalNo, $StormNo, $SNBR, $Name, $XING, $SSS)
                = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
            $Name =~ s/NOT NAMED\s+/NONAME/;    # replace and strip trailing spaces
            print "$MMDD $TotalNo $Name$StormNo-$YY $XING $SSS $CardNo\n";
            next;
        }
        if (# regexp for data line.....)
        ........
    }
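
One way the single-loop idea might extend to a file with several storms is sketched below. The helpers classify_line and group_storms are hypothetical, the patterns are deliberately loose, and the demo simply repeats a few lines of the posted sample to stand in for a second storm:

```perl
use strict;
use warnings;

# Hypothetical helper: decides what kind of line this is.
sub classify_line {
    my ($line) = @_;
    return 'header' if $line =~ m{^\d+\s+\d\d/\d\d/\d+\s+M=};   # date includes the year
    return 'track'  if $line =~ m{^\d+\s+\d\d/\d\d\*};          # date followed by '*'
    return 'other';                                             # e.g. the trailer line
}

# Hypothetical helper: groups lines into storms, starting a new storm at
# every header line and skipping lines that are neither header nor track.
sub group_storms {
    my (@lines) = @_;
    my @storms;
    for my $line (@lines) {
        my $kind = classify_line($line);
        if ( $kind eq 'header' ) {
            push @storms, { header => $line, tracks => [] };
            next;
        }
        if ( $kind eq 'track' && @storms ) {
            push @{ $storms[-1]{tracks} }, $line;
            next;
        }
        # anything else (such as '00130 HRAFL3IGA1') is ignored
    }
    return @storms;
}

# A few lines of the posted sample, repeated to simulate two storms.
my @sample = (
    '00065 08/16/1851 M=12 4 SNBR= 4 NOT NAMED XING=1 SSS=3',
    '00070 08/16*134 480 40 0*137 495 40 0*140 510 50 0*144 528 50 0*',
    '00075 08/17*149 546 60 0*154 565 60 0*159 585 70 0*161 604 70 0*',
    '00130 HRAFL3IGA1',
);
my @storms = group_storms( @sample, @sample );

printf "storm %d: %d track lines\n", $_ + 1, scalar @{ $storms[$_]{tracks} }
    for 0 .. $#storms;
```

Because each header line starts a new storm, no separate outer loop or "last if" is needed to move from one storm to the next.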
Re: New to Regex and need help
by Grey Fox (Chaplain) on Jun 27, 2008 at 14:55 UTC
    I have found http://www.regular-expressions.info/quickstart.html very helpful when working with regexes. Lots of good examples.
    Also, what's happening when you get an uninitialized value error is that your regular expression is not matching, so the captures within the () come back undefined. I have also found tools like http://www.txt2re.com/ or Eclipse's regex tool helpful for debugging my regexes.
    -- Grey Fox
    "We are grey. We stand between the darkness and the light" B5
Re: New to Regex and need help
by Anonymous Monk on Jun 27, 2008 at 15:16 UTC

      No need to pay for something like that when you can get it for free right here at the Monastery. (Yep, I'm pimpin' my own warez. :)

      
      -- 
      Human history becomes more and more a race between education and catastrophe. -- HG Wells