in reply to Parsing COD text help

To me, nesting a while (<FH>) loop inside another one never seems like the right approach. The condition that should cause you to exit the inner loop happens to be one that is supposed to be picked up in the outer loop, so that you can get into the inner loop again to handle the next record. That makes things too complicated.

I'll suggest an alternative, but first I'd like to point out that the input data appears to consist of fixed-length records. Having looked at the cited page, there seem to be three basic types of data lines -- one has digits in columns 1-5, the other two don't; among the latter, there are a few that are "category" headings (e.g. "SYSTEMS ENGINEERING", "COMPUTER SCIENCE", etc), and the rest are "detail" records about a given course/section. (Actually the latter type probably breaks down into two or three sub-types, presenting different sorts of information.)

Fixed-width data can be handled either with regex matching (using things like / (.{5}) (.{4}) (.{4})/), or with unpack. The latter is really simpler (even though it seems more complicated when you look it up in the "perlfunc" man page). It would go something like this, in your case:

    my %courses;
    my $mnemonic;    # this is the correct spelling :)
    while (<COD>) {    # let's use $_, shall we?
        next unless /\S/;    # skip blank lines
        my @fields;
        my ($id, $rest) = unpack("xA5xA*", $_);    # break line into 2 pieces
        if ( $id =~ /^\d{5}$/ ) {    # it's the start of a record
            ($mnemonic, @fields) = unpack("A4xA4xA4xA2xA2xA28A*", $rest);
            # work out what to do with @fields; $mnemonic will retain
            # its current value till the next one is encountered,
            # so sub-records after this one will be added to the
            # correct hash element.
        }
        elsif ( $rest =~ /^\d+-\d+/ ) {    # it's a sub-record
            my ($time, $days, $bldg, $room, $end) = unpack("A9xA6xA4xA4xA*", $rest);
            # you need to work out what to do with $end,
            # and push stuff into the current $courses{$mnemonic} structure
        }
        else {
            # do something else with (or ignore) other stuff
        }
    }

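I hope that will get you started. Note that by using "unpack", the "DAYS" portion of the sub-records is always taken from the same six columns, so leading and embedded spaces keep their positions ("M W F " vs. " T R " etc). One caveat: the "A" format strips trailing spaces when unpacking, so use "a6" in place of "A6" if you need all six characters intact. You could get the same result with a suitable regex instead of unpack, but plain-old split will do it wrong, because splitting on whitespace throws away the column positions and breaks the days into separate fields. Personally, I think this is one situation where unpack is relatively easier to do than a regex; it's just a natural for fixed-length ASCII records.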
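To make the unpack / regex / split comparison concrete, here is a small sketch run against a made-up sub-record. The sample values ("OLS", "120", "Staff") and the exact column layout are my own assumptions for illustration, not something taken from the real COD page, so adjust the widths to whatever the actual data uses. It uses "a6" for the days field so that all six columns survive.

```perl
use strict;
use warnings;

# Hypothetical sub-record (assumed layout, invented values):
# time = 9 cols, days = 6 cols, bldg = 4 cols, room = 4 cols, then the rest,
# each separated by a single space.
my $line = join ' ', '1100-1215', ' T R  ', 'OLS ', '120 ', 'Staff';

# unpack: "A" strips trailing spaces, "a" keeps the field exactly as-is.
my ($time, $days, $bldg, $room, $end) = unpack("A9xa6xA4xA4xA*", $line);

# The equivalent regex: same columns, but nothing gets trimmed for you.
my @re = $line =~ /^(.{9}) (.{6}) (.{4}) (.{4}) (.*)$/;

# split on whitespace: the column positions are lost and the days
# come apart into separate fields.
my @sp = split ' ', $line;

print "unpack days: [$days]\n";
print "regex  days: [$re[1]]\n";
print "split gives ", scalar(@sp), " fields: @sp\n";
```

With unpack and the regex, $days is the same six-character string, so you can still tell which column each day letter sits in. With split you get "T" and "R" as two separate fields, and the number of fields changes from one line to the next depending on how many days are listed.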