
To me, nesting a while (<FH>) loop inside another one never seems like the right approach. The condition that should cause you to exit the inner loop happens to be one that is supposed to be picked up in the outer loop, so that you can get into the inner loop again to handle the next record. That makes things too complicated.

I'll suggest an alternative, but first I'd like to point out that the input data appears to consist of fixed-length records. Having looked at the cited page, there seem to be three basic types of data lines -- one has digits in columns 1-5, the other two don't; among the latter, there are a few that are "category" headings (e.g. "SYSTEMS ENGINEERING", "COMPUTER SCIENCE", etc), and the rest are "detail" records about a given course/section. (Actually the latter type probably breaks down into two or three sub-types, presenting different sorts of information.)

Fixed-width data can be handled either with regex matching (using things like / (.{5}) (.{4}) (.{4})/), or with unpack. The latter is really simpler (even though it seems more complicated when you look it up in the "perlfunc" man page). It would go something like this, in your case:

my %courses;
my $mnemonic;  # this is the correct spelling :)

while (<COD>) {  # let's use $_, shall we?
   next unless ( /\S/ );  # skip blank lines;
   my @fields;

   my ($id, $rest) = unpack("xA5xA*", $_);  # break line into 2 pieces

   if ( $id =~ /^\d{5}$/ ) { # it's the start of a record
      ($mnemonic, @fields) = unpack("A4xA4xA4xA2xA2xA28A*", $rest);
      # work out what to do with @fields; $mnemonic will retain
      # its current value till the next one is encountered,
      # so sub-records after this one will be added to the
      # correct hash element.
   }
   elsif ( $rest =~ /^\d+-\d+/ ) { # it's a sub-record
      my ($time,$days,$bldg,$room,$end) = unpack("A9xA6xA4xA4xA*", $rest);
      # you need to work out what to do with $end,
      # and push stuff into the current $courses{$mnemonic} structure
   }
   else {
      # do something else with (or ignore) other stuff
   }
}

I hope that will get you started. Note that by using "unpack", the "DAYS" portion of the sub-records will always be taken from the same six columns as a single field, embedded spaces included ("M W F " vs. " T R " etc). (The "A" format strips trailing spaces from each field; use "a6" if you need all six characters.) You could get the same result with a suitable regex instead of unpack, but a plain-old split on whitespace will do it wrong, because it breaks the day letters into separate fields. Personally, I think this is one situation where unpack is easier to use than a regex; it's just a natural for fixed-length ASCII records.
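
To make the "DAYS" point concrete, here is a small sketch that pulls the days field out of one sub-record three ways. The sample line is made up for illustration (it is not taken from the real COD data), and the column widths are simply the ones assumed in the unpack template above; adjust both to match the actual file.

```perl
use strict;
use warnings;

# Hypothetical sub-record, built field by field so the widths are exact:
# time (9 cols), days (6 cols), building (4 cols), room (4 cols), free text.
my $rest = join ' ', '1100-1215', ' T R  ', 'OLS ', '0120', 'Staff';

# unpack: "x" skips one column; "A" strips trailing spaces, "a" keeps them.
my ($time, $days) = unpack 'A9 x A6', $rest;
my ($raw_days)    = unpack 'x10 a6',  $rest;

# A regex anchored to the same columns gives the untrimmed field too.
my ($re_time, $re_days) = $rest =~ /^(.{9}) (.{6})/;

# Splitting on whitespace loses the column positions:
# each day letter ends up as a separate field.
my @split = split ' ', $rest;

printf "unpack A6: [%s]\n", $days;
printf "unpack a6: [%s]\n", $raw_days;
printf "regex:     [%s]\n", $re_days;
printf "split:     [%s]\n", join '|', @split;
```

With unpack or the regex, the position of each letter inside the field still tells you which day it is; after split, "T" and "R" are just two more fields, and the building and room shift to different positions depending on how many days a section meets.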

In reply to Re: Parsing COD text help by graff
in thread Parsing COD text help by dimmesdale
