You could use unpack for this, but I'm always leery of it,
since one badly formatted or partial record will abort your
script.
Using a regex, this is pretty straightforward:
# you can name your arrays a,b,c,d,e if you want to,
# but I won't
my (@jobNumbers, @customerNames, @telephoneNumbers,
    @agencyReferences, @statusDescriptions);
# assuming you're reading from STDIN, or
# files named on the command line...
while(<>) {
    if(!/
        ^        # fields start at the beginning of the record
        (.{6})   # job number, $1
        (.{10})  # customer name, $2
        (.{10})  # telephone number, $3
        (.{10})  # agency reference, $4
        (.{64})  # status description, $5, adjust to skip spaces
    /x)          # allow comments and whitespace in regex
    {
        warn "Skipping badly formatted record '$_'";
        next;
    }
    push @jobNumbers, $1;
    push @customerNames, $2;
    push @telephoneNumbers, $3;
    push @agencyReferences, $4;
    push @statusDescriptions, $5;
}
This way, you can change each "subsection" of the regex to do more validation on your input.
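For instance, here's a minimal sketch of a stricter version. It assumes (my guess, not something from your spec) that job numbers are exactly six digits and that the telephone field holds only digits, spaces, and dashes. parse_record is just a name I made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the five fields as a list, or an
# empty list if the record doesn't fit the assumed layout.
sub parse_record {
    my ($line) = @_;
    return $line =~ /
        ^
        (\d{6})        # job number: assumed to be exactly 6 digits
        (.{10})        # customer name
        ([\d\s-]{10})  # telephone number: assumed digits, spaces, dashes
        (.{10})        # agency reference
        (.{64})        # status description
    /x;
}
```

Call it in list context inside the loop; an empty list back means the record failed validation and can be skipped with a warning, same as before.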
Another alternative, which would (probably) be much faster, would be to use substr.
# same arrays as above
while(<>) {
    chomp;
    # need parens here: a bare "length<100" gets parsed as the start of <...>
    if(length($_) < 100) { # 6+10+10+10+64
        warn "Record '$_' too short, skipping";
        next;
    }
    push @jobNumbers, substr $_,0,6;
    push @customerNames, substr $_,6,10;
    push @telephoneNumbers, substr $_,16,10;
    push @agencyReferences, substr $_,26,10;
    # adjust either start or end if you need to skip spaces
    push @statusDescriptions, substr $_,36,64;
}
Hmmm, wait a sec, this feels like homework... Oh, the heck with it, I've already done all this typing :)
--
Mike
Edit: Hmmm, why'd I think unpack will crash on bad input? The docs don't support that theory... Maybe that was old old old behaviour?
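So if you do want to try unpack, a rough sketch might look like this. It checks the record length itself rather than counting on unpack to complain. unpack_record is a made-up name, and the template assumes the same 6/10/10/10/64 layout as the code above. Note that the "A" format strips trailing spaces from each field, which may or may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width record with unpack.
# Returns the five fields, or an empty list if the record is too short.
sub unpack_record {
    my ($line) = @_;
    chomp $line;
    return if length($line) < 100;    # 6 + 10 + 10 + 10 + 64
    # "A" = space-padded text; trailing spaces are stripped on unpack
    return unpack 'A6 A10 A10 A10 A64', $line;
}
```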