jdawes has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that has a number of records,
each with 5 fields of a fixed length.
How would I read each of those fields into a separate array?
That is, how do you tell the Perl script to read
just a set number of characters, and then to skip
that number of chars for the next array?

So arrays a, b, c, d, e, one for each field.

Each record is exactly 100 bytes long,
makes it easier to calculate.

The fields are as follows:
01-06 JOB NUMBER
07-16 CUSTOMER NAME
17-26 TELEPHONE NUMBER
27-36 AGENCY REFERENCE
37-100 STATUS DESCRIPTION (60 BYTES + 4 BYTES SPACES)

quite happy to have any ideas on this emailed to me at

Jeremy

Re: extract part of line
by Corion (Patriarch) on Jul 14, 2002 at 11:59 UTC

    It's not customary here to email solutions out, as then you'd be the only one benefiting from the solution and there would be no possibility of my peers correcting my solution. So I hope you come back and see the answer here.

    Whenever you have data in a fixed format, pack and unpack are the two functions you should be thinking about. After a short study of these manpages, you should be able to extract the data of your set. To split your data file up into the several sets, I will use a regular expression that matches 100 characters apiece. I could also unpack 100 bytes and then cut them out via substr.

    So here's my (tested) attempt :

    use strict;

    # Slurp the whole data file in memory
    binmode DATA;
    my $data = do { local $/; <DATA> };

    # And now split it into records :
    my @rows;
    while ($data =~ /\G(.{100})/gs) {
      push @rows, $1;
    }

    foreach my $row (@rows) {
      # 6 + 10 + 10 + 10 + 64 = 100 bytes, matching the field layout
      my @contents = unpack( 'a6a10a10a10a64', $row );
      print join( ":", @contents ), "\n";
    }
    __DATA__
    1JJJJJNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS2JJJJJNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS3JJJJJNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: extract part of line
by bjelli (Pilgrim) on Jul 14, 2002 at 11:58 UTC
    Use substr:
    substr - get or alter a portion of a string 
    
    SYNOPSIS:
    =========
    substr EXPR,OFFSET,LEN,REPLACEMENT 
    substr EXPR,OFFSET,LEN 
    substr EXPR,OFFSET 
    
    DESCRIPTION:
    ============
    Extracts a substring out of EXPR and returns it. 

    There's a nice article about it at developer daily. If you want to learn more about handling fixed width data I recommend David Cross's book Data Munging with Perl.
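For the record layout in the question, the substr calls could look roughly like the sketch below. The helper name `split_record` and the sample values are made up for illustration; the offsets are just the column numbers from the question shifted to start at zero.

```perl
use strict;
use warnings;

# Hypothetical helper: split one 100-byte record into its five fields
# using substr with zero-based offsets (column 01 is offset 0).
sub split_record {
    my ($rec) = @_;
    return (
        substr($rec,  0,  6),   # 01-06  job number
        substr($rec,  6, 10),   # 07-16  customer name
        substr($rec, 16, 10),   # 17-26  telephone number
        substr($rec, 26, 10),   # 27-36  agency reference
        substr($rec, 36, 64),   # 37-100 status description
    );
}

# Made-up sample record, space-padded to exactly 100 bytes.
my $sample = sprintf '%-6s%-10s%-10s%-10s%-64s',
    '000042', 'J Smith', '5550100', 'REF-1', 'Awaiting parts';

my @fields = split_record($sample);
print join('|', @fields), "\n";
```

Each field keeps its padding spaces; strip them afterwards if you don't want them.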

    --
    Brigitte    'I never met a chocolate I didnt like'    Jellinek
    http://www.horus.com/~bjelli/         http://perlwelt.horus.at
Re: extract part of line
by broquaint (Abbot) on Jul 14, 2002 at 12:01 UTC
    Something like[1] this should work for the file format that you've described:

    my @info;
    {
      open(my $fh, '<', "somfile.xyz") or die("ack - $!");
      local $/ = \100;   # read 100 bytes per <$fh>
      # 'a' keeps each field as a string ('C' would return byte values)
      push @info, [ unpack('a6a10a10a10a64', $_) ] while <$fh>;
    }
    This reads 100 bytes of the file at a time, then splits each chunk on the described byte boundaries. See the unpack, pack and perlvar manpages for more details.
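To try the record-at-a-time read without a real file, here is a small sketch that reads from an in-memory filehandle instead. The helper `read_records` and the sample data are invented for the demo; the only thing taken from perlvar is that setting `$/` to a reference to an integer makes each read return that many bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: read fixed-size records from an open filehandle
# and return a list of array refs, one per complete record.
sub read_records {
    my ($fh, $reclen, $template) = @_;
    local $/ = \$reclen;    # reference to an integer: fixed-size reads
    my @records;
    while (my $chunk = <$fh>) {
        next if length($chunk) < $reclen;   # ignore a short trailing chunk
        push @records, [ unpack $template, $chunk ];
    }
    return @records;
}

# Two made-up records with no separator between them, held in a string.
my $raw = join '', map {
    sprintf '%-6s%-10s%-10s%-10s%-64s',
        "JOB$_", "Name$_", "Tel$_", "Ref$_", "Status $_"
} 1 .. 2;

open my $fh, '<', \$raw or die "cannot open in-memory handle: $!";
my @records = read_records($fh, 100, 'a6 a10 a10 a10 a64');
close $fh;

print scalar(@records), " records; first job number is '$records[0][0]'\n";
```

Keeping one array ref per record means the five fields of a record stay together, rather than all fields ending up in one flat list.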
    HTH

    _________
    broquaint

    [1] this code is untested but the theory behind it should follow

Re: extract part of line
by RMGir (Prior) on Jul 14, 2002 at 12:03 UTC
    You could use unpack for this, but I'm always leery of it, since one badly formatted or partial record will abort your script.

    Using a regex, this is pretty straightforward:

    # you can name your arrays a,b,c,d,e if you want to,
    # but I won't
    my (@jobNumbers, @customerNames, @telephoneNumbers,
        @agencyReferences, @statusDescriptions);

    # assuming you're reading from STDIN, or
    # files named on the command line...
    while(<>) {
      if(!/
           (.{6})    # job number, $1
           (.{10})   # customer name, $2
           (.{10})   # telephone number, $3
           (.{10})   # agency reference, $4
           (.{64})   # status description, $5, adjust to skip spaces
          /x)        # allow comments and whitespace in regex
      {
        warn "Skipping badly formatted record '$_'";
        next;
      }
      push @jobNumbers,         $1;
      push @customerNames,      $2;
      push @telephoneNumbers,   $3;
      push @agencyReferences,   $4;
      push @statusDescriptions, $5;
    }
    This way, you can change each "subsection" of the regex to do more validation on your input.
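As a rough illustration of that kind of tightening, the sketch below swaps some of the "any character" groups for stricter ones. The character classes are guesses about what the data looks like (six-digit job numbers, phone numbers made of digits, spaces or dashes), not something the record layout promises, and `parse_record` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical stricter pattern; the character classes are assumptions
# about the data, so loosen them to match the real file.
my $record_re = qr/
    ^
    (\d{6})          # job number: exactly six digits
    (.{10})          # customer name: any ten characters
    ([\d\s-]{10})    # telephone number: digits, spaces or dashes
    (.{10})          # agency reference
    (.{60})          # status description
    [ ]{4}           # four trailing pad spaces, not captured
/xs;

# Returns the five captured fields, or an empty list on a mismatch.
sub parse_record {
    my ($line) = @_;
    my @fields = $line =~ $record_re or return;
    return @fields;
}

# Made-up records: one that fits the assumptions, one that does not.
my $good = sprintf '%06d%-10s%-10s%-10s%-60s%4s',
    42, 'J Smith', '555-0100', 'REF-1', 'Awaiting parts', '';
my $bad = 'X' x 100;

for my $rec ($good, $bad) {
    my @f = parse_record($rec);
    print @f ? "ok: job $f[0]\n" : "rejected\n";
}
```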

    Another alternative, which would (probably) be much faster, would be to use substr.

    # same arrays as above
    while(<>) {
      if(length($_) < 101) { # 1 more for newline
        chomp;
        warn "Record '$_' too short, skipping";
        next;
      }
      push @jobNumbers,         substr $_,0,6;
      push @customerNames,      substr $_,6,10;
      push @telephoneNumbers,   substr $_,16,10;
      push @agencyReferences,   substr $_,26,10;
      # adjust either start or end if you need to skip spaces
      push @statusDescriptions, substr $_,36,64;
    }
    Hmmm, wait a sec, this feels like homework... Oh, the heck with it, I've already done all this typing :)
    --
    Mike

    Edit: Hmmm, why'd I think unpack will crash on bad input? The docs don't support that theory... Maybe that was old old old behaviour?
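A quick way to see what unpack does with a truncated record is a sketch like the one below. With the 'a' format it hands back whatever bytes are there rather than dying, so a length check up front is still worth having. The helper `unpack_checked` is made up for the example.

```perl
use strict;
use warnings;

my $template = 'a6 a10 a10 a10 a64';

# Hypothetical helper: unpack a record only if it has the expected length.
sub unpack_checked {
    my ($rec) = @_;
    return unless length($rec) == 100;
    return unpack $template, $rec;
}

# A truncated record: unpack does not die here, it just returns
# whatever bytes are available for the leading fields.
my @partial = unpack $template, 'ABCDEFGHIJ';
print "partial: first field is '$partial[0]'\n";

# The checked version refuses the short record instead.
my @checked = unpack_checked('ABCDEFGHIJ');
print "checked: ", scalar(@checked), " fields\n";
```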

      I don't really know if anyone will
      have time to help out with this; sorry if I'm asking a bit much.
      I'm really new to all this, i.e. a struggler/newbie.

      The file I'm looking up is SERVER.RND;
      it has the structure I mentioned before.
      What I have done so far is pretty much a combination of a few different scripts, and I just have no idea..
      If someone has the time and would like to help me with
      this, email me and I can send you a copy of what I've done so far.
      Or it's in a zip file:
      progress files
Re: extract part of line
by valdez (Monsignor) on Jul 14, 2002 at 13:19 UTC

    CPAN has many solutions for your problem, search for fixed

    Ciao, Valerio

Re: extract part of line
by particle (Vicar) on Jul 14, 2002 at 14:49 UTC
    i suggest you use a hash of records instead of multiple arrays. my solution is below. you'll have to modify it to use a file instead of the DATA handle.

    #!/usr/bin/perl
    require 5.006;
    use strict;
    use warnings;

    my %dataset;
    my @fields = (
      'Customer Name',
      'Telephone Number',
      'Agency Reference',
      'Status Description',
    );

    {
      local $/ = \100;
      while( <DATA> ) {
        my @record;
        push @record, unpack('a6a10a10a10a64', $_);
        @{ $dataset{$record[0]} }{ @fields } = @record[1..$#record];
      }
    }

    use Data::Dumper;
    print Dumper \%dataset;

    # print just the Telephone Numbers, sorted by record (job number)
    print $dataset{$_}{'Telephone Number'},$/ for sort keys %dataset;
    __DATA__
    1JJJJJ1NNNNNNNNN1TTTTTTTTT1RRRRRRRRR1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS2JJJJJ2NNNNNNNNN2TTTTTTTTT2RRRRRRRRR2SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS3JJJJJ3NNNNNNNNN3TTTTTTTTT3RRRRRRRRR3SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    which produces:

    > perl t-fixedwidth.pl
    $VAR1 = {
      '1JJJJJ' => {
        'Status Description' => '1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
        'Agency Reference' => '1RRRRRRRRR',
        'Customer Name' => '1NNNNNNNNN',
        'Telephone Number' => '1TTTTTTTTT'
      },
      '3JJJJJ' => {
        'Status Description' => '3SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
        'Agency Reference' => '3RRRRRRRRR',
        'Customer Name' => '3NNNNNNNNN',
        'Telephone Number' => '3TTTTTTTTT'
      },
      '2JJJJJ' => {
        'Status Description' => '2SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
        'Agency Reference' => '2RRRRRRRRR',
        'Customer Name' => '2NNNNNNNNN',
        'Telephone Number' => '2TTTTTTTTT'
      }
    };
    1TTTTTTTTT
    2TTTTTTTTT
    3TTTTTTTTT
    >

    ~Particle *accelerates*

      Thank you all,
      I really appreciate this,
      I'm working on integrating this in the code now
      Perl monks rox.