extract part of line

jdawes has asked for the wisdom of the Perl Monks concerning the following question:

Re: extract part of line by Corion (Patriarch) on Jul 14, 2002 at 11:59 UTC
It's not customary here to email solutions out, as then you'd be the only one benefiting from the solution and there would be no way for my peers to correct my solution. So I hope you come back and see the answer here.

Whenever you have data in a fixed format, pack and unpack are the two functions you should be thinking about. After a short study of these manpages, you should be able to extract the data of your set.

To split your data file up into the individual sets, I will use a regular expression that matches 100 characters at a time. I could also have cut out the 100-byte pieces with `unpack` or with `substr`. So here's my (tested) attempt:

```perl
use strict;
use warnings;

# Slurp the whole data file into memory
binmode DATA;
my $data = do { local $/; <DATA> };

# And now split it into 100-character records:
my @rows;
while ($data =~ /\G(.{100})/gs) {
    push @rows, $1;
}

foreach my $row (@rows) {
    # 6 + 10 + 10 + 10 + 64 = 100 bytes per record
    my @contents = unpack('a6a10a10a10a64', $row);
    print join(":", @contents), "\n";
}

__DATA__
1JJJJJNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS2JJJJJNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS3JJJJJNNNNNNNNNNTTTTTTTTTTRRRRRRRRRRSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
```
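A minimal sketch of the same pack/unpack idea without the regex, assuming the 6/10/10/10/64 layout described in this thread: the `(a100)*` group template cuts the blob into 100-byte records, and `A` fields (rather than `a`) strip the trailing space padding. The sub name `parse_blob` and the demo records are made up for illustration; group templates need Perl 5.8 or later.

```perl
use strict;
use warnings;

# Assumed layout from this thread: 6 + 10 + 10 + 10 + 64 = 100 bytes.
# 'A' strips trailing spaces; use 'a' if the padding must be kept.
my $RECORD_TEMPLATE = 'A6 A10 A10 A10 A64';

# Split a blob of concatenated 100-byte records and unpack each one.
sub parse_blob {
    my ($blob) = @_;
    # '(a100)*' repeats the group while input remains; a short final
    # chunk (e.g. a trailing newline) comes back shorter, so drop it.
    my @chunks = grep { length($_) == 100 } unpack '(a100)*', $blob;
    # one array reference per record
    return map { [ unpack $RECORD_TEMPLATE, $_ ] } @chunks;
}

# Demo with two hypothetical records built in memory
my $demo = join '', map { sprintf '%-6s%-10s%-10s%-10s%-64s', @$_ }
    [ '000001', 'Smith', '5551234', 'AG-1', 'Open' ],
    [ '000002', 'Jones', '5559876', 'AG-2', 'Closed' ];
print join(':', @$_), "\n" for parse_blob($demo);
# prints:
# 000001:Smith:5551234:AG-1:Open
# 000002:Jones:5559876:AG-2:Closed
```

Choosing `A` over `a` is a judgment call: `A` gives clean values for printing or comparing, while `a` keeps the record byte-for-byte.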
Re: extract part of line by bjelli (Pilgrim) on Jul 14, 2002 at 11:58 UTC
Use substr:

```
substr - get or alter a portion of a string

SYNOPSIS:
=========
    substr EXPR,OFFSET,LEN,REPLACEMENT
    substr EXPR,OFFSET,LEN
    substr EXPR,OFFSET

DESCRIPTION:
============
    Extracts a substring out of EXPR and returns it.
```

There's a nice article about it at developer daily. If you want to learn more about handling fixed-width data, I recommend David Cross's book Data Munging with Perl.

-- 
Brigitte 'I never met a chocolate I didnt like' Jellinek
http://www.horus.com/~bjelli/
http://perlwelt.horus.at
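To make the substr approach concrete, here is a small sketch that drives the three-argument form from a table of offsets and uses the four-argument form to overwrite one field in place. The field names, the `@LAYOUT` table and the `set_field` helper are hypothetical; the offsets assume the same 6/10/10/10/64 layout discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical field names; offsets assume the 6/10/10/10/64 layout.
my @LAYOUT = (
    [ job       => 0,  6  ],
    [ customer  => 6,  10 ],
    [ telephone => 16, 10 ],
    [ agency    => 26, 10 ],
    [ status    => 36, 64 ],
);

# Read every field with the 3-argument form: substr EXPR,OFFSET,LEN
# (assumes a full-length record; a short line would warn).
sub fields_by_substr {
    my ($line) = @_;
    my %rec;
    for my $f (@LAYOUT) {
        my ($name, $off, $len) = @$f;
        ($rec{$name} = substr($line, $off, $len)) =~ s/\s+\z//;   # trim padding
    }
    return \%rec;
}

# Overwrite one field with the 4-argument form: substr EXPR,OFFSET,LEN,REPLACEMENT
sub set_field {
    my ($line_ref, $name, $value) = @_;
    my ($field) = grep { $_->[0] eq $name } @LAYOUT
        or die "unknown field '$name'";
    my (undef, $off, $len) = @$field;
    # pad with spaces to $len, then cut to $len, so the record keeps its length
    my $fixed = substr(sprintf('%-*s', $len, $value), 0, $len);
    substr($$line_ref, $off, $len, $fixed);
    return;
}

# Demo on one hypothetical record
my $demo = sprintf '%-6s%-10s%-10s%-10s%-64s', '000001', 'Smith', '5551234', 'AG-1', 'Open';
set_field(\$demo, 'status', 'Closed');
print fields_by_substr($demo)->{status}, "\n";    # prints Closed
```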
Re: extract part of line by broquaint (Abbot) on Jul 14, 2002 at 12:01 UTC
Something like[1] this should work for the file format that you've described:

```perl
{
    open(my $fh, '<', 'somefile.xyz') or die("ack - $!");
    binmode $fh;
    local $/ = \100;    # read 100 bytes at a time
    my @info;
    # 'A' fields return strings with trailing spaces stripped;
    # each record becomes one array reference
    push @info, [ unpack('A6A10A10A10A64', $_) ] while <$fh>;
}
```

This reads 100 bytes of the file at a time, then splits each chunk on the described byte boundaries. See the unpack, pack and perlvar manpages for more details.

HTH

_________
broquaint

[1] This code is untested, but the theory behind it should hold.
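For testing the `$/ = \100` trick without a real file, an in-memory filehandle (`open` on a scalar reference, Perl 5.8+) works just as well. This sketch wraps the loop in a hypothetical `read_records` sub, skips a short trailing chunk such as a final newline, and keeps one array reference per record so the fields of different records don't run together.

```perl
use strict;
use warnings;

# Setting $/ to a reference to an integer makes each readline
# return (up to) that many bytes instead of one line.
sub read_records {
    my ($fh) = @_;
    local $/ = \100;
    my @records;
    while (my $chunk = <$fh>) {
        next unless length($chunk) == 100;    # skip a trailing newline or short tail
        push @records, [ unpack 'A6 A10 A10 A10 A64', $chunk ];
    }
    return @records;
}

# Usage with a real file (hypothetical name):
#   open my $fh, '<:raw', 'somefile.xyz' or die "ack - $!";
#   my @records = read_records($fh);
```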
Re: extract part of line by RMGir (Prior) on Jul 14, 2002 at 12:03 UTC
You could use unpack for this, but I'm always leery of it~~, since one badly formatted or partial record will abort your script~~. Using a regex, this is pretty straightforward:

```perl
# you can name your arrays a,b,c,d,e if you want to,
# but I won't
my (@jobNumbers, @customerNames, @telephoneNumbers,
    @agencyReferences, @statusDescriptions);

# assuming you're reading from STDIN, or
# files named on the command line...
while (<>) {
    if (!/^
          (.{6})    # job number, $1
          (.{10})   # customer name, $2
          (.{10})   # telephone number, $3
          (.{10})   # agency reference, $4
          (.{64})   # status description, $5, adjust to skip spaces
         /x)        # allow comments and whitespace in regex
    {
        warn "Skipping badly formatted record '$_'";
        next;
    }
    push @jobNumbers,         $1;
    push @customerNames,      $2;
    push @telephoneNumbers,   $3;
    push @agencyReferences,   $4;
    push @statusDescriptions, $5;
}
```

This way, you can change each "subsection" of the regex to do more validation on your input.

Another alternative, which would (probably) be much faster, would be to use substr:

```perl
# same arrays as above
while (<>) {
    if (length($_) < 101) {    # 1 more for newline
        chomp;
        warn "Record '$_' too short, skipping";
        next;
    }
    push @jobNumbers,       substr $_, 0, 6;
    push @customerNames,    substr $_, 6, 10;
    push @telephoneNumbers, substr $_, 16, 10;
    push @agencyReferences, substr $_, 26, 10;
    # adjust either start or end if you need to skip spaces
    push @statusDescriptions, substr $_, 36, 64;
}
```

Hmmm, wait a sec, this feels like homework... Oh, the heck with it, I've already done all this typing :)

-- 
Mike

Edit: Hmmm, why'd I think unpack would crash on bad input? The docs don't support that theory... Maybe that was old old old behaviour?
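Here is one way to push the "more validation" idea further, sketched with named captures (Perl 5.10+). The rules are assumptions made up for the example (a six-digit job number and a telephone field of digits and spaces); replace them with whatever the real data guarantees. Anchoring with `\A` and `\z` also rejects records that are too long, not just too short. `parse_line` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical validation rules, only to show how each sub-pattern
# can be tightened. Adjust to what the real data guarantees.
my $RECORD_RE = qr/
    \A
    (?<job>\d{6})            # job number (assumed six digits)
    (?<customer>.{10})       # customer name
    (?<telephone>[\d ]{10})  # telephone number (assumed digits, space-padded)
    (?<agency>.{10})         # agency reference
    (?<status>.{64})         # status description
    \n?\z                    # optional newline, then nothing else
/x;

# Returns a hash reference of trimmed fields, or nothing if the line is invalid.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $RECORD_RE;
    my %rec = %+;                   # copy the named captures
    s/\s+\z// for values %rec;      # trim the space padding in place
    return \%rec;
}
```

In a script this slots into the same loop: `while (my $line = <>) { my $rec = parse_line($line) or do { warn "Skipping '$line'"; next }; ... }`.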
Re: Re: extract part of line by jdawes (Novice) on Jul 15, 2002 at 12:21 UTC
I don't really know if anyone will have time to help out with this; sorry if I'm asking a bit much. I'm really new to all this, i.e. a struggler/newbie. The file I'm looking up is SERVER.RND, and it has the structure I mentioned before. What I have done so far is pretty much a combination of a few different scripts, and I just have no idea. If someone has the time and would like to help me with this, email me and I can send you a copy of what I've done so far. Or it's in a zip file: progress files
Re: extract part of line by valdez (Monsignor) on Jul 14, 2002 at 13:19 UTC
CPAN has many solutions for your problem: search for "fixed".

Ciao, Valerio
Re: extract part of line by particle (Vicar) on Jul 14, 2002 at 14:49 UTC
I suggest you use a hash of records instead of multiple arrays. My solution is below; you'll have to modify it to use a file instead of the DATA handle.

```perl
#!/usr/bin/perl
require 5.006;
use strict;
use warnings;

my %dataset;
my @fields = (
    'Customer Name',
    'Telephone Number',
    'Agency Reference',
    'Status Description',
);

{
    local $/ = \100;    # read fixed-length 100-byte records
    while( <DATA> ) {
        next if length($_) < 100;    # skip a trailing newline or partial record
        my @record;
        push @record, unpack('a6a10a10a10a64', $_);
        # key on the job number; the other fields go into a hash slice
        @{ $dataset{$record[0]} }{ @fields } = @record[1..$#record];
    }
}

use Data::Dumper;
print Dumper \%dataset;

# print just the Telephone Numbers, sorted by record (job number)
print $dataset{$_}{'Telephone Number'},$/ for sort keys %dataset;

__DATA__
1JJJJJ1NNNNNNNNN1TTTTTTTTT1RRRRRRRRR1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS2JJJJJ2NNNNNNNNN2TTTTTTTTT2RRRRRRRRR2SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS3JJJJJ3NNNNNNNNN3TTTTTTTTT3RRRRRRRRR3SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
```

which produces:

```
> perl t-fixedwidth.pl
$VAR1 = {
          '1JJJJJ' => {
                        'Status Description' => '1SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
                        'Agency Reference' => '1RRRRRRRRR',
                        'Customer Name' => '1NNNNNNNNN',
                        'Telephone Number' => '1TTTTTTTTT'
                      },
          '3JJJJJ' => {
                        'Status Description' => '3SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
                        'Agency Reference' => '3RRRRRRRRR',
                        'Customer Name' => '3NNNNNNNNN',
                        'Telephone Number' => '3TTTTTTTTT'
                      },
          '2JJJJJ' => {
                        'Status Description' => '2SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS',
                        'Agency Reference' => '2RRRRRRRRR',
                        'Customer Name' => '2NNNNNNNNN',
                        'Telephone Number' => '2TTTTTTTTT'
                      }
        };
1TTTTTTTTT
2TTTTTTTTT
3TTTTTTTTT
>
```

~Particle accelerates
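If the records will also be written back out, `pack` with the same template is the natural counterpart to `unpack`: `A` fields are space-padded (or truncated) to their width, so every line comes out exactly 100 bytes again. A sketch under the same layout assumption; `index_records` and `to_line` are hypothetical names, and the duplicate-key warning is an extra, since a plain hash silently keeps only the last record for a repeated job number.

```perl
use strict;
use warnings;

my @FIELDS   = ('Customer Name', 'Telephone Number', 'Agency Reference', 'Status Description');
my $TEMPLATE = 'A6 A10 A10 A10 A64';    # assumed 6/10/10/10/64 layout

# Build a hash of records keyed by job number, warning about duplicates
# instead of silently overwriting them.
sub index_records {
    my (@chunks) = @_;
    my %dataset;
    for my $chunk (@chunks) {
        my ($job, @values) = unpack $TEMPLATE, $chunk;
        warn "duplicate job number '$job'\n" if exists $dataset{$job};
        @{ $dataset{$job} }{@FIELDS} = @values;
    }
    return \%dataset;
}

# pack with the same template turns a record back into a 100-byte line:
# 'A' pads short values with spaces and truncates values that are too long.
sub to_line {
    my ($job, $rec) = @_;
    return pack $TEMPLATE, $job, @{$rec}{@FIELDS};
}

# Demo: two hypothetical records, written back out sorted by job number
my $ds = index_records(map { sprintf '%-6s%-10s%-10s%-10s%-64s', @$_ }
    [ '000002', 'Jones', '5559876', 'AG-2', 'Closed' ],
    [ '000001', 'Smith', '5551234', 'AG-1', 'Open' ]);
print to_line($_, $ds->{$_}), "\n" for sort keys %$ds;
```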
Re: Re: extract part of line by jdawes (Novice) on Jul 15, 2002 at 06:46 UTC
Thank you all, I really appreciate this. I'm working on integrating this into the code now. Perl Monks rox.