Capturing columnar data from a text file

rhxk has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I have a text file containing a report with about 50-60 columns separated by spaces, and I want to capture some of the fields without typing out the whole regular expression.

My question is: if I concatenate it, can I still capture the i'th element? I tried it, but $3* and $4* come up with empty strings... Any suggestions?

if ($line =~ /^([A-Z]{3})-([A-Z])!?((\s+)(\S+)){27}/m) {
    $a = $1;
    $b = $2;
    $c = $4;
    $d = $5;
    $e = $13;
    $f = $33;
    $g = $35;
    $h = $49;
    #do stuff here;
}

thanx

2006-05-17 Retitled by planetscape, as per Monastery guidelines


Original title: 'calling reg exp gurus'


Re: Capturing columnar data from a text file by ptum (Priest) on May 16, 2006 at 17:49 UTC
This is the kind of thing that seems to cry out for use of the split command, if your data is reliably separated by a unique string that doesn't appear inside the column values. Once you've got an array, that is when I would apply a (comparatively more expensive and detailed) regex to the array elements I cared about, if it was still necessary.

No good deed goes unpunished. -- (attributed to) Oscar Wilde
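A minimal sketch of the split-first approach described above. The sample line and the variable names `$code` and `$flag` are hypothetical, since the thread does not show the real report layout; the pattern for the first field is borrowed from the question.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real report format is not shown in the thread.
my $line = 'ABC-D  12.5  n/a  2006-05-16  OK';

# Cheap step: break the line into fields on runs of whitespace.
my @fields = split ' ', $line;

# Detailed step: apply a regex only to the field that needs it.
my ($code, $flag);
if ($fields[0] =~ /^([A-Z]{3})-([A-Z])!?$/) {
    ($code, $flag) = ($1, $2);
    print "code=$code flag=$flag date=$fields[3]\n";
}
```

Because each field is addressed by its array index, there is no need to count capture groups across the whole line.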
Re: Capturing columnar data from a text file by dsheroh (Monsignor) on May 16, 2006 at 18:13 UTC
If the values are separated by single spaces, then the previous suggestion to look at `split` is probably the way to go: `my @fields = split ' ', $line;` If the lines are fixed width, with each field starting at a specific position, then `split` may still work for you (you can `split` on a regex instead of a fixed character to avoid getting empty values wherever there are consecutive spaces), but you might also want to look at using `unpack` instead.
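To illustrate the point about consecutive spaces, here is a small comparison of three ways to call `split`. The sample string is made up for the example. Note that the string `' '` is a special case in Perl: it splits on runs of whitespace and skips leading whitespace, whereas the pattern `/ /` splits on every single space.

```perl
use strict;
use warnings;

# Hypothetical input with two spaces between 'a' and 'b'.
my $line = 'a  b c';

my @special = split ' ', $line;    # special case: runs of whitespace
my @literal = split / /, $line;    # every single space is a separator
my @regex   = split /\s+/, $line;  # one or more whitespace characters

# The literal version contains an empty field between 'a' and 'b'.
printf "%d %d %d\n", scalar @special, scalar @literal, scalar @regex;
```

This prints `3 4 3`. If the line may start with whitespace, `split ' '` is usually the safer choice, because `split /\s+/` then returns an empty first field.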
Re: Capturing columnar data from a text file by ruzam (Curate) on May 16, 2006 at 21:26 UTC
If the data is at fixed columns, then the problem cries out for the use of unpack.

my @cols = unpack("A5 (x2 A10)*", $line);
# $cols[0] gets first 5 characters.
# $cols[1] gets next 10 characters (after skipping 2).
# $cols[2] gets next 10 characters (after skipping 2).
# $cols[3] gets next 10 characters (after skipping 2).
# etc...
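A runnable sketch of the same template on a concrete line. The field widths come from the template above; the sample values and the `sprintf` used to build the line are assumptions made for the example, not the poster's data.

```perl
use strict;
use warnings;

# Hypothetical fixed-width line: one 5-character field, then 10-character
# fields, each preceded by 2 filler characters.
my $line = sprintf '%-5s  %-10s  %-10s  %-10s', 'ABC-D', 'alpha', 'beta', 'gamma';

# 'A' strips trailing spaces; the (x2 A10)* group repeats to the end of the line.
my @cols = unpack 'A5 (x2 A10)*', $line;

print join('|', @cols), "\n";
```

This prints `ABC-D|alpha|beta|gamma`. The grouping syntax in the template needs Perl 5.8 or later.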
Re: Capturing columnar data from a text file by TedPride (Priest) on May 16, 2006 at 23:10 UTC
my ($x, $y, $c, $d, $e, $f, $g, $h) = (split / /, $line)[0, 1, 3, 4, 12, 32, 34, 48];
print $e;
Re: Capturing columnar data from a text file by johngg (Canon) on May 16, 2006 at 22:38 UTC
It is telling that the replies from ptum, esper and ruzam all contain the caveats "if your data ... " or "if your lines ... " or the like. I think it would be helpful here if you could post a sample of the data.

Cheers, JohnGG
Re: Capturing columnar data from a text file by planetscape (Chancellor) on May 17, 2006 at 10:37 UTC
The advice here might also help: "What is the simplest way to print a field, as in $3 does in awk"

HTH, planetscape