Wise Monks,
I have several hundred files, each with fixed column positions, but the positions differ from file to file.
My code appears to do fine for the first 3 cols of data, but the 4th and 5th cols are sometimes right-aligned, which my code cannot handle. I would appreciate any help with an approach to process the problem columns differently, and any pointers to other potential issues in my code.
Using tilly's post "Locate char in a string", I was able to get as far as I have below. I am but a Perl hobbyist.
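#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my @pos;    # 1-based start columns of fields 2..N, taken from one reference line

open my $fh, '<', 'TEST2.txt' or die "Can't open input file: $!\n";
chomp( my @data = <$fh> );
close $fh;

find_position();

foreach my $line (@data) {
    next unless $line =~ /\S/;    # skip blank lines
    my @rec;
    my $prev = 0;
    foreach my $col (@pos) {
        # each field runs from $prev up to, but not including, the next start
        push @rec, substr( $line, $prev, $col - $prev - 1 );
        $prev = $col - 1;
    }
    push @rec, substr( $line, $prev );    # last field, which the loop never reaches
    print join( ':', @rec );
    print "\n ----- \n";
}

sub find_position {
    foreach my $line (@data) {
        # find first line to meet conditions and capture position info
        next unless $line =~ /^(\w.*|\w+\S.*)(\s{2,}.*\S)(\s{2,})\d{9}/;

        # match my delimiter: two or more whitespace chars, or a tab
        while ( $line =~ /(\s{2,}|\t\s?)(\w|\d)/g ) {
            push @pos, pos($line);
        }
        last;    # boundaries come from this one line only, so a right-aligned
                 # value that starts further left on another line gets split
    }
}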
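One way the right-aligned columns might be handled, sketched here under some assumptions: rather than taking the boundaries from a single line, mark every character position that is non-blank in any line of the file. Positions that stay blank in every line are then the gaps between columns, whatever the alignment. The names find_fields and split_line and the $min_gap parameter are hypothetical (they come from no module), tabs are assumed to be expanded to spaces beforehand, and the spacing of the demo lines is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, length] pairs, one per detected column.
# A gap narrower than $min_gap blank positions is treated as part of a field
# (for example the single space inside "Collie SN").
sub find_fields {
    my ( $lines, $min_gap ) = @_;
    $min_gap = 2 unless defined $min_gap;    # assumption: columns are 2+ blanks apart

    my @used;    # $used[$i] is true if any line has a non-blank at offset $i
    for my $line (@$lines) {
        my @chars = split //, $line;
        for my $i ( 0 .. $#chars ) {
            $used[$i] = 1 if $chars[$i] =~ /\S/;
        }
    }

    my @fields;
    my $i = 0;
    while ( $i < @used ) {
        if ( !$used[$i] ) { $i++; next }
        my $start = $i;
        $i++ while $i < @used && $used[$i];    # $i ends one past the run
        if ( @fields && $start - ( $fields[-1][0] + $fields[-1][1] ) < $min_gap ) {
            $fields[-1][1] = $i - $fields[-1][0];    # narrow gap: extend previous field
        }
        else {
            push @fields, [ $start, $i - $start ];
        }
    }
    return @fields;
}

# Hypothetical helper: cut one line at the detected positions and trim each value.
sub split_line {
    my ( $line, @fields ) = @_;
    return map {
        my ( $start, $len ) = @$_;
        my $val = $start < length($line) ? substr( $line, $start, $len ) : '';
        $val =~ s/^\s+|\s+$//g;
        $val;
    } @fields;
}

# Demo with made-up spacing: columns 4 and 5 are right-aligned.
my $fmt    = '%-30s  %-10s  %9s  %7s  %7s  %s';
my @sample = (
    sprintf( $fmt, 'The First One Here Is Longer.', 'Collie SN', '262287630', '77312',  '93871',  'MVP' ),
    sprintf( $fmt, 'A Second (PART) here',          'MT',        '169287655', '506666', '61066',  'RTD' ),
    sprintf( $fmt, 'The Sixth Person OR Item',      'MLP',       '260505174', '3,530',  '72,201', 'VRE' ),
);
my @fields = find_fields( \@sample );
print join( ':', split_line( $_, @fields ) ), "\n" for @sample;
```

The trade-off is in $min_gap: a value of 2 keeps single spaces inside a name from splitting the field, but two neighbouring columns that come within one blank of each other somewhere in the file would be merged. With only a few lines in a file the detection is also less reliable, because a position that happens to be blank in every line looks like a gap.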
Some sample data that illustrates the variation within a given file:
The First One Here Is Longer. Collie SN 262287630 77312 93871 MVP
A Second (PART) here First In 20 MT 169287655 506666 61066 RTD
3rd Person "Something" X&Y No SH 564287705 45423 52443 RTE
The Fourth Person 20 MLP 4000 360505504 3530 72201 VRE
The Fifth Name OR Something Twin 200 SH 469505179 3530 72201 VRE
The Sixth Person OR Item MLP 260505174 3,530 72,201 VRE
70 The Seventh Record MLP 764205122 3530 72201 VRE
The Eighth Person MLP MLP 160545154 3530 7220 VRE
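Going by the sample above, another possible way to treat the problem columns differently is to peel them off from the right-hand end with a regex, since none of the trailing values contains a space; only the leading text columns then need position-based splitting. This is a rough sketch, not a tested solution for the real files. It assumes the ID is always exactly nine digits (as in my own pattern), that the two numeric columns hold only digits and commas, and that the last column has no embedded blanks. The name parse_tail is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into (leading text, id, num1, num2, code).
# Returns an empty list when the line does not have the expected tail.
sub parse_tail {
    my ($line) = @_;
    my ( $head, $id, $n1, $n2, $code ) =
        $line =~ /^(.*?)\s+(\d{9})\s+([\d,]+)\s+([\d,]+)\s+(\S+)\s*$/
        or return;
    tr/,//d for $n1, $n2;    # drop thousands separators: "3,530" becomes "3530"
    return ( $head, $id, $n1, $n2, $code );
}

# Usage on two of the sample lines (spacing between columns does not matter here).
for my $line (
    'The Sixth Person OR Item MLP 260505174 3,530 72,201 VRE',
    'The Fourth Person 20 MLP 4000 360505504 3530 72201 VRE',
    )
{
    my @rec = parse_tail($line) or next;
    print join( ':', @rec ), "\n";
}
```

The leading part that comes back in the first element still holds the text columns together, so it would need the position logic (or a split on two or more spaces) applied to it afterwards.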