Wise Monks,
I have several hundred files each of which has fixed column positions, but none of the files contain the same positions for these columns.
My code appears to do fine for the first 3 cols of data, but the 4th and 5th cols are sometimes right-aligned which my code can not handle. I would appreciate any help with an approach to process the problem columns differently and/or enlighten me about other potential issues in my code. Using the post from tilly Locate char in a string I was able to get as far as i have below. I am but a Perl hobbyist.
#!/usr/bin/perl -w use strict; use diagnostics; my @pos; my $line; my @field; open FILE, 'TEST2.txt' or die "Can't open input file: $!\n"; my @data = <FILE>; close(FILE); &find_position; foreach $line (@data) { my @rec; my $prev = 0; foreach my $col (@pos) { push @rec, substr( $line, $prev, $col - $prev - 1 ); $prev = $col - 1; } print join( ':', @rec ); print "\n ----- \n"; @rec = undef; } sub find_position { foreach $line (@data) { # find first line to meet conditions and +capture position info if ( $line =~ /^(\w.*|\w+\S.*)(\s{2,}.*\S)(\s{2,})\d{9}/x && ! +$pos[0] ) { # match my delimiter while ( $line =~ /(\s{2,}|\t\s?)(\w|\d)/g ) { push @pos, pos($line); } } } }
Some sample data that illustrates the variation in a given file
The First One Here Is Longer. Collie SN      262287630	  77312	   93871  MVP		
A  Second (PART) here         First In 20 MT 169287655	  506666   61066  RTD		
3rd Person "Something"        X&Y No SH      564287705	  45423    52443  RTE	
The Fourth Person 20          MLP 4000       360505504	  3530     72201  VRE	
The Fifth Name OR Something   Twin 200 SH    469505179	  3530     72201  VRE
The Sixth Person OR Item      MLP            260505174	  3,530   72,201  VRE
70 The Seventh Record         MLP            764205122	  3530     72201  VRE
The Eighth Person MLP         MLP            160545154	  3530      7220  VRE 

In reply to Fixed Position Column Records by daseme

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.