First off, I'm not sure I'd call your format 'fixed width', as it doesn't match what I'm used to: simple tabular data padded with lots of whitespace. I haven't had to handle the formatting you describe, but I could probably detect whitespace-padded tabular data in a consistent manner.

Although this will probably give some false negatives on the odd files that I deal with, I'd take a sample of lines from the middle of the file (i.e., try to skip headers and footers), and then use something like BrowserUK's unpack mask generator to see whether there are columns that are consistently whitespace between columns of non-whitespace.
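A rough sketch of that idea, written in Python rather than Perl and not taken from BrowserUK's code: sample the middle of the file, mark every character column that is blank in all sampled lines, and treat the runs in between as candidate fields. The function names, the 25% skip fraction and the two-field minimum are all assumptions made for illustration.

```python
def middle_sample(lines, skip_fraction=0.25):
    """Drop blank lines, then the first and last quarter, to dodge headers/footers."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    skip = int(len(lines) * skip_fraction)
    return lines[skip:len(lines) - skip] if len(lines) > 2 * skip else lines


def whitespace_mask(lines):
    """Return ' ' for columns that are blank in every line, 'x' for the rest."""
    width = max((len(line) for line in lines), default=0)
    mask = []
    for col in range(width):
        # Lines shorter than this column are treated as if padded with spaces.
        blank = all(col >= len(line) or line[col].isspace() for line in lines)
        mask.append(" " if blank else "x")
    return "".join(mask)


def field_spans(mask):
    """Turn the mask into (start, end) pairs, one per run of non-blank columns."""
    spans, start = [], None
    for i, ch in enumerate(mask + " "):  # trailing space closes the last run
        if ch != " " and start is None:
            start = i
        elif ch == " " and start is not None:
            spans.append((start, i))
            start = None
    return spans


def looks_fixed_width(lines, min_fields=2):
    """Guess: whitespace-padded fixed width if at least min_fields columns emerge."""
    return len(field_spans(whitespace_mask(middle_sample(lines)))) >= min_fields
```

With only a handful of sampled lines, free-form text can line up a blank column by chance, so this is a heuristic and works better the more lines the sample holds.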

Obviously, this will fail if the sample includes the header or footer, and there's a good chance it won't match files with multi-line (but still fixed-width) records or with sub-headings of substantial length. Many of the fixed-width files I deal with have formatting quirks like these, but if yours are more consistent, it might be worthwhile.

For the case where you don't have whitespace padding, but you do have data other than strings, you might be able to create masks of where there are numeric vs. alpha columns, and make your decision based on that. (This still wouldn't deal with the multi-line record issue, though.)
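One way the numeric-vs-alpha mask could look, again as a Python sketch under my own assumptions: classify each character as digit, letter or other, and record for every column whether all sampled lines agree. The class symbols and the idea of scoring the fraction of agreeing columns are illustrative choices, and where to put the cut-off is left to you.

```python
def char_class(ch):
    """Map a character to a coarse class: '9' digit, 'a' letter, '.' anything else."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "a"
    return "."


def class_mask(lines):
    """Per column: the shared class if every line agrees, otherwise '?'."""
    if not lines:
        return ""
    # Only compare columns that exist in every line.
    width = min(len(line) for line in lines)
    mask = []
    for col in range(width):
        classes = {char_class(line[col]) for line in lines}
        mask.append(classes.pop() if len(classes) == 1 else "?")
    return "".join(mask)


def consistency(mask):
    """Fraction of columns where all lines share one character class."""
    return sum(ch != "?" for ch in mask) / len(mask) if mask else 0.0
```

A score near 1.0 suggests the fields sit at the same offsets on every line; delimited data with variable-length fields tends to score low because the classes drift out of step after the first field.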


In reply to Re: how to identify a fixed width file by jhourcle
in thread how to identify a fixed width file by ftumsh
