Here's another approach. It doesn't extract all the fields in one swell foop (the fields matched by the pattern I'm calling $real have to be extracted in a separate step), but one can imagine that some degree of customization is possible for different types of records:

c:\@Work\Perl\monks>perl -wMstrict -le
"my @lines = (
   'C31 6 3 2.4 1.5 2.6 ',
   'C32 2 7 3 1.0 ',
   'H31 1 1 0 21.0 11.2 5.3 1.4',
   'T11 2 1 0 6.0 1.1 2.2',
   'L06 1 1 0 1.0 3.3',
   'L99 1 1 0 1.1 2.2 3.3 4.4 5.5',
   );
 ;;
 my $int    = qr{ (?<! \d) \d+ (?! \d) }xms;
 my $real   = qr{ $int [.] $int }xms;
 my $header = qr{ [[:upper:]] \d\d }xms;
 ;;
 my $n = 4;
 my $extract = qr{
     ($header) \s+ ($int) \s+ ($int) \s+ ($int) ((?: \s+ $real){1,$n}) \s*
     }xms;
 ;;
 for my $line (@lines) {
   printf qq{'$line' -> };
   my $got = my ($h, $d1, $d2, $d3, $r) = $line =~ m{ \A $extract \z }xms;
   ;;
   if ($got) {
     my @reals = $r =~ m{ $real }xmsg;
     print qq{'$h' '$d1' '$d2' '$d3' (@reals)};
     }
   else {
     print 'unknown';
     }
   }
"
'C31 6 3 2.4 1.5 2.6 ' -> unknown
'C32 2 7 3 1.0 ' -> 'C32' '2' '7' '3' (1.0)
'H31 1 1 0 21.0 11.2 5.3 1.4' -> 'H31' '1' '1' '0' (21.0 11.2 5.3 1.4)
'T11 2 1 0 6.0 1.1 2.2' -> 'T11' '2' '1' '0' (6.0 1.1 2.2)
'L06 1 1 0 1.0 3.3' -> 'L06' '1' '1' '0' (1.0 3.3)
'L99 1 1 0 1.1 2.2 3.3 4.4 5.5' -> unknown

Update: Tested under Perl versions 5.14.4 and 5.8.9.
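
As for customizing per record type, here is one way it might look. This is only a sketch: the %max_reals table, its limits, and the parse_record() name are all made up for illustration, and it assumes the first letter of the header is what distinguishes one record type from another. It builds one extraction pattern per type and otherwise uses the same two-step match as above.

```perl
use strict;
use warnings;

my $int  = qr{ (?<! \d) \d+ (?! \d) }xms;
my $real = qr{ $int [.] $int }xms;

# Hypothetical limits: record-type letter => maximum number of reals allowed.
my %max_reals = (C => 3, H => 4, T => 3, L => 2);

# One compiled pattern per record type; only the header letter and the
# upper bound on the count of reals differ between them.
my %extract_for = map {
    my $n = $max_reals{$_};
    ($_ => qr{ ($_ \d\d) \s+ ($int) \s+ ($int) \s+ ($int) ((?: \s+ $real){1,$n}) \s* }xms);
} keys %max_reals;

# Returns (header, int, int, int, reals...) or an empty list if the
# line is of an unknown type or does not validate.
sub parse_record {
    my ($line) = @_;

    my ($type) = $line =~ m{ \A ([[:upper:]]) }xms or return;
    my $extract = $extract_for{$type}             or return;

    my ($h, $d1, $d2, $d3, $r) = $line =~ m{ \A $extract \z }xms or return;

    # Second step: pull the individual reals out of the captured run.
    my @reals = $r =~ m{ $real }xmsg;
    return ($h, $d1, $d2, $d3, @reals);
}
```

Call parse_record() in list context; with the made-up limits above, the 'L99' line with five reals is rejected because L records are capped at two.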


Give a man a fish:  <%-(-(-(-<


In reply to Re: validate variable-length lines in one regex? by AnomalousMonk
in thread validate variable-length lines in one regex? by uhClem
