Since this seems to be fixed-width columns, I'd probably go with tilly's suggestion of unpack. Here is a stab at a regex solution that I'd probably not use:
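For comparison, here is a minimal sketch of what the unpack approach might look like. The column widths (22, 6, 6, 7 per half) and the parse_line helper are my own assumptions for illustration, not measured from the real file, so adjust the template to the actual layout:

```perl
#!/usr/bin/perl -w
use strict;

# ASSUMPTION: each half of a line is name(22) ext(6) room(6) org(7).
# These widths are hypothetical; measure them against the real file.
my $half= 'A22 A6 A6 A7';

# Hypothetical helper: returns one [ last, first, ext, room, org ]
# array ref for each entry found on the line.
sub parse_line {
    my( $line )= @_;
    chomp $line;
    # "A" fields come back with trailing spaces already stripped.
    my @fields= unpack "$half $half", $line;
    my @entries;
    for my $start ( 0, 4 ) {
        my( $name, $ext, $room, $org )=
            map { defined $_ ? $_ : "" } @fields[ $start .. $start+3 ];
        # Only real entries contain a comma; headings, the dashed
        # rule, and empty halves are skipped here.
        next  unless  $name =~ /^([^,]+),\s*(.*)$/;
        push @entries, [ $1, $2, $ext, $room, $org ];
    }
    return @entries;
}

# Build a sample line with the same assumed widths:
my $fmt= '%-22s%-6s%-6s%-7s';
my $line= sprintf( $fmt, 'ABRAMS, YYYYY', 5555, 'C-07', '' )
        . sprintf( $fmt, 'BATHERSFIELD, YY', 5555, 'B-39', 'CE' );

for my $entry ( parse_line( $line ) ) {
    print "(", join( ") (", @$entry ), ")\n";
}
```

Because the fields are cut by position rather than guessed from their content, an empty ORG column should not be able to swallow part of the next name.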

#!/usr/bin/perl -w
use strict;

# First, we will allow no digits anywhere in a name;
# this will allow us to detect the extension after the
# name.  Second, we only allow single spaces in a name
# (and can't start or end with a space).  Third, names
# must contain a comma (but not in front).
my $name= qr{
    ( [^\s\d] (?: \s?[^\d\s]+ )* ) ,
    ( (?: \s?[^\d\s]+ )* )
}x;
# Org can only have single spaces and no commas, but is optional:
my $org= qr{
    (?: [^\s,](?:\s?[^\s,]+)*[^\s,] )?
}x;
# Heading must have "-" on each end and just one word between:
my $head= qr{
    -\s+[a-z]+\s+-
}ix;
my $entry= qr{
    \s* (?: $head | $name \s+ (\d+) \s+ (\S*) \s+ ($org) )
}x;
#print "$entry\n";

while( <DATA> ) {
    my @matches= m/^$entry$entry\s*$/;
    #print "$_";
    for( [0..4], [5..9] ) {
        my( $last, $first, $ext, $room, $org )=
            map { defined $_ ? $_ : "" } @matches[@$_];
        if( "" ne $last ) {
            print "($last), ($first) ($ext) ($room) ($org)\n";
        }
    }
}
__END__
NAME              EXT   RM#   ORG     NAME              EXT   RM#   ORG
------------------------------------- --------------------------------------
- A -                                 BASILE, YYYY      5555  1H08  IAMG
ABEND, YYYYYY     5555  2014  CE      BATES, YYYY       5555  4832  BT
ABRAMS, YYYYY     5555  C-07          BATHERSFIELD, YY  5555  B-39  CE
ADAMS, YYYY       5555  255   OTC     BAXTER, YYYY      5555  A-43
ADAMS, YYYY       5555  149   BT      BEAR, YYYYYY      5555  H42   ATO
ADAMS, YYYYYYY    5555  A-16          BEASLEY, YYY      5555  D-79
ADUAKA, YYYYYYYY  5555  A-52          BEATTY, YY        5555  4832  TAG
AHMED, YYYYYY     5555  C-63          BECHTLE, YYYY     5555  D-26
AHMED, C. YYYYYY  5555  D-69  SOMEU   BEDOYA, YYYYYYYY  5555        CE

Which prints the following:

(BASILE), ( YYYY) (5555) (1H08) (IAMG)
(ABEND), ( YYYYYY) (5555) (2014) (CE)
(BATES), ( YYYY) (5555) (4832) (BT)
(ABRAMS), ( YYYYY) (5555) (C-07) (BATHERSFIEL)
(D), ( YY) (5555) (B-39) (CE)
(ADAMS), ( YYYY) (5555) (255) (OTC)
(BAXTER), ( YYYY) (5555) (A-43) ()
(ADAMS), ( YYYY) (5555) (149) (BT)
(BEAR), ( YYYYYY) (5555) (H42) (ATO)
(ADAMS), ( YYYYYYY) (5555) (A-16) (BEASLE)
(Y), ( YYY) (5555) (D-79) ()
(ADUAKA), ( YYYYYYYY) (5555) (A-52) (BEATT)
(Y), ( YY) (5555) (4832) (TAG)
(AHMED), ( YYYYYY) (5555) (C-63) (BECHTL)
(E), ( YYYY) (5555) (D-26) ()
(AHMED), ( C. YYYYYY) (5555) (D-69) (SOMEU)
(BEDOYA), ( YYYYYYYY) (5555) (CE) ()

Finally, $name =~ /\U$name/; doesn't do anything useful. You want $name= uc $name;.
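To make the difference concrete, here is a small sketch. The sample value is made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

my $name= "Adams, John";   # hypothetical sample value

# A match only tests the string against the (upper-cased) pattern;
# it never modifies $name.
my $matched= $name =~ /\U$name/;
print "after match: $name\n";

# uc returns an upper-cased copy, which we assign back.
$name= uc $name;
print "after uc:    $name\n";
```

The match leaves $name untouched whether or not it succeeds, while the assignment from uc is what actually changes the case.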

        - tye (but my friends call me "Tye")

In reply to Re: Need help with regex by tye
in thread Need help with regex by BastardOperator
