My approach is similar to the others, but more 'structured': the regex is assembled from small, named components.
Note that my rules for accepting whitespace are more lax (e.g., '101,102or103' is accepted).

use strict;
use warnings;

my $text= <<TEXT;
Those APCs are APC 282, 376, 377 and 398.
The APC assignments are also shown in attachment K1.
In the Final Rule, we indicated that clinical characteristics and expected resource use.
Procedures are sufficiently similar to those other procedures assigned to APC 282, 376, 377, and 398,
and that we believe those APC assignments were appropriate.
Specifically APCs 662 and APC 282.
As shown in attachment K3 under option number 1, to be placed in APC 662.
Our data analysis shows that combining services currently assigned to APC 662
would result in an APC median cost of about 302.
The 6 CPT-Codes that would go into APC 662 are: CPT-Codes 0145T through 0150T.
The two other cardiac CT codes, specifically 0144T and 0151T would be assigned to APC 282.
The inclusion of the two codes into APC 282 would result in...
and also APC 101,102or103, and not 666.
But APC 6666 is not really an APC!
How about APC 6666, 777? (Neither is parsed.)
How about APCs 777, 6666? (Gets 777, ignores 6666; is this OK?)
TEXT

# define regex components

# an APC number
my $number = qr( \d{3} (?! \d ) )x;  # 3 digits, not followed by a digit

# required preamble to an APC number
my $preamble = do {
    my $leadin    = qr( APC s? )x;
    my $separator = qr( \s+ )x;
    qr( $leadin $separator )x;
    };

# additional APC numbers may follow after properly introduced number
my $continuation = do {
    my $comma  = qr( , )x;
    my $clause = qr( $comma? \s* (?: and | or ) )x;
    # \G means continue from point previous match ended
    qr( \G \s* (?: $comma | $clause ) \s* )x;
    };

# end regex component definitions

# do test extraction
my @extracts = $text =~ m{ (?: $preamble | $continuation) ($number) }xg;

print "Extract $_ = $extracts[$_] \n" for 0 .. $#extracts;

output:

Extract 0 = 282
Extract 1 = 376
Extract 2 = 377
Extract 3 = 398
Extract 4 = 282
Extract 5 = 376
Extract 6 = 377
Extract 7 = 398
Extract 8 = 662
Extract 9 = 282
Extract 10 = 662
Extract 11 = 662
Extract 12 = 662
Extract 13 = 282
Extract 14 = 282
Extract 15 = 101
Extract 16 = 102
Extract 17 = 103
Extract 18 = 777

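To see the \G continuation rule in isolation, here is a condensed sketch of the same technique run against three short strings. The sub name extract_apcs and the sample strings are only for illustration. The patterns are meant to be equivalent to the components above, just written inline. The idea is that a number without its own 'APC ' preamble is accepted only if it starts exactly where the previous match ended, so one bad number breaks the chain.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: the same components, condensed.
sub extract_apcs {
    my ($text) = @_;

    my $number       = qr/ \d{3} (?! \d ) /x;   # exactly 3 digits
    my $preamble     = qr/ APC s? \s+ /x;       # 'APC ' or 'APCs '
    # \G anchors at the end of the previous match, so a continuation
    # can only pick up directly after a number that was just accepted.
    my $continuation = qr/ \G \s* (?: , | ,? \s* (?: and | or ) ) \s* /x;

    return $text =~ m{ (?: $preamble | $continuation ) ($number) }xg;
}

for my $sample ('APC 101, 102 and 103', 'APC 6666, 777', 'APCs 777, 6666') {
    my @got = extract_apcs($sample);
    printf "%-22s => %s\n", $sample, @got ? "@got" : '(nothing)';
}
```

In the second sample, 6666 is rejected, so no match ends just before ', 777' and \G has nothing to attach to. In the third, 777 is accepted first and the chain stops at 6666. If a rejected number should not break the chain, the continuation would have to be anchored differently; whether that is wanted depends on the data.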
hth -- bill

In reply to Re: regexp match repetition breaks in Perl by Anonymous Monk
in thread regexp match repetition breaks in Perl by barkingdoggy
