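My approach is similar to the others, but more 'structured': the regex is built up from small named components.
Note that the rules for accepting whitespace are more lax (for example, `101,102or103` is accepted, and so is an `APC` separated from its number by a newline).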

use strict;
use warnings;


my $text=  <<TEXT;
    Those APCs are APC 282, 376, 377 and 398.
The APC assignments are also shown in attachment K1.
In the Final Rule, we indicated that clinical characteristics and
expected resource use.  Procedures are sufficiently similar to those
other procedures assigned to APC 282, 376, 377, and 398, and that
we believe those APC assignments were appropriate.
Specifically APCs 662 and APC 282. As shown in attachment K3 under
option number 1, to be placed in APC 662. Our data analysis shows
that combining services currently assigned to APC 662 would result
in an APC median cost of about 302. The 6 CPT-Codes that would go
into APC
662 are: CPT-Codes 0145T through 0150T. The two other cardiac CT
codes, specifically 0144T and 0151T would be assigned to APC 282.
The inclusion of the two codes into APC 282 would result in...
and also APC 101,102or103, and not 666.
But APC 6666 is not really an APC!
How about APC 6666, 777?  (Neither is parsed.)
How about APCs 777, 6666?  (Gets 777, ignores 6666; is this OK?)
TEXT


# define regex components

# an APC number
my $number = qr( \d{3} (?! \d ) )x;  # 3 digits, not followed by a digit

# required preamble to an APC number
my $preamble = do {
    my $leadin    = qr( APC s? )x;
    my $separator = qr( \s+ )x;
    qr( $leadin $separator )x;
    };

# additional APC numbers may follow after properly introduced number
my $continuation = do {
    my $comma  = qr( , )x;
    my $clause = qr( $comma? \s* (?: and | or ) )x;
    # \G means continue from point previous match ended
    qr( \G \s* (?: $comma | $clause ) \s* )x;
    };

# end regex component definitions


# do test extraction

my @extracts = 
    $text =~ m{ (?: $preamble | $continuation) ($number) }xg;

print "Extract $_ = $extracts[$_] \n" for 0 .. $#extracts;

output:

Extract 0 = 282
Extract 1 = 376
Extract 2 = 377
Extract 3 = 398
Extract 4 = 282
Extract 5 = 376
Extract 6 = 377
Extract 7 = 398
Extract 8 = 662
Extract 9 = 282
Extract 10 = 662
Extract 11 = 662
Extract 12 = 662
Extract 13 = 282
Extract 14 = 282
Extract 15 = 101
Extract 16 = 102
Extract 17 = 103
Extract 18 = 777
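
For anyone who prefers to step through the matches one at a time, here is a minimal sketch of the same idea written as a scalar-context loop. It is a variation, not the code above: `extract_apcs`, `$intro` and `$more` are names I made up for illustration, and `\G` sits at the very start of its pattern, which as far as I know is the placement perlop describes as fully supported. The `/c` flag matters here: without it, a failed continuation match would reset `pos()` and the outer loop would start over from the beginning of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): returns every
# 3-digit number that is introduced by "APC"/"APCs", plus any numbers
# that continue the list via a comma, "and", or "or".
sub extract_apcs {
    my ($text) = @_;

    my $number = qr/ \d{3} (?! \d ) /x;      # 3 digits, not followed by a digit
    my $intro  = qr/ APCs? \s+ ($number) /x; # properly introduced number
    # \G anchors at pos(), i.e. where the previous successful match ended
    my $more   = qr/ \G \s* (?: , | ,? \s* (?: and | or ) ) \s* ($number) /x;

    my @found;
    while ( $text =~ /$intro/g ) {
        push @found, $1;
        # /c keeps pos() in place when the continuation fails, so the
        # outer loop resumes right after the last number that was taken.
        push @found, $1 while $text =~ /$more/gc;
    }
    return @found;
}

print join( ', ', extract_apcs("and also APC 101,102or103, and not 666.") ), "\n";
```

Run on individual lines of the test text, this should agree with the list-context version: `101, 102, 103` for the line above, nothing for `APC 6666, 777`, and only `777` for `APCs 777, 6666`.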

hth -- bill

In reply to Re: regexp match repetition breaks in Perl by Anonymous Monk
in thread regexp match repetition breaks in Perl by barkingdoggy
