
I wrote a regexp that works in regexxer. It is supposed to extract all 3-digit numbers from a large text document, but in Perl it does not handle the repetitions: numbers matched by the repeated groups go missing. The code below shows the regexp and demonstrates the problem. Suggestions, please.

#!/usr/bin/perl 
#use strict;
use warnings;
our $text=  <<TEXT;
    Those APCs are APC 282, 376, 377 and 398. The APC assignments are also shown in attachment K1. In the Final Rule, we indicated that clinical characteristics and expected resource use.  Procedures are sufficiently similar to those other procedures assigned to APC 282, 376, 377, and 398, and that we believe those APC assignments were appropriate. Specifically APCs 662 and APC 282. As shown in attachment K3 under option number 1, to be placed in APC 662. Our data analysis shows that combining services currently assigned to APC 662 would result in an APC median cost of about 302. The 6 CPT-Codes that would go into APC 662 are: CPT-Codes 0145T through 0150T. The two other cardiac CT codes, specifically 0144T and 0151T would be assigned to APC 282. The inclusion of the two codes into APC 282 would result in...

TEXT

our @extracts;
pos($text) = 0;

while (my @match = $text =~ m/(APC[s]?)\s(?:(\d{3})(?:\s|,\s|\.\s))
    (?:(\d{3})(?:\s|,\s|\.\s)){0,} #
    (?:and\s([\d]{3})(?:\s|,\s|\.\s)){0,1}/xgc) {
    push @extracts, @match;
}

my $n = 0;
foreach my $extracts (@extracts) {
    print "Match $n= $extracts[$n] ";
    $n++;
    print "\n";
}

Here's some of the output:

Match 0= APC 
Match 1= 282 
Match 2= 377 
Match 3= 398 
Match 4= APC 
Match 5= 282 
Match 6= 377 
Match 7= 398 
Match 8= APCs 
Match 9= 662 
Use of uninitialized value in concatenation (.) or string at temp2.pl line 32.
Match 10=  
Use of uninitialized value in concatenation (.) or string at temp2.pl line 32.
Match 11=  
Match 12= APC 
Match 13= 282 
Use of uninitialized value in concatenation (.) or string at temp2.pl line 32.
Match 14=  
Use of uninitialized value in concatenation (.) or string at temp2.pl line 32.
Match 15=  
Match 16= APC 
Match 17= 662

In reply to regexp match repetition breaks in Perl by barkingdoggy
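
One possible explanation, offered as a sketch rather than a definitive answer: in Perl, a capture group inside a quantified group such as `(?:(\d{3})...){0,}` keeps only the text from its last repetition, which would be consistent with 376 missing from the output above. A common workaround is two passes: first match the whole number list after "APC" as one string, then pull every 3-digit number out of that string with a global match in list context. The helper name `extract_apc_numbers` and the simplified list pattern below are my own assumptions, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the original post).
# Returns every 3-digit number found in a list introduced by "APC" or "APCs".
sub extract_apc_numbers {
    my ($text) = @_;
    my @numbers;

    # Pass 1: capture each whole list, e.g. "282, 376, 377 and 398", in $1.
    # Separators allowed between numbers: optional comma, whitespace,
    # and an optional "and".
    while ( $text =~ /\bAPCs?\s+(\d{3}\b(?:,?\s+(?:and\s+)?\d{3}\b)*)/g ) {
        my $list = $1;

        # Pass 2: in list context, m//g returns every capture, so no
        # repetition is overwritten by a later one.
        push @numbers, $list =~ /(\d{3})/g;
    }
    return @numbers;
}

my $sample = "Those APCs are APC 282, 376, 377 and 398. The APC assignments are also shown.";
print join( ' ', extract_apc_numbers($sample) ), "\n";    # prints "282 376 377 398"
```

Splitting the job this way keeps the outer pattern free of repeated capture groups, so the number of values returned no longer depends on how many capture groups the regexp happens to contain.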
