Greetings, O monks;
I am fairly new to Perl, and this is my first submitted question. I seek wisdom on a problem that looks extremely simple, but has been causing me a lot of grief.
I wish to parse a string to find all-caps words of at least 3 characters. My example code looks like this:
my $finalline = "3.2 COMPLIANCE WITH LAWS AND REGULATIONS. While the Product is in its possession or under its, or its sub-contractor's control, THEDUDES shall comply, or ensure that its subcontractors comply, with all applicable federal, state and local statutory and regulatory requirements regarding the manufacture, if applicable, packaging, handling transportation and storage of the Product.";
@words = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);
print ("@words \n");
This returns:
COMPLIANCE WITH LAWS AND REGULATIONS THEDUDES
...as desired.
However, when I plug this into my larger program, the code behaves oddly. Specifically, $finalline has the desired content, but @words usually ends up empty. The especially odd thing is that if I remove the "/g" above, @words retrieves the first capitalized word, but if I put it back in, it retrieves zero capitalized words(!). I have also tried using "/gc" instead, without success. Apart from this bug (and a couple of others), the code does more or less what it's supposed to.
Here is my code. As far as I can tell, the relevant bits (lines 150-153, marked as "#PROBLEM CODE!!!" below) are exactly the same as in my example, but do not work correctly. Any help would be greatly appreciated.
(Addendum: I have since gone back and turned on "use strict", and declared all my variables ahead of time. This has not solved the problem.)
#!/usr/bin/perl -w
use diagnostics;
use Spreadsheet::WriteExcel::Big;
use HTML::Restrict;
use File::Slurp;
#use Win32::Word::Writer;
use RTF::Writer;

opendir(DIR2, "contract/new");
my @files2 = readdir(DIR2);
closedir(DIR2);

foreach $file2 (@files2) {
    print "$file2\n";
    open (FH2, "contract/new/$file2");
    $newfile2=$file2;
    $newfile2 =~ s/\.html//g;
    my @filelines2 = <FH2>;
    chomp @filelines2;
    my $masterplan = join(' ', @filelines2);
    $hr2 = HTML::Restrict->new();
    $masterplan = $hr2->process($masterplan);
    my @masterarray = split('DUMMY',$masterplan);
    my $caps = 0;
    my $traps = 0;
    my $escape = 0;
    my @words = "";

    foreach $finalline (@masterarray){
        $escape = 0;
        $finalline =~ s/\&.{3,5}\;/ /g;
        $finalline =~ s/\s+/ /g;
        while ($finalline =~ /(\w\w+\b)/gc){
            $linelength++
        }
        if (($finalline =~ /^\s*\d|^\s*Section|^\s*Article|^\s*[A-Z]\. +|witness whereof/i) && ($holder == $rtf)){
            while ($escape == 0){
                if (($finalline =~ /[A-Z][A-Z][A-Z]/) && ($caps == 0) && ($traps == 0)){
                    print(">>>$finalline<<</n");
                    @words = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);  #PROBLEM CODE!!!
                    print("@words");                                 #PROBLEM CODE!!!
                    $lineholder = $finalline;                        #PROBLEM CODE!!!
                    $finalline = join(' ', @words);                  #PROBLEM CODE!!!
                    $caps++;
                }
                #etc.
Update: Removed extraneous code; added new line (print (">>>$finalline<<</n");)
Update 2: smls's solution works. Thanks smls!
Update 3: Restored problem code (see below).
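For anyone who finds this later, here is a minimal sketch that reproduces the symptom. It assumes the cause is the earlier word-counting loop: a scalar-context match with "/gc" leaves pos() where the last successful match ended, and a later list-context "/g" match on the same string resumes from there. The sample string and the names $line, $count, @stale and @fresh are made up for illustration, and this is not necessarily the exact fix smls suggested.

```perl
use strict;
use warnings;

my $line = "3.2 COMPLIANCE WITH LAWS. THEDUDES shall comply.";

# Count words in scalar context. With /c, the final failed match
# does NOT reset pos($line), so it stays just after "comply".
my $count = 0;
$count++ while $line =~ /(\w\w+\b)/gc;

# A list-context /g match resumes from pos($line), so it finds
# no all-caps words in what little remains of the string.
my @stale = ($line =~ /([A-Z][A-Z][A-Z]+)/g);

# Same loop again, but this time reset the search position first.
$count = 0;
$count++ while $line =~ /(\w\w+\b)/gc;
pos($line) = 0;
my @fresh = ($line =~ /([A-Z][A-Z][A-Z]+)/g);

print "stale: [@stale]\n";
print "fresh: [@fresh]\n";
```

Under that assumption, either dropping the "c" from the counting loop or resetting pos($finalline) before the list-context match should let @words fill in. A match without "/g" ignores pos() entirely, which would explain why removing "/g" still found the first capitalized word.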