Greetings, O monks;

I am fairly new to Perl, and this is my first submitted question. I seek wisdom on a problem that looks extremely simple, but has been causing me a lot of grief.

I wish to parse a string to find all-caps words of at least 3 characters. My example code looks like this:

```perl
my $finalline = "3.2 COMPLIANCE WITH LAWS AND REGULATIONS. While the Product is in its possession or under its, or its sub-contractor's control, THEDUDES shall comply, or ensure that its subcontractors comply, with all applicable federal, state and local statutory and regulatory requirements regarding the manufacture, if applicable, packaging, handling transportation and storage of the Product.";
@words = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);
print ("@words \n");
```

This returns:

COMPLIANCE WITH LAWS AND REGULATIONS THEDUDES

...as desired.
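The pattern `[A-Z][A-Z][A-Z]+` can also be written with a `{3,}` quantifier. The short sketch below uses a hypothetical sample string (not from the contract) to compare it with a variant that adds `\b` word boundaries. The variant is closer to "all-caps words" if capitals embedded in a mixed-case word, such as the hypothetical `McDONALD`, should be skipped.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen to show the difference between the patterns.
my $text = "ACME and McDONALD signed the NDA.";

# Equivalent to /([A-Z][A-Z][A-Z]+)/g: any run of three or more capitals.
my @loose = ($text =~ /([A-Z]{3,})/g);

# Word boundaries on both sides: capitals inside a mixed-case word are skipped.
my @strict = ($text =~ /\b([A-Z]{3,})\b/g);

print "@loose\n";    # ACME DONALD NDA
print "@strict\n";   # ACME NDA
```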

However, when I plug this into my larger program, it behaves oddly. Specifically, $finalline has the desired content, but @words usually ends up empty. The especially odd thing is that if I remove the "/g" above, @words retrieves the first capitalized word, but if I put it back in, it retrieves no capitalized words at all(!). I have also tried using "/gc" instead, without success. Apart from this bug (and a couple of others), the code does more or less what it's supposed to.
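Both symptoms (nothing with `/g`, the first word without it) are consistent with how Perl tracks `pos()` for `//g` matches. Below is a minimal sketch of that mechanism, assuming a hypothetical shortened line and a `/gc` word-counting loop like the one in the full program's loop body. A scalar-context `//gc` loop leaves `pos()` after the last matched word. A later list-context `//g` resumes from that position, while a match without `/g` ignores it.

```perl
use strict;
use warnings;

# Hypothetical short line standing in for one element of @masterarray.
my $finalline = "3.2 COMPLIANCE WITH LAWS. THEDUDES shall comply.";

# Scalar-context /gc loop: each match resumes at pos($finalline), and /c stops
# the final, failing match from resetting pos, so pos stays after "comply".
my $linelength = 0;
while ($finalline =~ /(\w\w+\b)/gc) {
    $linelength++;
}
my $pos_after_loop = pos($finalline);

# List-context //g also starts at pos(), so only the trailing "." is searched.
my @with_g = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);

# Without /g the match ignores pos() and scans from the start of the string.
my ($without_g) = ($finalline =~ /([A-Z][A-Z][A-Z]+)/);

print "words counted: $linelength\n";   # words counted: 6
print "with /g: (@with_g)\n";           # with /g: ()
print "without /g: $without_g\n";       # without /g: COMPLIANCE
```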

Here is my code. As far as I can tell, the relevant lines (lines 150-153 of the full program, marked "#PROBLEM CODE!!!" below) are exactly the same as in my example, but they do not work correctly. Any help would be greatly appreciated.

(Addendum: I have since gone back and turned on "use strict", and declared all my variables ahead of time. This has not solved the problem.)

```perl
#!/usr/bin/perl -w
use diagnostics;
use Spreadsheet::WriteExcel::Big;
use HTML::Restrict;
use File::Slurp;
#use Win32::Word::Writer;
use RTF::Writer;

opendir(DIR2, "contract/new");
my @files2 = readdir(DIR2);
closedir(DIR2);

foreach $file2 (@files2) {
    print "$file2\n";
    open (FH2, "contract/new/$file2");
    $newfile2 = $file2;
    $newfile2 =~ s/\.html//g;
    my @filelines2 = <FH2>;
    chomp @filelines2;
    my $masterplan = join(' ', @filelines2);
    $hr2 = HTML::Restrict->new();
    $masterplan = $hr2->process($masterplan);
    my @masterarray = split('DUMMY', $masterplan);
    my $caps = 0;
    my $traps = 0;
    my $escape = 0;
    my @words = "";

    foreach $finalline (@masterarray) {
        $escape = 0;
        $finalline =~ s/\&.{3,5}\;/ /g;
        $finalline =~ s/\s+/ /g;
        while ($finalline =~ /(\w\w+\b)/gc) {
            $linelength++;
        }
        if (($finalline =~ /^\s*\d|^\s*Section|^\s*Article|^\s*[A-Z]\.|witness whereof/i)
            && ($holder == $rtf)) {
            while ($escape == 0) {
                if (($finalline =~ /[A-Z][A-Z][A-Z]/) && ($caps == 0) && ($traps == 0)) {
                    print(">>>$finalline<<</n");
                    @words = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);  #PROBLEM CODE!!!
                    print("@words");                                #PROBLEM CODE!!!
                    $lineholder = $finalline;                       #PROBLEM CODE!!!
                    $finalline = join(' ', @words);                 #PROBLEM CODE!!!
                    $caps++;
                }
                #etc.
```
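The post does not quote the fix that eventually worked, so the following is only a hedged sketch of two common ways to keep a counting loop's leftover `pos()` from affecting the caps match. It uses a hypothetical sample line. Option 1 keeps the `/gc` loop and clears `pos()` explicitly afterwards. Option 2 counts the words in list context, which does not leave a position behind.

```perl
use strict;
use warnings;

# Hypothetical sample line standing in for one element of @masterarray.
my $finalline = "3.2 COMPLIANCE WITH LAWS. THEDUDES shall comply.";

# Option 1: keep the /gc counting loop, then clear pos() before the caps match.
my $linelength = 0;
$linelength++ while $finalline =~ /\w\w+\b/gc;
pos($finalline) = undef;    # the next //g now starts at offset 0
my @words = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);

# Option 2: count in list context; without /c, the list-context //g ends by
# resetting pos(), so the caps match below also starts at offset 0.
my $count = () = $finalline =~ /\w\w+\b/g;
my @words2 = ($finalline =~ /([A-Z][A-Z][A-Z]+)/g);

print "$linelength: @words\n";   # 6: COMPLIANCE WITH LAWS THEDUDES
print "$count: @words2\n";       # 6: COMPLIANCE WITH LAWS THEDUDES
```

Option 2 avoids hand-managed regex state entirely, while Option 1 keeps the original loop structure intact.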

Update: Removed extraneous code; added new line (print (">>>$finalline<<</n");)

Update 2: smls's solution works. Thanks, smls!

Update 3: Restored the problem code (marked "#PROBLEM CODE!!!" above).


In reply to Regex: Example works, plugging it into code doesn't. by EclecticScion
