wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, I have a series of regexes that work great until I add a series of OR conditions. I then receive a warning stating "use of uninitialized value for $url_sub at line 45." Removing the OR series eliminates the warning, so I'm not sure how the ORs cause it. Grateful for any ideas. Thanks!

    while (my $line = <$FH_IN>) {
        chomp $line;    #removes line break or new line;
        my $url_sub = "";
        print "processing file $counter,\n";
        # Within each record, loop until a match occurs with "subsid" in the record,;
        # and the record contains a URL address, and the URL contain a derivative;
        # of Exhibt 21, e.g., ex21, ex-21, exhibit21, or exhibt-21;
        while ($line =~ m/subsidiar/igm && $line =~ m/\/Archives(.*)">/igm &&
               ($line =~ m/ex21/igm || $line =~ m/EX\-21/igm ||
                $line =~ m/exhibit\-21/igm || $line =~ m/exhibit21/igm) ) {
            $url_sub = $1;    #extract the match and assign to variable;
            #Join the match with the standard URL prefix;
            my $url = join("",$base_url,$url_sub);
            print $url;
            open my $FH_OUT, '>>',$write_dir or die "Can't open file $write_dir >:$!>";
            print $FH_OUT "$url\n";
            ++$counter;
        } #End of while loop for extracting matching text;
    } #End of record;

Replies are listed 'Best First'.
Re: Series of REGEX with OR
by NetWallah (Canon) on Apr 27, 2016 at 17:23 UTC
    From "perldoc perlvar":

    $1... Contains the subpattern from the corresponding set of capturing parentheses from the last successful pattern match

    Some of your regexen have capturing parens, but not all of them.

    If the last regex that matched does NOT have (capturing parens), $1 will not have a value, resulting in the warning you are getting.
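A minimal sketch of that behaviour, assuming a made-up sample record (the string and the variable names are illustrative, not the poster's data):

```perl
use strict;
use warnings;

# Hypothetical sample record; not taken from the original data.
my $line = 'subsidiaries <a href="/Archives/edgar/data/1/ex21.htm">';

my ($after_capture, $after_plain);

if ($line =~ m/\/Archives(.*)">/i) {
    # The last successful match had capturing parens, so $1 is set.
    $after_capture = $1;
}

if ($line =~ m/ex21/i) {
    # This match also succeeded but has no parens, so $1 is now undef.
    $after_plain = $1;
}

print "after capturing match: $after_capture\n";
print "after plain match: ", (defined $after_plain ? $after_plain : 'undef'), "\n";
```

Copying $1 into a variable straight after the capturing match, before any other regex runs, is one way to avoid losing it.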

    If you expect the URL only after "Archives", you can capture it thus (the parens around the assignment are needed because = binds more loosely than &&):

    ... && (my ($url) = $line =~ m/\/Archives(.*)/img) && ...
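As a rough, self-contained sketch of this list-context capture: the sample record and the $base_url value below are assumptions for illustration only. The assignment is wrapped in parentheses because = has lower precedence than &&. A list assignment in boolean context is true when the match returned at least one element.

```perl
use strict;
use warnings;

# Hypothetical sample record and URL prefix, for illustration only.
my $line     = 'list of subsidiaries <a href="/Archives/edgar/data/1/ex-21.htm">';
my $base_url = 'https://www.example.com/Archives';

my $url;
if (   $line =~ m/subsidiar/i
    && (my ($url_sub) = $line =~ m/\/Archives(.*)">/i)
    && ($line =~ m/ex21/i || $line =~ m/ex-21/i) )
{
    # $url_sub was copied out of the match in list context, so the later
    # non-capturing matches cannot clobber it the way they clobber $1.
    $url = $base_url . $url_sub;
}

print "$url\n" if defined $url;
```

Because the captured text lives in its own variable, the order of the remaining tests no longer matters.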

            This is not an optical illusion, it just looks like one.

      Thanks NetWallah! So if I move the capturing regex to be the last regex, then $1 should capture my extract target? Thanks again!

        Assuming none of your other regexes has matched already, because otherwise it'll short-circuit and the capturing regex will never run.
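A small sketch of the difference, assuming a made-up sample record: placed last in an || chain, the capturing regex is skipped as soon as an earlier alternative matches. Placed last in an && chain, it always runs when the earlier tests succeed.

```perl
use strict;
use warnings;

# Hypothetical sample record; not taken from the original data.
my $line = 'subsidiaries <a href="/Archives/edgar/data/1/ex21.htm">';

# Capturing regex last in an || chain: m/ex21/ succeeds first, so the
# capturing regex is never evaluated and $1 is undefined.
my $in_or_chain;
if ($line =~ m/ex21/i || $line =~ m/\/Archives(.*)">/i) {
    $in_or_chain = $1;
}

# Capturing regex last in an && chain: both tests run, and the capturing
# one is the last successful match, so $1 holds the URL part.
my $in_and_chain;
if ($line =~ m/ex21/i && $line =~ m/\/Archives(.*)">/i) {
    $in_and_chain = $1;
}

print "in || chain: ", (defined $in_or_chain  ? $in_or_chain  : 'undef'), "\n";
print "in && chain: ", (defined $in_and_chain ? $in_and_chain : 'undef'), "\n";
```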

Re: Series of REGEX with OR
by hippo (Archbishop) on Apr 27, 2016 at 19:04 UTC

    It seems you already have an answer. However, I have a small suggestion to improve your series of OR'd expressions. Rather than writing

    ($line =~ m/ex21/igm || $line =~ m/EX\-21/igm || $line =~ m/exhibit\-21/igm || $line =~ m/exhibit21/igm)

    try this:

    $line =~ /ex(?:hibit)?-?21/i

    This matches an 'ex' optionally followed by 'hibit' optionally followed by '-' and then ending in '21'. You don't need the /g modifier if you are only looking for one match and you don't need the /m modifier if you aren't matching with ^ or $. This is shorter and to my eyes easier to read and therefore maintain. HTH.
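A quick way to check the combined pattern is to run it against the four variants from the question, plus two strings that should not match. The sample list is an assumption for illustration.

```perl
use strict;
use warnings;

my $re = qr/ex(?:hibit)?-?21/i;

# The four variants named in the question, then two near misses.
my @samples = ('ex21', 'EX-21', 'exhibit-21', 'exhibit21', 'ex-22', 'exhibit 21');

my %matched;
for my $sample (@samples) {
    $matched{$sample} = ($sample =~ $re) ? 1 : 0;
    printf "%-12s %s\n", $sample, $matched{$sample} ? 'match' : 'no match';
}
```

Since the pattern has no capturing parens (the group is (?:...)), a successful match against it still leaves $1 undefined, so any capture has to be saved before this test runs.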

Re: Series of REGEX with OR
by Not_a_Number (Prior) on Apr 27, 2016 at 17:24 UTC

    Change your second while to if.
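One possible reason, sketched on a simplified single condition with a made-up line: a while with /g re-tests the condition after every pass and resumes where the previous match ended, so the body can run more than once per record. An if tests the record once.

```perl
use strict;
use warnings;

# Hypothetical record that mentions the keyword twice.
my $line = 'subsidiaries and more subsidiaries';

# while + /g: one pass per occurrence of the pattern in the line.
my $while_passes = 0;
while ($line =~ m/subsidiar/ig) {
    $while_passes++;
}

# if, no /g: the body runs at most once per line.
my $if_passes = 0;
if ($line =~ m/subsidiar/i) {
    $if_passes++;
}

print "while ran $while_passes time(s), if ran $if_passes time(s)\n";
```

In the original loop that would mean one URL written per occurrence instead of one per record. Without /g, a while whose condition stays true would never terminate.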

      Thank you "Not a number"!!!
Re: Series of REGEX with OR
by talexb (Chancellor) on Apr 27, 2016 at 17:19 UTC
      $url_sub = $1;

    I'm guessing this is line 45 -- and it's complaining because $1 is the first capture of a regex (using brackets), but you haven't used brackets in any of your regexes.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Thanks talexb! Can you clarify what you mean by "you haven't used brackets in any of your regexes"? I used one set of parentheses to extract the target data, which I thought would be held in $1. Not sure where you are going here. Sorry!