sowais has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I am having a problem matching characters in part of a counter script. The purpose of this part of the script is to count how many times the $constant variable (provided upon execution) appears in the file. The problem occurs when the $constant variable contains special characters such as $, $$, ##, @@, etc. I have to assume that $constant can be anything under the sun, as this is a generic script, but with my code below I am not able to accomplish that. Any help would be greatly appreciated. Thanks!

$constant = '$$';
$line_count = 0;
while (<FILE>) {
    if ($constant ne "") {
        if ($_ =~ m/^($constant)/) {
            $line_count++;
        }
    }
}


Thank you all for your input! I have my code working using quotemeta.

Re: Issue with regex matching
by wanna_code_perl (Friar) on Aug 28, 2013 at 23:16 UTC

    You say you want to count how many times the variable appears in the file, but your code tries to count the number of lines that contain the variable. If you did intend the former, you can count the number of occurrences in one go:

    while (<FILE>) {
        $count += () = /\Q$constant\E/g;
    }
    print "$constant occurs $count times\n";

    In my experience, small files (anything less than several megabytes) often perform better if you slurp the whole thing in, rather than looping over each line:

    use File::Slurp;
    my $constant = '$_';
    $_ = read_file('filename.pl');
    $count = () = /\Q$constant\E/g;
    print "$constant occurs $count times\n";
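For anyone puzzled by the `() =` in the snippets above, here is a minimal sketch of that counting idiom on its own. The sample text in $text is made up for illustration and stands in for the file contents.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the slurped file.
my $text     = "\$\$ first\nsecond \$\$ and \$\$\nthird\n";
my $constant = '$$';

# Assigning to the empty list () puts the match in list context, so /g
# returns every match. A list assignment evaluated in scalar context
# yields the number of elements on its right-hand side, i.e. the
# number of matches.
my $count = () = $text =~ /\Q$constant\E/g;

print "$constant occurs $count times\n";
```

The \Q...\E pair is what keeps the `$` characters in $constant from being treated as regex metacharacters.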
Re: Issue with regex matching
by rminner (Chaplain) on Aug 28, 2013 at 22:03 UTC
    you can escape the special chars stored in the variable $constant using \Q and \E. See also perlre.
    if($_ =~ m/^(\Q$constant\E)/) { $line_count++; }
    Since $constant is user specified you could simply call quotemeta on the user input before using it. Then you won't need \Q and \E in your regex. E.g.:
    my $searchterm = quotemeta $user_input;
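A minimal sketch of the quotemeta route, assuming the search term arrives in a variable called $user_input (a name used only for illustration) and using a few made-up lines instead of a real file:

```perl
use strict;
use warnings;

my $user_input = '$$';                   # hypothetical user-supplied term
my $searchterm = quotemeta $user_input;  # now holds \$\$

# Made-up sample lines standing in for the file.
my @lines = ('$$ at the start', 'no match here', 'ends with $$');

# grep in scalar context returns how many lines matched.
my $line_count = grep { /^$searchterm/ } @lines;

print "$line_count line(s) start with $user_input\n";
```

Keeping the raw input in one variable and the escaped form in another means the original term can still be printed in messages.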
    You are currently only counting lines which start with that pattern. You wrote that you wanted to count all occurrences of the pattern. If you want to match all the occurrences of $constant per line you could change your code to:
    while ($_ =~ m/\Q$constant\E/gc) { $count++; }
    Other thoughts:
    • use lexical filehandles (open my $FILE ... instead of open FILE)
    • $constant doesn't seem like a fitting variable name to me. How about $searchterm?
    • in case you only open the file to look for $constant: don't loop through the file if $constant is empty/undef. You currently have the check _within_ the while (<FILE>) loop. So even if you don't have a valid searchterm you are still looping through the entire file.
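The first and last points above could be sketched roughly like this. It is only an illustration: the in-memory filehandle and the contents of $data are stand-ins for the real file, and $searchterm is the suggested rename of $constant.

```perl
use strict;
use warnings;

my $searchterm = '$$';
my $data = "\$\$ one\ntwo\n\$\$ three\n";   # hypothetical file contents

my $line_count = 0;

# Check the search term once, before reading anything.
if (defined $searchterm && length $searchterm) {
    # Lexical filehandle; opened on a string here only so the sketch
    # runs without a real file.
    open my $fh, '<', \$data or die "Cannot open: $!";
    while (my $line = <$fh>) {
        $line_count++ if $line =~ /^\Q$searchterm\E/;
    }
    close $fh;
}

print "$line_count\n";
```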
      while ($_ =~ m/\Q$constant\E/gc) { $count++; }

      Minor point: in m/\Q$constant\E/gc the /c modifier is unnecessary, although it does no harm in this instance...

      ... although wanna_code_perl's
          $count += () = /\Q$constant\E/g;
      in a readline loop or
          $count = () = /\Q$constant\E/g;
      on a slurped file is, IMHO, better.

      Thanks for your input, rminner. Adding the \Q and \E as you stated worked. I will try with quotemeta as well. I admit I could have done a better job of explaining the objective: I need to count the number of lines where the $constant variable appears in the file. For the points you mentioned under 'Other thoughts':
      - I do use something like $FILE in the code, but to keep the post easy on the eyes I only included the code where the problem was occurring.
      - It's a line counter script that I am converting from batch, and to keep things consistent for the users I kept the same terminology. I do agree the term could use a change.
      - I am doing something else if $constant is empty/undef. That part is working properly, so I thought there was no sense in copying all of that.
      Thanks Again!
Re: Issue with regex matching
by hippo (Archbishop) on Aug 28, 2013 at 22:03 UTC

    There are some things I don't understand about your code:

    1. Why does your regex start with a caret?
    2. Why are there brackets around $constant?
    3. Is there any reason not to use index instead of the regex? That should simplify things further I would have thought.

    Perhaps there is more to the spec than originally stated, but the suspicion is that you might be over-complicating it a little.

      1. The understanding is that $constant will be at the beginning of the row.
      2. Because that's how a variable is compared in regex.
      3. I have found regex to be much simpler and have more experience with it, so that's why.

        2. because that's how a variable is compared in regex

        No, the variable $constant will be interpolated into the regex without the parentheses (round brackets). Consider:

        1:31 >perl -wE "my $constant = 'fred'; my $string = 'fredflintstone'; print qq[found $1\n] if $string =~ /^$constant/;"
        Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
        found
        1:32 >perl -wE "my $constant = 'fred'; my $string = 'fredflintstone'; print qq[found $1\n] if $string =~ /^($constant)/;"
        found fred
        1:32 >

        If the regex matches, any parentheses capture their contents into the special variables $1, $2, etc. But this incurs a performance penalty:

        WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (“Capture groups” in perlre#Regular-Expressions)

        Since you’re not using $1, the capturing parentheses aren’t needed.
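As a small sketch of the alternatives (the extra name 'barney' is made up for the example): plain interpolation is enough for a simple test, and (?:...) gives grouping without capturing when grouping is actually needed, for instance around an alternation.

```perl
use strict;
use warnings;

my $constant = 'fred';
my $string   = 'fredflintstone';

# No parentheses at all: the variable is simply interpolated.
my $plain = $string =~ /^\Q$constant\E/ ? 1 : 0;

# (?:...) groups the alternation but does not set $1.
my $grouped = $string =~ /^(?:\Q$constant\E|barney)/ ? 1 : 0;

print "$plain $grouped\n";
```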

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Thanks for those clarifications. Athanasius has supplied a clear explanation of the issue of the brackets, and you've provided the extra information that $constant need only be counted at the start of the line. With that in mind we can use the simple (but potentially very inefficient) method with index thus:

        $constant = '$$';
        $line_count = 0;
        while (<FILE>) {
            $line_count++ unless index($_, $constant);
        }
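The inefficiency comes from index searching the rest of the line whenever the term is not at the start. One possible way around that, sketched here with made-up sample lines, is to compare only the leading characters with substr. This is an alternative I am suggesting, not something from the posts above.

```perl
use strict;
use warnings;

my $constant = '$$';

# Hypothetical lines standing in for the file.
my @lines = ("\$\$ one\n", "two \$\$\n", "\$\$\$ three\n");

my $line_count = 0;
for my $line (@lines) {
    # Look only at the first length($constant) characters of the line.
    $line_count++ if substr($line, 0, length $constant) eq $constant;
}

print "$line_count\n";
```

Like the index version, this needs no escaping because no regex is involved.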
Re: Issue with regex matching
by hdb (Monsignor) on Aug 29, 2013 at 09:06 UTC

    Alternatively, you could avoid the troubles with a regular expression by setting the record separator $/ to the string you are looking for. When you read the file into an array, you get one more element than the number of occurrences of that string:

    use strict;
    use warnings;

    my $constant = '$$';
    my $count = -1;
    {
        local $/ = $constant;
        $count += () = <DATA>;
    }
    print "$count\n";
    __DATA__
    $$ ss $$ ss $$ ss ss

    (UPDATE) or as a one-liner:

    my $count = do { local $/ = $constant; () = <DATA> } - 1;
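Here is the same record-separator idea as a sketch that reads from a lexical filehandle instead of DATA. The in-memory handle and the contents of $data are assumptions made so that the example is self-contained.

```perl
use strict;
use warnings;

my $constant = '$$';
my $data = "\$\$ ss \$\$ ss\n\$\$ ss ss\n";   # hypothetical file contents

open my $fh, '<', \$data or die "Cannot open: $!";
my $count = do {
    local $/ = $constant;     # each record now ends at the search string
    my @records = <$fh>;
    scalar(@records) - 1;     # one more record than separators
};
close $fh;

print "$count\n";
```

Two caveats worth checking before relying on it: if the input ends exactly with the search string there is no trailing record, so the result comes out one too low, and an empty $constant would switch readline into paragraph mode rather than matching anything.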