lloder174 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to refactor a lot of files to use the English module, and I am running into a problem with Perl regular expressions that totally puzzles me. My code is intended to find the first line containing my search text and replace that line with itself plus a blank line and an additional line that loads the English module with 'use'. Unfortunately, the searched-for line is becoming the third line, and the blank line and additional line are preceding it: the reverse of the desired order, which boggles my mind!!! Hellllpppp!!! Here's the code in question:

#!/usr/bin/perl -w ###-DDEBUGGING -d -Dslt`
use Env;
use English ( no_match_vars );
local $INPUT_RECORD_SEPARATOR;
my $searchPattern = "(^\#!/usr/bin/perl.*?\n)";
my $insertion = "\nuse English \( no_match_vars \);\n";
#my $content = "#!/usr/bin/python\n";
my $content = "#!/usr/bin/perl\n";
print "content = \n$content \n";
print "insertion = \n$insertion \n";
$content =~ s?${searchPattern}?${1}${insertion}?xmsp;
print " content now = \n$content \n";
exit 0;

20090430 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: junior application developer
by johngg (Canon) on Apr 29, 2009 at 21:35 UTC

    I'm not sure that there is a 'p' modifier for regular expressions (did you mean to use the '-p' command-line switch?) and there is no point in using the 'x' modifier if you don't use extended syntax. This can be done in a one-liner like this.

    $ mkdir lloderl74
    $ cat > lloderl74/script1
    #!/usr/bin/perl -w
    print "Hello World!\n";
    exit;
    $ cat > lloderl74/script2
    #!/usr/bin/perl -w
    die "Goodbye, Cruel World\n";
    $ perl -pi.bak -e ' s{^(#!/usr/bin/perl.*)}{$1\n\nuse English q{no_match_vars}}' lloderl74/*
    $ head -99 lloderl74/*
    ==> lloderl74/script1 <==
    #!/usr/bin/perl -w

    use English q{no_match_vars}
    print "Hello World!\n";
    exit;

    ==> lloderl74/script1.bak <==
    #!/usr/bin/perl -w
    print "Hello World!\n";
    exit;

    ==> lloderl74/script2 <==
    #!/usr/bin/perl -w

    use English q{no_match_vars}
    die "Goodbye, Cruel World\n";

    ==> lloderl74/script2.bak <==
    #!/usr/bin/perl -w
    die "Goodbye, Cruel World\n";

    I hope this helps you.

    Cheers,

    JohnGG

      Thanks, johngg. Unfortunately this does not help, because I want to avoid the command line and I also want to adhere to Perl Best Practices (PBP)! I set $INPUT_RECORD_SEPARATOR to undef on purpose so that I can slurp in an entire file at a time, because this code is a small part of a larger script and this is more efficient than rewriting the file a line at a time. As you'll notice, I am using the English module within this code, and I originally used ${^MATCH} (which requires the /p expression modifier) instead of $1, but that did not work either! Help still needed! P.S. My apologies for a mistakenly created subject line, which does not really help!
        You shouldn't need to use PBP for one-liners - that defeats the point. And if you're so worried about it, shouldn't you also be using strict and warnings? In that case, you would see the compilation error

        Bareword "no_match_vars" not allowed while "strict subs" in use at ...

        promptly fix it with use English qw( -no_match_vars ) and be on your merry way. :)

        Update - added the qw (thanks moritz).
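        To make that concrete, here is a minimal sketch (not from the thread) of a strict-clean header with the corrected import, plus a scoped slurp of the kind lloder174 describes. The in-memory filehandle and the sample text are assumptions for illustration; a real script would open a file on disk instead.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # the dash form documented by English; compiles under strict

# Hypothetical file contents; a real script would open a file by name.
my $text = "#!/usr/bin/perl -w\nprint 1;\nexit;\n";
open my $in, '<', \$text or die "Cannot open in-memory file: $OS_ERROR";

# Undefine $INPUT_RECORD_SEPARATOR ($/) only inside the do block,
# so <$in> returns the whole "file" as a single string.
my $content = do { local $INPUT_RECORD_SEPARATOR; <$in> };
close $in;

# Outside the block, $INPUT_RECORD_SEPARATOR has its previous value again.
print "content=<$content>\n";
```

        Keeping the local inside a do block limits the slurp-mode setting to that one read, instead of leaving it in effect for the rest of the file scope as in the original script.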

        junior application developer
        lloder174: You can go back and update the title of your post (noting that you have done so), and at the same time fix the problem that your code lacks a valid </code> end tag, which renders it almost unreadable.

        johngg: I believe the //p regex modifier is new in Perl 5.10; see the p modifier in perlre.
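        A minimal sketch of the /p approach, assuming Perl 5.10 or later: with /p, ${^MATCH} holds the whole match and ${^POSTMATCH} everything after it, so no capture group is needed. The sample content and variable names are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # the /p modifier and ${^MATCH} first appeared in Perl 5.10

my $content   = "#!/usr/bin/perl -w\nprint 1;\n";
my $insertion = "\nuse English qw( -no_match_vars );\n";

# Under /x the '#' must be escaped, or it would start a comment.
# /p makes ${^MATCH} and ${^POSTMATCH} available without the program-wide
# slowdown that using $& and $' caused in older perls.
if ( $content =~ m{ \A \#! /usr/bin/perl [^\n]* \n }xmsp ) {
    $content = ${^MATCH} . $insertion . ${^POSTMATCH};
}

print "content=<$content>\n";
```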

Re: junior application developer
by ELISHEVA (Prior) on Apr 30, 2009 at 07:57 UTC

    Your specific problem (things being printed in the wrong order) appears to be due to the combination of double quotes and the x modifier. The double quotes in my $searchPattern = "(^\#!/usr/bin/perl.*?\n)"; cause "\#" to be inserted into the string as a plain "#" (and "\n" as a real newline character). The x modifier causes Perl to read that "#" as the start of a comment, which runs up to the newline. Thus the regex ends up being (^) alone, with a comment whose text is "!/usr/bin/perl.*?". The net effect is that $1 captures the empty string: the regular expression matches the position at the start of the string, so $insertion is inserted at the start of the string. /usr/bin/perl is never matched.
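    A small reproduction of that effect, with made-up sample strings: printing the pattern between delimiters exposes the real newline, and running the same substitution with and without /x shows that only the version without /x actually captures the shebang line.

```perl
use strict;
use warnings;

# Same pattern as the original: in double quotes, "\#" is just "#"
# and "\n" becomes a real newline character.
my $pattern = "(^\#!/usr/bin/perl.*?\n)";
print "pattern=<$pattern>\n";

my $content = "#!/usr/bin/perl\nprint 1;\n";

# With /x, '#' starts a comment that ends at the newline, leaving (^):
# an empty match at the start of the string.
( my $with_x = $content ) =~ s/$pattern/[$1]/x;

# Without /x, '#' is literal and the whole first line is captured.
( my $without_x = $content ) =~ s/$pattern/[$1]/;

print "with /x:    <$with_x>\n";
print "without /x: <$without_x>\n";
```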

    Now for how to avoid such problems in the future:

    • use strictures. That is, start your script with use strict; use warnings;. Had you done so, you would have noticed a problem right away. In addition to the bareword in use English... (which should be use English qw(-no_match_vars)), you would have seen the message alerting you to a problem in the regular expression: Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE ^#!/usr/bin/perl[^\n]*\n)/.
    • use \\ in double quoted strings when you want to insert an actual backslash rather than just escape the next character.
    • use test strings with data before and after the insertion point, e.g. "#!/usr/bin/perl   \n\n\nyaba-dubba-doo\n";. Had you done that it would have been more obvious that it wasn't "reversing the order" but rather it was matching the start of the string alone.
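    A sketch applying the backslash and test-string advice to the original pattern. The s{}{} delimiters and the qw( -no_match_vars ) form of the inserted line are adjustments borrowed from the other replies, not part of the original code; the rest is an assumption about how the fixed script might look.

```perl
use strict;
use warnings;

# Doubled backslashes survive double-quote interpolation, so the regex
# engine receives \# (a literal '#', even under /x) and \n (a newline escape).
my $search_pattern = "(^\\#!/usr/bin/perl[^\\n]*\\n)";
my $insertion      = "\nuse English qw( -no_match_vars );\n";

# Test string with data before and after the insertion point.
my $content = "#!/usr/bin/perl   \n\n\nyaba-dubba-doo\n";
$content =~ s{$search_pattern}{$1$insertion}xms;

print "content=<$content>\n";
```

    With data on both sides of the match, a misplaced insertion shows up immediately in the output rather than looking like a reordering.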

    And a few comments on best practice:

    • avoid .*? and .*; they rarely do what you want. A better choice is something more specific, such as [^\n]*\n, which skips past all characters on the current line and stops at the end of the line.
    • avoid using '?' as a delimiter in substitutions (the s/// operator). At best it is confusing, because '?' has a special meaning within regular expressions. Also, ?pattern? "match only once" searches are "vaguely deprecated". In the future you might want to consider something like s||| or s{}{}. The latter can be used even with variables written as ${varname}, because Perl knows how to handle nested braces properly. See perlop for details.
    • Avoid slurping except in one-liners. Get in the habit of using while ($line=<INFILE>) {...} instead; it uses far less memory.
    • When printing variables, surround the value with non-whitespace characters so that you can easily spot unexpected hidden whitespace. print "content=<$content>\n" is far more likely to alert you to problems than print "content=\n$content\n".
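    A sketch of the line-at-a-time approach recommended above, inserting the use English line only after the first shebang line. The in-memory filehandles and the sample script are stand-ins (assumptions) for real files; with files on disk you would open them by name instead.

```perl
use strict;
use warnings;

# Hypothetical input script held in memory; real code would read a file.
my $source = "#!/usr/bin/perl -w\nprint 1;\nexit;\n";
my $result = '';
open my $in,  '<', \$source or die "Cannot read input: $!";
open my $out, '>', \$result or die "Cannot write output: $!";

my $inserted = 0;
while ( my $line = <$in> ) {
    print {$out} $line;

    # After the first shebang line only, add a blank line and the use line.
    if ( !$inserted && $line =~ m{ \A \#! /usr/bin/perl \b }xms ) {
        print {$out} "\nuse English qw( -no_match_vars );\n";
        $inserted = 1;
    }
}
close $in;
close $out;

print "result=<$result>\n";
```

    Only one line is held in memory at a time, and the s{}{}-style braces and <...> markers in the print follow the other bullets.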

    The following code illustrates these points and outputs the insertion code in the correct position:

    Best, beth

    Updated: explain specific cause of error and added code sample.

      'Better late than never': in case I have not already done so, I want to thank you for your clear and helpful feedback. I am currently on some very steep learning curves in Perl, and I am only gradually finding the time to learn how best to use the Perl Monastery Gates.