the_Don has asked for the wisdom of the Perl Monks concerning the following question:

Here is the problem:
I am decrypting a file. I am using a module/script that I did not write, and the original author is not readily available for assistance. At one point there is a matching operation that fails for one file but passes for another, although the only thing that changes is the filename. The status messages from GnuPG are identical, the file contents are identical... only the extension of the file changes.

For your review, the sections of code that are in question, in the order in which they are encountered:
Section 1

my $new_filename = $full_filename;
$new_filename =~ s/(\.gpg)|(\.encrypted)/\.decrypted/;

Section 2
$error ||= "Unexpected signer '$signer' is not '$expected_sender'." unless ($signer =~ m/\Q$expected_sender\E/);

The m// sets an error when I use a file named 'FILE.gpg' or 'FILE.encrypted'. If I decrypt a file named 'FILE.pgp', the m// in Section 2 does not create an error.

I ran the script and printed out the contents of the variables before the m// is executed and regardless of file name (assuming the contents are identical) it is something similar to the following (I will change the key info since this is private material):
$signer = 'GOODSIG 86125876138571635 ClientKey'
$expected_signer = '';

Thus the following command is evaluated regardless of filename:

$error ||= "Unexpected signer 'GOODSIG 86125876138571635 ClientKey' is not ''." unless ('GOODSIG 86125876138571635 ClientKey' =~ m/\Q\E/);
yet sometimes an error is created, sometimes not.

My only thought is that when I pass a file name that results in a match in Section 1, something is set internally in perl that causes the match in Section 2 to set the error, but this is deductive reasoning, not solid knowledge of perl. And that is what I am hoping to gain - an understanding of how perl works so I can solve this problem.

the_Don
...making offers others can't rufuse.

Replies are listed 'Best First'.
Re: Regular Expression gotchas ?
by sauoq (Abbot) on Feb 14, 2003 at 22:45 UTC
    Thus the following command is evaluated regardless of filename:
    $error ||= "Unexpected signer 'GOODSIG 86125876138571635 ClientKey' is not ''." unless ('GOODSIG 86125876138571635 ClientKey' =~ m/\Q\E/);
    yet sometimes an error is created, sometimes not.

    If the error is created, that's because it is set elsewhere. The code you gave will not change $error when $expected_signer is an empty string. That's because m/\Q$foo\E/ will match anything when $foo is empty. (Every string contains an empty string.) In this snippet, $error is set only if the match fails. Since the match succeeds, $error isn't set.

    Update: Changed "Every string matches an empty string" to "Every string contains an empty string." The latter is what I really meant (as the context implies.) My thanks go to Enlil for spotting it.

    -sauoq
    "My two cents aren't worth a dime.";
    

      Please run the code below and see if you get the same output as I do (also below). If so, I would like to know why it is behaving as it is.

          #!/usr/local/perl -w
          use strict;

          my $filename1  = 'FILE.pgp';
          my $filename2  = 'FILE.gpg';
          my $signer     = 'Some person <someone@this.com>';
          my $exp_sender = '';

          my $new_filename1 = $filename1;
          $new_filename1 =~ s/(\.gpg)|(\.encrypted)/\.decrypted/;
          print ".PGP FILE ERROR:'$signer' is not '$exp_sender'.\n"
              unless ($signer =~ m/\Q$exp_sender\E/);

          my $new_filename2 = $filename2;
          $new_filename2 =~ s/(\.gpg)|(\.encrypted)/\.decrypted/;
          print ".GPG FILE ERROR:'$signer' is not '$exp_sender'.\n"
              unless ($signer =~ m/\Q$exp_sender\E/);

          my $new_filename3 = $filename1;
          $new_filename3 =~ s/(\.gpg)|(\.encrypted)/\.decrypted/;
          print ".PGP ERROR, why the second time but not the first?:'$signer' is not '$exp_sender'.\n"
              unless ($signer =~ m/\Q$exp_sender\E/);

          1;

      And below is the output I get from the above script:

          $ perl match_test.pl
          .GPG FILE ERROR:'Some person <someone@this.com>' is not ''.
          .PGP ERROR, why the second time but not the first?:'Some person <someone@this.com>' is not ''.

      the_Don
      ...making offers others can't rufuse.

        Strangely enough, you've hit a regular expression gotcha.

        The empty regular expression has a special meaning: reuse the last regular expression that successfully matched. What is perhaps less obvious is that perl checks for an empty expression after interpolating variables, such as the \Q$exp_sender\E in your example.

        Here's a snippet to illustrate:

            "test" =~ /t/;    # successful match
            my $var = "";
            print "matched 't'\n" if "t" =~ /\Q$var\E/;
            print "matched 'u'\n" if "u" =~ /\Q$var\E/;

        A couple of alternative approaches, depending on what you're trying to achieve:

            print "'$signer' is not '$exp_sender'\n" unless $signer eq $exp_sender;
            print "'$signer' is not '$exp_sender'\n" unless index($signer, $exp_sender) >= 0;
            print "'$signer' is not '$exp_sender'\n" unless $exp_sender eq '' || $signer eq $exp_sender;
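One way to keep an empty pattern from ever reaching the regex engine is to wrap the check in a small helper. The following is only a sketch: the name signer_ok is hypothetical, and it assumes that an empty expected sender is meant to accept any signer and that a substring match is what the original code intended.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns true
# when $signer is acceptable for the given $expected sender.
sub signer_ok {
    my ($signer, $expected) = @_;

    # Assumption: an empty expectation means "accept any signer".
    # Returning early also avoids building an empty pattern at all.
    return 1 if !defined $expected || $expected eq '';

    # Plain substring test; no regex, so no "last successful match" reuse.
    return index($signer, $expected) >= 0;
}

my $name = 'FILE.gpg';
$name =~ s/\.gpg$/.decrypted/;    # a successful match precedes the check

my $signer = 'GOODSIG 86125876138571635 ClientKey';
print signer_ok($signer, '')          ? "ok\n" : "error\n";
print signer_ok($signer, 'ClientKey') ? "ok\n" : "error\n";
print signer_ok($signer, 'OtherKey')  ? "ok\n" : "error\n";
```

Because the helper never uses m//, the earlier successful substitution on the file name cannot influence the result.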

        Hugo
Re: Regular Expression gotchas ?
by jdporter (Paladin) on Feb 14, 2003 at 21:55 UTC
    The number one rule of learning by example is, Good Examples are Hard to Come By.

    I would recommend (gently) that you learn perl first, via some other route, then try to fix this program, because it appears to me that its flaws are several and serious, particularly with regard to error checking.

    I don't think it should ever let you get to that section 2 if $expected_signer is null; and clearly a regex match - at least, the one given - is inappropriate for determining the validity of the $signer.
    I also think it should be bailing out (via die, for example) whenever an error is encountered, rather than setting an error message and continuing on with processing.
    Also, if ".pgp" is a valid file suffix, it should make explicit allowance for it. Otherwise, it should throw an error for any invalid file suffix.
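The suffix handling suggested above could look roughly like the sketch below. The helper name decrypted_name is hypothetical, and treating ".pgp" as a valid suffix is an assumption; drop it from the alternation if it is not.

```perl
use strict;
use warnings;

# Hypothetical helper: map an encrypted file name to its decrypted name,
# or die if the suffix is not one that is explicitly allowed.
sub decrypted_name {
    my ($full_filename) = @_;

    my $new_filename = $full_filename;

    # Anchor at the end of the string so only a trailing suffix is replaced.
    # s/// returns false when nothing was substituted, which triggers die.
    $new_filename =~ s/\.(?:gpg|pgp|encrypted)\z/.decrypted/
        or die "Unrecognised file suffix: '$full_filename'\n";

    return $new_filename;
}

print decrypted_name('FILE.gpg'), "\n";
print decrypted_name('FILE.pgp'), "\n";

# An unexpected suffix is rejected instead of being silently passed through.
my $ok = eval { decrypted_name('FILE.txt'); 1 };
print "rejected: $@" unless $ok;
```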

    If my estimation of the situation is inaccurate, perhaps you could post more of the code, or provide a pointer to the program, if available.

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      I think that I have a pretty good understanding of basic perl and programming theories. What I don't understand is why what appear to be identical calls with identical data are producing two different outcomes.

      • True, I should add .pgp as a file that will result in a .decrypt extension... or just add .decrypt on the end regardless of what the original extension is.
      • The program dies at the end with one call to an email routine and messages if there was an error. Basically the same thing, and yes a little bit more processing, nothing too bad.
      • The program is checking with a regex against a list of signed keys that we have. The validity here lies in the PGP encryption system, not the power of Perl regex.
      • My possible intent was to have the null character imply that as long as a trusted, signed key was recognized as the signature then not to worry, but specifying a particular sender in the perl would add some error checking in the system.

      What more do you want of the code? It will be hard for me to post the entire code since it is company material, but if you have a thought process about something going on, I will surely put up pertinent pieces. In all there are at most 100 lines that are executed.

      the_Don
      ...making offers others can't rufuse.

        I think that I have a pretty good understanding of basic perl and programming theories.
        If you think 'FILE.gpg' and 'FILE.pgp' are "identical data", then I would have to dispute your claim.

        Furthermore, do you realize that m/\Q\E/ matches everything, and thus that
        $expected_sender = ''; if ( $signer =~ /\Q$expected_sender\E/ )
        will always be true?

        It is entirely unclear why/how your section 1 and section 2 could be related. Since you "know" that they are, you have useful information which you're not sharing. Do you want us to help, or not?

        jdporter
        The 6th Rule of Perl Club is -- There is no Rule #6.