I would like \q to be added to the Perl language as a regex operator that matches any quote character: typically ' and ", but also typographical quotes in a Unicode environment, and possibly angle quotes as well.

I might suggest this on the p5p list, but would appreciate the feedback of the Monastery before I embarrass myself in front of the wider world.

My main motivation for this was that I use Perl primarily on Windows platforms, and I had problems specifying a quote character in a one-liner. I must have been doing something else wrong, because it does work on both cmd.exe and command.com:

perl -ne "print if /\".+\"/"
So, the only remaining use would be as a generic quote-matching operator.
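Until something like \q exists, a precompiled qr// object interpolated into the pattern gives much the same effect. The sketch below is only an illustration: the variable names and the exact list of typographic quotes are assumptions, not a standard set. Writing the quotes as \x22 (double quote) and \x27 (single quote) also sidesteps the shell-quoting problem.

```perl
use strict;
use warnings;

# \x22 is the ASCII double quote and \x27 the ASCII single quote, so the
# pattern never contains a literal quote for the shell to interfere with.
my $ascii_quote = qr/[\x22\x27]/;

# Hypothetical wider set: ASCII quotes, guillemets (U+00AB, U+00BB),
# curly quotes (U+2018..U+201F) and single angle quotes (U+2039, U+203A).
my $any_quote = qr/[\x22\x27\x{AB}\x{BB}\x{2018}-\x{201F}\x{2039}\x{203A}]/;

# prints: ascii, typographic, none (one per line)
for my $line (q{He said "hi"}, "He said \x{201C}hi\x{201D}", 'no quotes here') {
    my $kind = $line =~ /$ascii_quote.+$ascii_quote/ ? 'ascii'
             : $line =~ /$any_quote.+$any_quote/     ? 'typographic'
             :                                        'none';
    print "$kind\n";
}
```

The same hex escape should also work in the original one-liner, since cmd.exe passes \x22 through unchanged: perl -ne "print if /\x22.+\x22/"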

Re: \q quote-matching operator
by PodMaster (Abbot) on Aug 26, 2004 at 12:46 UTC
    In the meantime, use this as perl -MRegexp::Q -ne "print if /\pQ.+\pQ/" :)
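    package Regexp::Q;
    use strict;
    use warnings;
    use overload;

    # Register convert() as the handler for the constant parts of every
    # regex compiled in a scope that says "use Regexp::Q".
    sub import {
        shift;
        die "No argument to ${\__PACKAGE__} allowed" if @_;
        overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    # Filled in at compile time (the original used the CPAN module vars::i
    # for this) so the table is ready when the BEGIN-time import below runs.
    our %rules;
    BEGIN {
        %rules = (
            '\\' => '\\\\',       # an escaped backslash must stay escaped
            'pQ' => qr/['"`]/,    # \pQ: any quote character
            'PQ' => qr/[^'"`]/,   # \PQ: any non-quote character
        );
    }

    sub convert {
        my $re = shift;
        $re =~ s{ \\ ( \\ | [pP]Q ) }{ $rules{$1} or invalid($re, $1) }sgex;
        return $re;
    }

    package main;
    unless (caller) {
        BEGIN { Regexp::Q->import }
        print "YAY!$/"
            if q~PodMaster asked me "Do you like parachute pants?"~ =~ /\pQ/;
    }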

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

Re: \q quote-matching operator
by diotalevi (Canon) on Aug 26, 2004 at 13:11 UTC

    You can add this yourself. See the section "Creating Custom RE Engines" near the end of perlre.

    package Regexp::Quotes;
    use overload;

    sub import {
        shift;
        die "No argument allowed to Regexp::Quotes::import" if @_;
        overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    my %rules = (
        '\\' => '\\\\',             # an escaped backslash must stay escaped
        'q'  => qr/[\'\" .... ]/,   # Extend this
    );

    sub convert {
        my $re = shift;
        $re =~ s{ \\ ( \\ | q ) }{ $rules{$1} or invalid($re, $1) }gex;
        return $re;
    }

    1;
Re: \q quote-matching operator
by gellyfish (Monsignor) on Aug 26, 2004 at 12:23 UTC

    I think it would be a little confusing to use \q, since \Q is already used for something else, and the backslash regex escapes are generally arranged in uppercase/lowercase pairs of related meaning. I would suggest something in the form of a POSIX character class like [:quote:], and possibly a Unicode-style equivalent such as \p{IsQuote}.
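    Perl already lets you define your own \p{...} classes through user-defined character properties (described in perlunicode): a subroutine whose name starts with Is or In returns hex code points, or tab-separated ranges, one per line. A minimal sketch, assuming the name IsQuote and a hand-picked subset of quote marks; the negated form \P{IsQuote} then comes for free.

```perl
use strict;
use warnings;

# User-defined property: \p{IsQuote} makes Perl call IsQuote(), which
# returns hex code points (or tab-separated ranges), one per line.
# This list is an assumed subset of quote marks, not an official set.
sub IsQuote {
    return join "\n",
        '0022',          # QUOTATION MARK
        '0027',          # APOSTROPHE
        '00AB',          # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
        '00BB',          # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
        "2018\t201F",    # curly single and double quotes
        "2039\t203A",    # single angle quotation marks
        '';              # trailing newline
}

# prints: quoted, quoted, plain (one per line)
for my $s ('say "hi"', "say \x{201C}hi\x{201D}", 'say hi') {
    my $kind = $s =~ /\p{IsQuote}.*?\p{IsQuote}/ ? 'quoted' : 'plain';
    print "$kind\n";
}

# \P{IsQuote} is the complement: any character NOT in the list above.
print "complement works\n" if 'a' =~ /^\P{IsQuote}$/;
```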

    /J\

Re: \q quote-matching operator
by TedYoung (Deacon) on Aug 26, 2004 at 12:30 UTC

    If you were trying to match quotes in text taken from an MS Office document, that would explain the trouble you were having. By default, Office converts " and ' into "smart quotes".

    A related problem with \q is that it would need to be locale-dependent, since different languages use different quoting conventions.

    /\".+\"/

    seems to work at first, and it will work for very simple cases, but it would probably not do what you want for the string

    ABC "DEF" GHI "JKL" MNO

    where it would match "DEF" GHI "JKL". To match "DEF" and "JKL" separately, you would need something like:

    /\".+?\"/
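    A quick way to see the difference between the two patterns is to run both against the sample line. In a script, as opposed to a cmd.exe one-liner, the backslashes before the quotes are unnecessary, so this minimal sketch drops them:

```perl
use strict;
use warnings;

my $text = 'ABC "DEF" GHI "JKL" MNO';

# Greedy: .+ runs to the end of the line, then backs off to the LAST quote.
my ($greedy) = $text =~ /(".+")/;

# Non-greedy: .+? stops at the FIRST closing quote; /g finds every match.
my @lazy = $text =~ /(".+?")/g;

print "greedy: $greedy\n";    # greedy: "DEF" GHI "JKL"
print "lazy:   @lazy\n";      # lazy:   "DEF" "JKL"
```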

    We should probably use .*? instead of .+? to handle things like "". But if you are dealing with code, none of these solutions handles things like "AB\"C". There is a common boilerplate regex for this, but at this point you may want to look at the documentation for Text::Balanced, which is a core module as of 5.8 (and probably earlier). It provides matching solutions for all sorts of common problems like this.
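    One common form of that boilerplate regex is sketched below, alongside a Text::Balanced equivalent. The pattern is one variant among several and the sample input is made up; extract_delimited is called with its default escape character (backslash) and default whitespace prefix.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

my $code = q{x = "AB\"C"; y = "";};

# Opening quote, then any run of non-quote/non-backslash characters or
# backslash-escaped pairs, then the closing quote. * rather than + so
# that the empty string "" also matches.
my @strings = $code =~ /("(?:[^"\\]|\\.)*")/g;
print "$_\n" for @strings;    # prints "AB\"C" and then ""

# extract_delimited expects the delimited text at the start of its input
# (after optional whitespace by default) and, in list context, returns
# (extracted text, remainder, skipped prefix).
my $input = q{ "AB\"C" and more};
my ($quoted, $rest) = extract_delimited($input, '"');
print "$quoted\n";            # "AB\"C"
```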

    Ted

    PS: It is quite possible you already knew all of this and left it out in the interest of brevity. :-)

      Greediness is a well-known problem, and is hardly limited to quotes. That example is better written as:

      /" [^"]+ "/x;
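      The negated-class version can be checked against the same sample line. One design difference worth knowing, shown in this small sketch: unlike ., a negated class such as [^"] also matches a newline, so the match may span lines.

```perl
use strict;
use warnings;

my $text = 'ABC "DEF" GHI "JKL" MNO';

# [^"]+ can never run past a closing quote, so no backtracking or
# non-greedy modifier is needed to stop in the right place.
my @quoted = $text =~ /(" [^"]+ ")/xg;
print "@quoted\n";             # "DEF" "JKL"

# A negated class also matches "\n", so this match spans two lines;
# use [^"\n] instead if that is unwanted.
my @multi = qq{"one\ntwo"} =~ /(" [^"]+ ")/xg;
print scalar(@multi), "\n";    # 1
```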

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: \q quote-matching operator
by dragonchild (Archbishop) on Aug 26, 2004 at 12:22 UTC
    You could use the single-quote as your -e delimiter. :-)

    But I do like your \q idea for matching quotes (with \Q as the non-quote character class, as well). Go ahead and suggest it!

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      \Q is used as the marker for quotemeta().

      / ... \Q$something\E ... /
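      A small illustration of what \Q...\E does, using a made-up value for $something: the interpolated text has its metacharacters escaped, exactly as quotemeta() would escape them.

```perl
use strict;
use warnings;

my $something = 'a.b*c';    # hypothetical value containing metacharacters

# Without \Q the . and * in $something act as regex metacharacters.
print "metachar match\n" if 'axbbbc' =~ /^$something$/;

# With \Q...\E the interpolated text is quotemeta'd and matches literally.
print "literal match\n" if 'a.b*c' =~ /^\Q$something\E$/;
print "no literal match\n" unless 'axbbbc' =~ /^\Q$something\E$/;

# Calling the function directly produces the same escaped string.
print quotemeta($something), "\n";    # a\.b\*c
```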
      You could use the single-quote as your -e delimiter. :-)
      You can? I seem to have trouble trying to do that.
      C:\>perl -e 'print "foo\n"'
      Can't find string terminator "'" anywhere before EOF at -e line 1.
      In cmd.exe you can escape quotes with either a quote or a backslash. I wasn't sure whether either would work in command.com, but having checked after work, both do appear to work in command.com on Windows 98.
      C:\>perl -e "print '""'"
      "
      C:\>perl -e "print '\"'"
      "
Re: \q quote-matching operator
by Anonymous Monk on Aug 26, 2004 at 15:17 UTC
    You want to introduce \q because it's too hard to type \" on the command line?

    You'd better stock up on asbestos underwear before presenting that idea on p5p.

      The idea came about because I thought it was impossible to do this on the Win32 command line. Now that I know it is entirely possible, I still think the idea has merit for matching any quote character. However, the namespace clash with \Q makes it rather counterintuitive.
        If you want to match any quoting character, \q isn't the way to go; [:quote:] would be more appropriate. But I think the advantage is far too small for p5p to give it any consideration (rightly so, I'd say). Of course, if there were a Unicode property for that group of characters, you could use \p{UNICODE PROPERTY HERE}.
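        As it happens, Unicode does define a binary property, Quotation_Mark, and recent perls appear to expose it as \p{Quotation_Mark} (perluniprops lists the exact names your perl supports). A minimal sketch, assuming such a perl; the sample characters are arbitrary:

```perl
use strict;
use warnings;

# Assumes a perl whose Unicode tables include the standard binary
# property Quotation_Mark; the set of characters comes from Unicode
# itself, not from this script.
my @samples = ('"', "'", "\x{AB}", "\x{201C}", '`', 'a');

for my $ch (@samples) {
    my $kind = $ch =~ /\p{Quotation_Mark}/ ? 'quote' : 'not a quote';
    printf "U+%04X %s\n", ord($ch), $kind;
}
```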