in reply to Re: Windows command line
in thread Windows command line

My understanding is that this behaviour is undocumented. Did you find this out by experimentation, or did you manage to unearth something from MSDN that nobody else I know of has found?

Incidentally, I have a patch for the README.win32 (which is automatically converted to perlwin32.pod when perl is built) that expands on the explanation of quoting rules (that I figured out by experimentation). Perhaps you and I should knock together a better patch and get it included? The doc currently mentions ^ but doesn't explain it as well as you did (IMO).


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Re: Re: Re: Windows command line
by BrowserUk (Patriarch) on Oct 18, 2003 at 18:37 UTC

    Essentially, it is the result of experimentation and as such subject to the whims of change by MS. It has been consistent (if eclectic) for a good while now though.

    If you want to add that to your patch I'm all for it.

    The harder one is working out the treatment of double quotes...

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred"""" """""bill"
    (fred")
    ("bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred" "bill"
    (fred)
    (bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred"" "bill"
    (fred")
    (bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred"" ""bill"
    (fred")
    (bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred"" """bill"
    (fred")
    ("bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred""" """bill"
    (fred" "bill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred""a" ""a"bill"
    (fred"a "abill)

    P:\test>perl -le"print qq[($_)] for @ARGV" "fred"a"" "a""bill"
    (freda)
    (a"bill)

    If you can discern a pattern in that lot, I've several hundred more weird examples for analysis:)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      If you can discern a pattern in that lot, I've several hundred more weird examples for analysis:)

      Well, I'm not going to claim that I fully grok what's going on. I doubt if even Microsoft will do that. But I do have some ideas.

      It seems that the parser has at least four modes: unquoted text, quoted text, semi-quoted separable text, and semi-quoted inseparable text. Semi-quoted separable text appears to be like quoted text except that spaces are significant (they end the argument), while special chars like pipe '|' are treated as literals and do not have their normal effect. Semi-quoted inseparable text is where spaces are treated as literals and do not end the argument, but special chars are significant. Normal quoted text can be viewed as a combination of the two semi-quoted modes.

      In addition to these modes there are some special rules for escaping the quotes. A backslash can be used to escape a quote in all of the modes. A pair of quotes can also be used to escape a quote, but doing so puts the parser into "semi-quoted separable" (SQS) mode. A new quote when in this mode puts the parser into "semi-quoted inseparable" (SQI) mode, which if terminated with a lone quote returns to SQS and if terminated with a pair returns to unquoted mode. This behaviour can be observed by watching how special chars are handled (the pipe is useful as it doesn't create weird files; instead it gives an error when the pipe is treated as special).

      I put your snippet into a script for legibility purposes, here are some results:

      # quoted
      D:\>pq "|"
      (|)

      # quoted ends in pair, which is treated as a literal " and a mode shift
      D:\>pq "|""
      (|")

      # ... which is observable here as the pipe is treated as a literal, this is
      # semi-quoted-separable mode.
      D:\>pq "|"" |
      (|")
      (|)

      # here we see that the lone quoted section in the second argument is
      # semi-quoted-inseparable. the space is literal but ...
      D:\>pq "|"" " "
      (|")
      ( )

      # ... the special chars like pipe are not.
      D:\>pq "|"" " |
      The syntax of the command is incorrect.

      # here we see that SQS reverts to SQI after the second lone quote.
      D:\>pq "|"" " " |"
      (|")
      ( )
      (|)

      # but it reverts to unquoted after a second double quote...
      D:\>pq "|"" " "" |"
      '"' is not recognized as an internal or external command,
      operable program or batch file.

      # ... which this demonstrates
      D:\>pq "|"" " "" foo"
      (|")
      ( ")
      (foo)

      I think that these rules are sufficient to explain all of your examples. However I will say that it's really sucky that MS doesn't fix this mess. :-)


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi


        Well, if MS fixes up this mess, they'll probably mess up something which is fixed. ;-)

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      The documented way that double quotes work (on a Microsoft command-line to a C program) is that inside of double quotes, special characters aren't special except that \" becomes ", \\\" becomes \", and \\" becomes \ followed by the closing " (etc.) -- sequences of \s not immediately followed by a " are not changed.
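As a rough illustration of the documented rules just described, here is a minimal Python sketch. It is not Microsoft's code, and `split_documented` is a made-up name. It assumes it is handed only the argument portion of the command line (no program name), and it deliberately ignores the undocumented "" handling discussed in this thread.

```python
def split_documented(cmdline):
    """Split an argument string using only the documented backslash/quote rules."""
    args = []
    current = []
    in_quotes = False
    have_arg = False  # True once the current argument has started
    i = 0
    n = len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Measure the whole run of backslashes.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # A run followed by a quote: every pair yields one backslash.
                current.append("\\" * (count // 2))
                if count % 2:
                    # Odd run: the quote is a literal character.
                    current.append('"')
                    i = j + 1
                else:
                    # Even run: leave the quote to act as a delimiter.
                    i = j
            else:
                # Not followed by a quote: the backslashes are unchanged.
                current.append("\\" * count)
                i = j
            have_arg = True
        elif ch == '"':
            # A bare quote opens or closes a quoted section.
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current argument.
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


# The three cases from the description above:
# split_documented(r'"a\"b"')    returns ['a"b']
# split_documented(r'"a\\\"b"')  returns ['a\\"b']   (that is, a\"b)
# split_documented(r'"a\\"')     returns ['a\\']     (that is, a\ and the quote closes)
```

Anything involving "" inside quotes is outside what this sketch models, so it cannot be used to predict the odd results shown earlier in the thread.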

      It appears to me that Microsoft made some stab at also supporting "" inside of double quotes becoming " but that they have a few bugs in this code. Since I didn't find this behavior documented, I just avoid it and use \" when I want a " inside of double quotes and things are well behaved (as far as I've noticed) -- though the scheme leaves much to be desired.

                      - tye

        Agreed, backslashing embedded double quotes is the easiest way of dealing with things on the command line. However, I'd still like to get a grip on exactly how double quotes are parsed by CMD.EXE.

        One reason is that if you ever have to try and pass a filename that contains spaces to an external program via cmd.exe, especially if it is one that won't accept forward slashes as path separators, the mess of backslashed backslashes escaping embedded quotes is just so damned messy to get right.

        I'm sure that there is some logic to it, even if it is twisted logic. I think the source of the messiness is related to the fact that you can quote individual parts of a complete path as well as a whole path.

        P:\test>dir d:\"Program Files"\"Apache Group"\*
         Volume in drive D is Winnt
         Volume Serial Number is D822-5AE5

         Directory of d:\Program Files\Apache Group

        31/05/02  17:38    <DIR>    .
        31/05/02  17:38    <DIR>    ..
        31/05/02  17:38    <DIR>    Apache2
                       3 File(s)              0 bytes
                                    947,634,176 bytes free

        However, that's not the complete story, as if you try and pass this into a C program like perl

        P:\test>perl pq.pl8 d:\"Program Files"\"Textpad 4"\*
        (d:"Program)
        (Files"Textpad 4\*)

        You get an almighty mess. Add an extra set of quotes and you get

        P:\test>perl pq.pl8 "d:\"Program Files"\"Textpad 4"\*"
        (d:"Program Files"Textpad)
        (4\*)

        To achieve the desired result you have to escape the backslashes

        P:\test>perl pq.pl8 d:\\"Program Files"\\"Textpad 4"\*
        (d:\Program Files\Textpad 4\*)
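One way to avoid hand-escaping like this is to build the quoted argument mechanically. Below is a minimal Python sketch, assuming the receiving program follows the documented C runtime rules (a backslash only matters when it comes before a double quote). `quote_arg` is a hypothetical helper, not an existing API, and it does nothing about cmd.exe's own special characters such as ^ or |.

```python
def quote_arg(arg):
    """Quote one argument so the documented backslash rules read it back unchanged."""
    if arg and not any(c in ' \t"' for c in arg):
        return arg  # no whitespace or quotes, so nothing to protect
    out = ['"']
    backslashes = 0  # length of the pending run of backslashes
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # Double the pending backslashes, then add one more to escape the quote.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote are emitted unchanged.
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # Trailing backslashes end up before the closing quote, so double them.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)


# quote_arg(r'd:\Program Files\Textpad 4\*')  returns "d:\Program Files\Textpad 4\*"
# quote_arg('d:\\Program Files\\')            returns "d:\Program Files\\"
# quote_arg('say "hi"')                       returns "say \"hi\""
```

Under that assumption, a single pair of outer quotes with no inner quotes should need no doubled backslashes at all for the Textpad path. The case that does need doubling is a backslash that lands directly before a quote, such as a trailing path separator.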

        I can't help but think that using backslash as an escape character on a system that has programs that require the backslash be used as the path separator is a terminally (sic) bad idea.

        Oh well. Ours is not to reason why......


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Hooray!