This meditation offers a few thoughts about an aspect of using Perl on MSWin that has periodically come up in discussions on the chatterbox. That aspect is the presence or absence of embedded spaces in file pathnames. One can pretty much assume that others will have opinions differing from what I present here; hopefully someone will find it useful all the same.

First of all, I'm going to make the most common distribution of Perl for MSWin-32bit the reference point for this presentation. ActiveState's ActivePerl has its detractors and fans, but it is the descendant of the original port of modern Perl 5 for MSWin (done by Gurusamy Sarathy who is TTBOMK still employed at AS).

Just to get the edge sharpened on this point about file pathnames with embedded spaces I'll quote verbatim here from a message embedded in the file Installer.bat that comes in the AP (ActivePerl) .zip "DARF" (Distribution ARchive File):

Looks like you are trying to install Perl into a path that contains spaces or other special characters. Though the latest Windows operating systems claim to support filenames with such special characters, many existing utilities will have trouble with such path names. Chances are that you will find this is simply too much of a bad idea to be worth it.

If this is representative of the information commonly disseminated about spaces in pathnames then it is no wonder there's widespread confusion. That's a pretty confusing message! "the latest Windows operating systems claim to support filenames with such special characters ..." means what exactly?!? That MS is lying about how their OS's work? That it's the OS's fault if something breaks in Perl or in something one tries to do with Perl, if there are embedded-space pathnames involved? I am not sure.

This is probably an instance of poor-quality user documentation that ActiveState should have improved a long time ago, but leaving that point aside for a moment, it still seems reasonable to take it as a warning that relying on workability when Perl is installed to a pathname location with embedded spaces is asking for trouble down the line. It might furthermore be taken as a warning that when you are doing things with Perl on MSWin, blithely assuming that the presence of embedded spaces will never impact anything could be a foolish complacency.

The support for spaces in pathnames in MSWin isn't really debatable. What's troublesome is what happens when a shell (command interpreter) that splits arguments on spaces is presented with a pathname argument that contains an embedded space. A shell, whether it is CMD or COMMAND or something more civilized that is pronounced with a name ending in the letters "sh", has no way of knowing that you meant `C:\Program' and ` Files\' to be understood as comprising one unit. And anytime there is a .bat (or .cmd) file involved, the shell is involved, for one thing.
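To make the splitting problem concrete, here is a minimal sketch of what a naive whitespace splitter does to an unquoted pathname. The path and the `naive_split` helper are made up for illustration; real command interpreters and C runtimes follow more elaborate rules than this.

```perl
use strict;
use warnings;

# Hypothetical helper: split a command line on runs of whitespace,
# paying no attention to quoting. A deliberately naive model.
sub naive_split {
    my ($command_line) = @_;
    return split ' ', $command_line;
}

# Single-quoted string, so the backslashes are literal.
my @pieces = naive_split('C:\Program Files\Perl\bin\perl.exe -v');

print scalar(@pieces), " pieces\n";
print "  [$_]\n" for @pieces;
```

The pathname comes out as two separate pieces, which is exactly the misreading described above.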

Here is a partial list of the .bat files that are shipped with the AP distribution:

c2ph.bat cpan.bat crc32.bat dprofpp.bat enc2xs.bat exetype.bat find2perl.bat gedi.bat GET.bat h2ph.bat h2xs.bat HEAD.bat instmodsh.bat libnetcfg.bat lwp-download.bat lwp-mirror.bat lwp-request.bat lwp-rget.bat perlbug.bat perlcc.bat perldoc.bat perlglob.bat perlivp.bat piconv.bat pl2bat.bat pl2pm.bat pod2html.bat pod2latex.bat pod2man.bat pod2text.bat pod2usage.bat podchecker.bat podselect.bat POST.bat ppm.bat ppm3-bin.bat ppm3.bat prove.bat psed.bat pstruct.bat ptar.bat ptked.bat ptksh.bat reloc_perl.bat runperl.bat s2p.bat search.bat SOAPsh.bat splain.bat stubmaker.bat tkjpeg.bat widget.bat XMLRPCsh.bat xsubpp.bat

Some might say something like "Presumably all of these scripts are written 'defensively' so that the program will do the right thing when it encounters args or data with embedded spaces, since the distribution of these scripts to MSWin systems implies conformance with the cultural expectations of MSWin users." However, it is clear from my own experience that not much can be assumed. Take the `make' program (of whatever flavor), which one has to install separately on MSWin since the OS does not ship with one: it is an external tool that gets invoked when a CPAN module is built using the `cpan' utility. There is no way that spaces in pathnames are going to be magically handled in such a case.

I'd like to ask readers who have their own memories of experiences with a breakdown in some Perl-related endeavor caused by the presence of an embedded space in an argument or input data, to please recount what and how it went wrong, below.

Thanks!


    Soren A / somian / perlspinr / Intrepid

Re: The Evil Embedded Space (system(@list))
by tye (Sage) on May 30, 2005 at 18:17 UTC

    Until system(@list) works in Win32 [it is currently just system("@list") which is system($scalar) which isn't what is needed for portably dealing with obnoxious file names], spaces in path names will be a big problem.
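The difference between system(@list) and system("@list") comes down to lost argument boundaries. A small sketch of the idea, with invented command and file names, showing two different argument lists collapsing into the same string once they are interpolated:

```perl
use strict;
use warnings;

my @one_file  = ( 'type', 'My Report.txt' );       # one file name containing a space
my @two_files = ( 'type', 'My', 'Report.txt' );    # two separate file names

# Interpolating an array joins its elements with $" (a single space by default),
# so the information about where each argument ends is gone.
my $flat_one = "@one_file";
my $flat_two = "@two_files";

print "both lists flatten to: $flat_one\n" if $flat_one eq $flat_two;
```

Whatever receives the flattened string cannot tell which of the two lists was meant unless quoting is added first.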

    This issue has little to do with "the shell", since the design of Win32 means that "the shell" has almost no responsibility for the splitting of command-line arguments (on spaces), which is the source of the problem. Each program must split its own command line and they certainly don't all go about this the same way.

    But things have finally settled down such that the MS Win32 C RTL's definition of how to quote arguments is fairly widely supported. This means that system(@list) can be made to mostly work, by adding quotes around arguments that would need them if parsed by the C RTL's command-line parser (and by escaping a few things).

    Now, if one insists on using the make command, then "the shell" does become a significant source of problems and you need to invent platform-customized quoting schemes to deal with space in file names and beat people up to always use them. But that is just one problem with using make and so the best solution for this is to move away from using make in a system that tries to be portable.

    Of course, "platform-customized quoting" really should be "shell-customized" but the traditional solution here is to use the one shell that is guaranteed to be on the platform in question. So, on most platforms, /bin/sh is used even for installing things for users who prefer to use zsh or whatever. Likewise, on NT+ Win32, cmd.exe is used for the same reason. Unfortunately, this same logic means that command.com is used on pre-NT Win32, but command.com is not quite up to this task. But this, again, leads me to the clear solution of relying on Perl instead of some version of 'make' and some semblance of 'sh'.

    - tye        

      The basic Microsoft C argument processing is that you put double quotes around anything to make it a single argument. Backslash is only special if it is followed by a double quote. So \ is just \ but \" becomes a literal " character, but this is only reliable if done within double quotes. So \\" becomes a backslash followed by a close-quote. \\\" becomes backslash followed by literal quote.
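The rules above can be sketched as a small parser. This is a simplified model written for illustration, not the C runtime's actual code: the name `parse_msvcrt_args` is hypothetical, and it ignores the special handling of the program name and of a doubled quote inside a quoted string.

```perl
use strict;
use warnings;

# Simplified model of the argument-splitting rules described above.
sub parse_msvcrt_args {
    my ($line) = @_;
    my @args;
    my $current   = '';
    my $in_quotes = 0;
    my $have_arg  = 0;    # true once the current argument has started

    while ( length $line ) {
        if ( $line =~ s/^(\\+)"// ) {
            # A run of backslashes followed by a quote: each pair yields one
            # backslash; an odd one left over makes the quote literal.
            my $count = length $1;
            $current .= '\\' x int( $count / 2 );
            if   ( $count % 2 ) { $current .= '"' }
            else                { $in_quotes = !$in_quotes }
            $have_arg = 1;
        }
        elsif ( $line =~ s/^"// ) {
            # A bare quote opens or closes a quoted section.
            $in_quotes = !$in_quotes;
            $have_arg  = 1;
        }
        elsif ( !$in_quotes && $line =~ s/^[ \t]+// ) {
            # Unquoted whitespace ends the current argument.
            push @args, $current if $have_arg;
            ( $current, $have_arg ) = ( '', 0 );
        }
        else {
            # Anything else, including backslashes not followed by a quote,
            # is copied through unchanged.
            $line =~ s/^(.)//s;
            $current .= $1;
            $have_arg = 1;
        }
    }
    push @args, $current if $have_arg;
    return @args;
}

# A single-quoted here-doc keeps every backslash exactly as typed.
chomp( my $line = <<'END' );
"a b" c\d "e\"f" "g\\" "h\\\"i"
END

print "[$_]\n" for parse_msvcrt_args($line);
```

Running it on the sample line is a quick way to check each case in the paragraph above against your own reading of it.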

      So you want something like:

      for( @args ) {
          if( /[\s"^*?%<>|&]/ ) {
              s#(\\*)"#$1$1\\"#g;
              $_= '"' . $_ . '"';
          }
      }

      Perhaps with more special characters in the first match.

      Update: Oops, I was missing one backslash in my replacement. I had $1$1" when I needed $1$1\\", and we should probably avoid adding quotes if the argument is already surrounded by quotes, at least by default.
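A sketch of how the update's suggestion might look as a function. The name `quote_win32_arg` is hypothetical and not part of Perl or any module. Two things here go beyond the one-liner and are my assumptions: trailing backslashes are doubled so that the closing quote is still read as a delimiter, and an empty argument is quoted so that it does not vanish.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one argument for a C-RTL-style parser.
sub quote_win32_arg {
    my ($arg) = @_;

    # Already surrounded by quotes: leave it alone, per the update above.
    return $arg if $arg =~ /^".*"$/s;

    # Nothing special in it (and not empty): no quoting needed.
    return $arg unless $arg eq '' || $arg =~ /[\s"^*?%<>|&]/;

    # Double any backslashes that precede a quote, then escape the quote.
    $arg =~ s#(\\*)"#$1$1\\"#g;

    # Assumption: double trailing backslashes so the closing quote we are
    # about to add is not turned into a literal quote.
    $arg =~ s/(\\+)\z/$1$1/;

    return qq{"$arg"};
}

# In single quotes, a final \\ is one literal backslash.
for my $arg ( 'plain', 'two words', 'C:\Program Files\\', 'say "hi"' ) {
    print quote_win32_arg($arg), "\n";
}
```

Treating anything that starts and ends with a quote as already quoted is a blunt test, which is presumably why the update says "at least by default".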

      - tye        

Re: The Evil Embedded Space
by tilly (Archbishop) on May 30, 2005 at 07:26 UTC
    Ironic.

    First you complain about how confusing that error message is. Then you provide examples that demonstrate what it is talking about.

    Microsoft claims that spaces in filenames are supported. However many utilities including ones that you can get from Microsoft will break on spaces in filenames. And in the end you'll probably find that relying on what is technically supported (but effectively doesn't really work) to be a bad idea that is not worth it.

    Furthermore I would submit that it is unreasonable to expect utilities to work in a case where the installation program just told you that they likely won't work. If you dislike it, those utilities are mostly drawn straight from the core Perl distribution, patches are welcome.

    UPDATE: Rant was too strong a word. Complain fits better. Also I should note that I don't recall any arguments with Intrepid in chatter in recent months. Finally I should point out that I don't call a feature supported unless it both works and works with all supporting utilities that I'd expect people to use. The error message suggests to me that whoever wrote it has the same opinion that I do, and further thinks that, Microsoft's claims notwithstanding, spaces in filenames aren't really supported.

      In my understanding, Intrepid was complaining about the quality of the warning message, which I find correct: the error message should be changed.

      I'm no Microsoft advocate, but in this case I'd say that the problem isn't in the OS, but in the utilities. If something is technically supported, but used in the wrong manner, I wouldn't say that it doesn't really work. As PodMaster correctly points out, an application invoking the shell should quote its arguments to get them understood well; so, it's possible to use the feature without breaking anything.

      This doesn't happen for ActiveState Perl, in which case the warning message should read "We have more interesting things to do other than fixing that spaces-in-paths issue in all stuff, so don't put spaces in your installation paths and you'll live happy".

      Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

      Don't fool yourself.

        I'm no MS advocate either, but I don't really buy the argument that this is not in the OS.

        First off, spaces in paths have been available for years. I remember using them on DOS 5 and OS/2 2.1. Unix/Linux has always had the ability to put spaces in the path as well. So Windows isn't doing anything new here. What is new here is the advocacy to use spaces.

        I look at this as I would any other design that will inherently cause users (in the case of an OS, this includes not just end-users, but developers on that OS) problems. Funny thing is, they could have avoided this by officially endorsing spaces, but using a directory such as C:\Programs instead of C:\Program Files as the directory for default installations. 99% of the problems with spaces would have gone away. Instead, they consciously chose to break everyone in order to enforce their idea that spaces should be allowed. What this means in practical use is that CMD.EXE is deprecated, as is any other non-GUI method of launching applications.

        It's a design decision by MS that causes this pain. It means everyone else, especially those writing batch files, needs to change their code. It breaks backwards compatibility, which I believe they did on purpose. And thus I'd say the problem is in the OS.

        Unfortunately, it is what it is, and everyone who develops for Windows now (well, for the last 10 years now) must eat the cost of the switch. The market (generally) demands it.

        I know that, at work, we were still eating the cost of conversion in 1998. And even today, there is a small overhead of having to constantly make sure we quote everything properly in our batch files. What a headache. Our build tools explicitly say "no spaces in their paths" for building. What we ship to end-users must support spaces, but it's too expensive for us to support the spaces internally. Yet another Microsoft-tax - this time on development.

        Update: Some may wonder what a rant (yes, a rant ;->) on MS is doing here. It's not. It's a rant on design issues, and how they permeate to everyone that relies on your code and design. Care needs to always be taken that you break your users only when there is no other way to get them the behaviour desired, and I think that simply changing the default location for new software just a tiny bit would have reduced that break measurably. I have no issues with spaces in filenames per se (as I said, I've been doing this for years prior to Win95's arrival). Just the design of how to implement it.

      Well, error messages that literally say "latest versions" irk me. I have no way to judge what that means. Is it Windows ME, which I don't have to worry about, or Windows 2000, which is old, or Windows XP, which I have now? It's the same annoyance I have with technical support web pages that don't have dates. How do I judge the information if I don't know how fresh it is?

      But, just because I think the message is vague doesn't mean I don't understand the problem. Other people, however, may not.

      --
      brian d foy <brian@stonehenge.com>
Re: The Evil Embedded Space
by PodMaster (Abbot) on May 30, 2005 at 08:56 UTC
    IIRC, there is only one concern with regards to spaces, and it is the shell (cmd/command/whatever). Various modules and programs which invoke the shell don't take care to properly quote paths with spaces.

    Now some versions of the shell will correctly guess in some instances that by C:\program files\perl\bin\perl.exe blah blah you really mean "C:\program files\perl\bin\perl.exe" blah blah, but it mostly won't work right.

    So, simply quote your paths when you have spaces.

    I haven't thought about it in a looong while, but ExtUtils::MakeMaker doesn't take care to quote any paths to executables (AR/CC/LINK), especially the path to perl (which is absolute), but now that I have, here's a patch that seems to work. Just add the following sub to ExtUtils::MM_Win32

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

Re: The Evil Embedded Space
by graff (Chancellor) on May 30, 2005 at 09:55 UTC
    Despite your intent to focus on just embedded spaces in file and path names, I'm compelled to point out that the problem is not just with spaces.

    Microsoft also "supports" (in the sense of making it easy for various windows-based GUI tools to use) characters like ampersand, exclamation mark and semicolon. I'm not sure (I'm not a windows user), but it wouldn't surprise me if people could put asterisk, question mark, vertical bar and angle-bracket characters in file or directory names, as well. And every now and then someone can manage to get an oddball control character into a file name.

    That kind of stuff can cause real havoc for unsuspecting shell-based operations. Luckily, most modern unix-like shells that have been ported to windows have the feature of automatically inserting back-slash escapes for the nasty characters when doing tab-completion of file names.

    But that only applies to interactive shell usage, not sub-shells invoked by "make", etc. The multi-arg usage on system and pipeline open calls makes things somewhat easier when you get to the point of writing perl scripts, but as for handling installation of Perl (and of various CPAN modules), it can be risky business if you have funny characters in path names.
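As a rough illustration of the multi-arg pipeline open mentioned above: in the list form no shell is asked to re-split the command, so an argument containing a space arrives as one argument. This sketch assumes a Unix-like system; on Win32 perls the list form has historically been limited, for the same reason tye gives for system(@list). The file name is invented.

```perl
use strict;
use warnings;

# $^X is the path of the running perl; the child only reports
# how many arguments it received and what the first one was.
my @cmd = (
    $^X, '-e', 'print scalar(@ARGV), ":", $ARGV[0]',
    'name with spaces.txt',
);

# List form of a pipe open: each element is handed over as one argument.
open( my $child, '-|', @cmd ) or die "cannot start child: $!";
my $report = <$child>;
close $child;

print "$report\n";
```

The same argument list given to system() in list form behaves the same way where the platform supports it.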

    If PodMaster's patch works for things other than space, then maybe the issue is solved. Personally, I'd follow the advice as quoted in the OP: avoid putting perl in a path that is likely to cause a lot of grief for shell usage. There's just no good reason to ask for that kind of trouble.

      I'm not sure (I'm not a windows user), but it wouldn't surprise me if people could put asterisk, question mark, vertical bar and angle-bracket characters in file or directory names, as well.
      Actually you can't use any of the following characters in a Windows filename:
      \ / : * ? " < > |
      (using the Windows Explorer interface, anyway)
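A minimal sketch of a check against exactly the characters listed above. The helper name `has_forbidden_char` is made up, and the check covers only that list, not reserved device names or any other Windows naming rule.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the name contains any of  \ / : * ? " < > |
sub has_forbidden_char {
    my ($name) = @_;
    return $name =~ m{[\\/:*?"<>|]} ? 1 : 0;
}

for my $name ( 'My Report.txt', 'foo*bar', 'what?.txt' ) {
    printf "%-15s %s\n", $name,
        has_forbidden_char($name) ? 'rejected' : 'ok';
}
```

Note that a space is not on the list, which is the whole point of this thread.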

      From the command line, Windows ignores special characters in a file name for some operations, and provides a strange error otherwise. For example, if I use the copy con foo*bar command, I'll end up with a file named foobar. But if I try using the ren foobar ba*z I get a filename with a seemingly random alphanumeric value. In my first trial run I got a file named "baoz", the second time it was "baqz". Windows provides an error anytime I use a : in the filename ("A duplicate file exists, or the file cannot be found" even in an otherwise empty directory).

      All in all I wouldn't be surprised to see oddball characters in any file name, no matter which OS you're using. But it seems some OSs are more prone than others.

        The : character is an indicator that you have an alternate data stream in that file. The : has special meaning, therefore it exhibits odd behavior when used incorrectly.

        --MidLifeXis

Re: The Evil Embedded Space
by Mr. Muskrat (Canon) on Jun 04, 2005 at 12:47 UTC

    You've piqued my curiosity. Why the heck are you trying to install the AS package when ActiveState clearly states on the download page that it "... is recommended only if you are unable to install ActivePerl using the MSI installer"?

      If you want to have multiple versions of ActivePerl installed, that's the only way.
Re: The Evil Embedded Space
by jonadab (Parson) on Jun 11, 2005 at 02:46 UTC
    The support for spaces in pathnames in MSWin isn't really debatable.

    Oh yes, it is very much debatable. Perl is *FAR* from being the only thing that has trouble with pathnames that include spaces (and other bizzarroid characters that clearly don't belong in pathnames). Many third-party GUI applications get this wrong, because support for it was not retrofitted properly into all of the parts of the API; only the *new* parts of the API (err, new in 1995) can be relied upon to support them correctly, and even then there are a number of gotchas. As of Windows 98 SE, even Windows Explorer did not handle this correctly in some of the wackier edge cases, particularly having to do with associations and drag-and-drop actions. (I haven't tested this in Windows XP, mainly because these days I don't use Windows enough to have run across it.)

    I agree that the error message should probably be more clear and ideally should list specific things that might break. But in essence it is right: installing something as complex and command-line-oriented as Perl into a path that contains spaces is not really a very good idea.


    "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68