in reply to Re: Re: Re: Re: DOS directory naming in PERL?
in thread DOS directory naming in PERL?

In any quoted string ("", '', qq{}, q{}, etc.), a double backslash must be interpolated as a single backslash, because a backslash is used to escape the quoting character(s). This makes it possible to get the one character string \ and the two character string \', with '\\' and '\\\'', respectively.
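
A minimal sketch of the single-quote rule described above, for anyone who wants to check it. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $backslash       = '\\';      # one character: \
my $backslash_quote = '\\\'';    # two characters: \ followed by '
my $kept            = '\d+';     # three characters: the backslash is kept

# Print the three lengths to confirm how each literal was read.
printf "%d %d %d\n", length $backslash, length $backslash_quote, length $kept;
```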

I don't see what's wrong with having '\' mean \ and having '\'."'" or q<\'> mean \'. It's a little more typing, but personally if I was the creator of Perl I would sacrifice the special meaning of backslashes in single-quoted strings for more normality...oh well, I'll have to be content with a source filter or using here-docs. But...

Here-docs are the exception to this. One can always find a terminator, possibly several characters long, that doesn't appear within the string itself.

This is something I've been thinking about. What if I'm quoting arbitrary user-supplied data in an eval()? I wouldn't want to remove any occurrences of the here-doc terminator I was using -- that might upset users whose data contains my terminator. Even if the terminator is a long and rare word, there is still a chance of data/terminator clashing -- disallowing the terminator in the data seems very cargo-cult and wrong to me.

I guess the best here-doc terminator in this case would be a randomly-generated one...if only Perl had a qb// operator to quote an arbitrary number of bytes. (__DATA__ doesn't do what I want, since it is only one section. And if I want multiple sections, I need separators. The same exact problem all over again. Oh wait -- I could just store the number of bytes inside each section of the DATA. But then I have an evil arbitrary limit on how much data the user can supply. Unless I use BER encoded integers. Too much work for me.)
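
For what it's worth, here is a rough sketch of the randomly-generated terminator idea. The helper names pick_terminator and quote_heredoc are hypothetical, and the terminator format (EOT_ plus 16 random letters) is just an assumption. Instead of trusting luck, it retries until the terminator is not a line of the data, so nothing has to be disallowed.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a terminator that is not a line of $data.
sub pick_terminator {
    my ($data) = @_;
    my $term;
    do {
        $term = 'EOT_' . join '', map { chr(65 + int rand 26) } 1 .. 16;
    } while ($data =~ /^\Q$term\E$/m);
    return $term;
}

# Hypothetical helper: wrap $data in a single-quoted here-doc expression.
sub quote_heredoc {
    my ($data) = @_;
    my $term = pick_terminator($data);
    return "<<'$term';\n$data\n$term\n";
}

my $data = "it's \\ tricky\nEOT\n\n";
my $copy = eval quote_heredoc($data);
die $@ if $@;
chomp $copy;    # the here-doc body carries one extra newline

print(($copy eq $data) ? "round trip ok\n" : "mismatch\n");
```

A single-quoted here-doc leaves backslashes alone, so the data needs no escaping at all. Only the terminator has to be chosen with care.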

Re: Re: Re: Re: Re: Re: DOS directory naming in PERL?
by chipmunk (Parson) on Jan 21, 2001 at 23:21 UTC
    As you suggest, this quoting semantic is really just a design decision.

    If you need to quote arbitrary data in a here-doc, you might try something like the following:

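    my $data = get_some_arbitrary_data();
    $data =~ s/^( *)$/ $1/gm;         # pad empty or space-only lines with one space
    my $eval = "<<'';\n$data\n\n";    # here-doc that ends at the first empty line
    my $result = eval $eval;
    $result =~ s/^ ( *)$/$1/gm;       # remove the padding again
    print $result;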
    If you specify an empty terminator, then the here-doc ends at the next empty line. Just make sure there are no empty lines in the arbitrary data; perhaps by prepending a space to every empty line. I added a space to every line consisting only of spaces, so I could convert the data back after the eval.
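
A self-contained round trip of the empty-terminator idea, as a sketch. heredoc_roundtrip is a hypothetical name and the sample string stands in for get_some_arbitrary_data(). It assumes the data does not end in a newline, since a trailing newline would itself produce an empty line and end the here-doc early.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the empty-terminator technique.
sub heredoc_roundtrip {
    my ($data) = @_;
    $data =~ s/^( *)$/ $1/gm;                # pad empty or space-only lines
    my $result = eval "<<'';\n$data\n\n";    # body ends at the first empty line
    die $@ if $@;
    $result =~ s/^ ( *)$/$1/gm;              # strip the padding again
    chomp $result;                           # drop the newline the here-doc added
    return $result;
}

# Sample data with an empty line, a space-only line, and awkward characters.
my $sample = "first\n\n  \nlast with \\ and ' and \$x";
my $copy   = heredoc_roundtrip($sample);

print(($copy eq $sample) ? "round trip ok\n" : "mismatch\n");
```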

      The choice of making \ special inside single quotes was a design mistake. The only reason that anything needs to be special inside of single quotes is so that you can include the delimiter in the string. A much better way to do that would have been to allow two consecutive delimiters to be interpreted as one escaped delimiter. This is a common trick used in a lot of places. It doesn't scale well in a lot of situations, but it would be perfect for Perl single-quoted strings.

      'This isn\'t bad'      # would become 'This isn''t bad'
      q(Too many \)s)        # would become q(Too many ))s)
      q"You \"need\" this"   # would become q"You ""need"" this"

      The reason this would be a superior decision (and not just an alternative) is that it means that there is only one special character for each single quoted string, the delimiter. And since Perl allows balanced delimiters, the delimiter doesn't even need to be escaped much of the time.

      Perl has such a nice, rich variety of quoting schemes that most any data you want to include can be written in a clean way. The one exception is backslashes, which are special in all of the quoting schemes except for "here docs". But "here docs" are cumbersome for things other than several lines of text.

      The fact that \ followed by anything other than \ or the delimiter causes the \ to be preserved in single-quoted strings helps alleviate this. It also causes confusion and bugs. People get in the habit of writing 'C:\temp\x.dat' and then don't realize why '\\host\share\dir' doesn't work and probably waste a lot of time trying to figure it out (or try 'C:\temp\').
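
A short sketch of the three cases just mentioned, in case anyone wants to see them side by side. The variable names are only for illustration, and the last case is compiled through a string eval so the syntax error can be caught rather than killing the script.

```perl
use strict;
use warnings;

my $local     = 'C:\temp\x.dat';         # backslashes kept: C:\temp\x.dat
my $unc_wrong = '\\host\share\dir';      # leading \\ collapses: \host\share\dir
my $unc_right = '\\\\host\share\dir';    # four backslashes give \\host\share\dir

# A trailing backslash escapes the closing quote, so the string never ends.
my $ok = eval q{ my $dir = 'C:\temp\'; 1 };

print "$local\n$unc_wrong\n$unc_right\n";
print "trailing backslash: ", ($ok ? "compiled" : "syntax error"), "\n";
```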

      I don't see a good way to change this, so it is just something to live with.

      BTW, an example of how this scheme doesn't scale well is VMS. VMS's equivalent of shell script (or "batch" files) uses " to quote arguments and "" to include quotes in quoted strings. So if you want to write a script that writes a Makefile that passes an argument to the C compiler defining a string alias to the preprocessor, then you want the C compiler to see -Dname="val" so you have to put "-Dname=""val""" on its command line, which means you have to use """-Dname=""""val""""""" in your script. But Perl lets you use delimiters other than " and ' so this scaling problem doesn't apply in Perl. But I suspect this problem may have given this design "a bad name" and helped prevent it from being used when Perl was designed.
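
To make the growth visible, here is a small sketch that applies the doubling rule twice. dcl_quote is a hypothetical helper that only models "double every quote, then wrap in quotes". It is not a claim about how VMS itself parses anything.

```perl
use strict;
use warnings;

# Hypothetical helper: double each " and wrap the result in quotes.
sub dcl_quote {
    my ($s) = @_;
    $s =~ s/"/""/g;
    return qq("$s");
}

my $seen_by_cc = '-Dname="val"';            # what the compiler should receive
my $on_cmdline = dcl_quote($seen_by_cc);    # one level of quoting
my $in_script  = dcl_quote($on_cmdline);    # two levels of quoting

print "$_\n" for $seen_by_cc, $on_cmdline, $in_script;
```

Each level doubles the quote count and adds two more, so the two quotes the compiler sees become six on the command line and fourteen in the script.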

      Update: Perhaps we should just add a qd() form of quoting that doesn't treat \ as special and requires a doubled delimiter to include it (unbalanced) into the string.
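
Perl has no qd() operator, so the following is only a sketch of what its rule might look like as ordinary functions. qd_quote and qd_unquote are hypothetical names, and the sketch assumes a single-character, non-bracketing delimiter.

```perl
use strict;
use warnings;

# Hypothetical: build a literal in which the delimiter is doubled.
sub qd_quote {
    my ($text, $delim) = @_;
    (my $body = $text) =~ s/\Q$delim\E/$delim$delim/g;
    return $delim . $body . $delim;
}

# Hypothetical: read such a literal back. Backslashes are never special.
sub qd_unquote {
    my ($literal, $delim) = @_;
    my $d = quotemeta $delim;
    $literal =~ /\A$d((?:[^$d]|$d$d)*)$d\z/s
        or die "malformed literal: $literal";
    (my $text = $1) =~ s/$d$d/$delim/g;
    return $text;
}

my $path = 'C:\temp' . "\\";                    # ends in a backslash
print qd_quote($path, "'"), "\n";               # prints 'C:\temp\'
print qd_quote("This isn't bad", "'"), "\n";    # prints 'This isn''t bad'
print qd_unquote(q{"You ""need"" this"}, '"'), "\n";
```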

      So, why isn't there a way to close a string early? Imagine the string 'Open delims are (, [, {, or <' and trying to use q(), q[], q{}, or q<> with it.

      Update 2: Ah, I've stumbled across an advantage of the current scheme above. q(This \(is\) cool) doesn't include the \s in the string because the \ escapes opening delimiters in order to affect matching. So you can have a string q(Open \(paren). Doing something similar with "my" scheme would not work. ): Note that I couldn't find any documentation on this so I resorted to using the Perl debugger to test cases.
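
A quick sketch of the cases from Update 2, for anyone who would rather run them than step through the debugger. The variable names are made up.

```perl
use strict;
use warnings;

my $escaped    = q(This \(is\) cool);                  # backslashes are dropped
my $unbalanced = q(Open \(paren);                      # lone opener, escaped
my $nested     = q(Balanced (parens) need no escape);  # balanced pairs are fine

print "$_\n" for $escaped, $unbalanced, $nested;
```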

              - tye (but my friends call me "Tye")
        tye, you may disagree with this decision, but that doesn't make it a mistake. That is why I used the word 'decision'.

        The biggest flaw with your suggestion is that changing the quoting character means scanning the string to make two changes instead of one; the previous delimiter must be made single everywhere it is double, and the new delimiter must be made double everywhere it is single. Miss fixing the original delimiter in one case, and you've changed your string, with no hint that it was ever different.

        On the other hand, with backslashing, the backslashes don't need to be removed from the old delimiter. In fact, Perl's documentation explicitly states that in a double-quoted string, a backslashed non-word character always means the literal character. You never have to worry about how the string should be interpreted, unlike your scheme where ** could mean one asterisk or two, depending on the delimiters used.
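
A small sketch of that point, with made-up variable names: the delimiter changes from "" to qq(), the backslashes stay where they were, and the string is unchanged.

```perl
use strict;
use warnings;

my $first  = "She said \"hi\" and left";      # original delimiters
my $second = qq(She said \"hi\" and left);    # new delimiters, same backslashes
my $third  = 'She said "hi" and left';        # the intended text

print(($first eq $second && $second eq $third) ? "same\n" : "different\n");
```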

        Both approaches have advantages and disadvantages, and both are used in a lot of places. I don't think it's obvious that one solution is necessarily better than the other.