perlboer has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perlmonks,
My question is related to Re^4: fetch url

This works:
print redirect( "http://www.xs4all.nl/~werksman/webmees/index.html" );
This also works:
$ARGV[0] = http://www.perlmonks.org<br>
print redirect( $ARGV[0] );
This doesn't work:
$ARGV[0] = http://www.xs4all.nl/~werksman/webmees/index.html
print redirect( $ARGV[0] );
I think the ~ (tilde) in $ARGV[0] is being interpolated.
How can I prevent that?
Thanks in advance,
The Perlboer

Replies are listed 'Best First'.
Re: http interpolation
by gellyfish (Monsignor) on Jul 01, 2005 at 13:55 UTC

    By $ARGV[0] I suppose you mean that you are passing this on the command line of the program. In that case it is the shell that is doing the metacharacter substitution before the argument gets to the Perl program. You need to escape the tilde on the command line or put the argument in single quotes.

    /J\

      Yes, I would have to agree with gellyfish: escape the character on the command line (or use single quotes), as it looks like the problem occurs before the argument gets to the script, not after.

      Just some reinforcement...

      magnus
        Gellyfish, Magnus and Duff,

        I think I have seen the light.
        With these instructions I will work through your tips.
        Thank you.
        I will report back when it works.

        Have a nice weekend,
        Perlboer
      Gellyfish,
      Looking at this case, I think I need to parse $ARGV[0] before the "print redirect".
      Is that what you mean?
      Perlboer

        No, what I mean is that it is not Perl that is doing the substitution of the '~'; it is the shell. If you print $ARGV[0] as the first thing in your program you will see what is happening. You need to call your program like

        program.pl 'http://www.xs4all.nl/~werksman/webmees/index.html'
        or
        program.pl http://www.xs4all.nl/\~werksman/webmees/index.html
        to prevent the '~' from being replaced with the path to a home directory.

        /J\
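The check suggested above (print $ARGV[0] first) can be sketched as follows. In POSIX-style shells, tilde expansion normally applies to a tilde at the start of a word, so it is worth confirming what actually reaches the program before changing anything else. The helper name describe_arg is made up for this illustration; it only reports the argument as Perl received it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): describe one
# command-line argument exactly as it arrived, so any change made by
# the shell before Perl started becomes visible.
sub describe_arg {
    my ($arg) = @_;
    return 'no argument given' unless defined $arg;
    my $tilde = index( $arg, '~' ) >= 0
        ? 'tilde still present'
        : 'no tilde in argument';
    return "got [$arg] ($tilde)";
}

# Print the first argument before doing anything else with it.
print describe_arg( $ARGV[0] ), "\n";
```

Running it once with the bare URL and once with the single-quoted URL shows whether the two calls deliver the same string to the program.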

Re: http interpolation
by duff (Parson) on Jul 01, 2005 at 13:38 UTC

    If the tilde is being "interpolated" somehow, it's the redirect() subroutine that's doing it, and so your first "this works" example would have the same behavior as the "doesn't work" example.

    In any case ... where are your quotes around the URL for the second two examples? What's with the stray <br>s? Are you using warnings and stricture?

      Duff,
      The <br> was a typo; I have updated the code.
      I use strict.
      My script is called like this:
      <a href="my.cgi?gotoSite=1">CVS</a>
      Here is the whole code. Sorry if I was too sparing with information; I thought it was a small question:

      use strict;
      use CGI qw/:standard/;

      my $debug = 0;

      # -|Main|----------------------------------------------------------------------
      # Pre : $ARGV[0] from "script.pl
      # Post:
      # -----------------------------------------------------------------------------
      print redirect( $ARGV[0]);

      Kind regards,
      Perlboer
        You seem confused. This is a CGI script, since you are using CGI.pm, but you are attempting to access @ARGV, which probably isn't set by your CGI server. Maybe you should try "param" or something.
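A minimal sketch of the "param" suggestion, assuming the link looks like my.cgi?gotoSite=1 as in the post above. The lookup table, the helper name url_for_site, and the 404 fallback are assumptions made for illustration; the original script does not show how a site number maps to a URL. The CGI part runs only when GATEWAY_INTERFACE is set, which CGI servers do, so the lookup can be tried from the command line without a web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: these site numbers and targets are
# assumptions for illustration, not taken from the original script.
my %url_for = (
    1 => 'http://www.xs4all.nl/~werksman/webmees/index.html',
    2 => 'http://www.perlmonks.org',
);

# Return the target URL for a site id, or undef if the id is
# missing or unknown.
sub url_for_site {
    my ($id) = @_;
    return defined $id ? $url_for{$id} : undef;
}

# Act as a CGI program only when a web server started us.
if ( $ENV{GATEWAY_INTERFACE} ) {
    require CGI;
    my $q   = CGI->new;
    # Read the request parameter instead of @ARGV.
    my $url = url_for_site( scalar $q->param('gotoSite') );
    if ( defined $url ) {
        print $q->redirect($url);
    }
    else {
        print $q->header( -status => '404 Not Found' );
    }
}
```

Because the URL never passes through a command line here, no shell quoting is involved at all.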
Re: http interpolation
by merlyn (Sage) on Jul 01, 2005 at 14:57 UTC
      You sure about that? :)
      2.3. Unreserved Characters

      Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.

      unreserved = alphanum | mark

      mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

      Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.

      Taken from http://www.faqs.org/rfcs/rfc2396.html.

        I call your RFC2396 with RFC1738, for those people who want to be backwards compatible:

        (from section 2.2)

        Unsafe:

        Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

        All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

        Now, I know, you're going to quote RFC2396 again (section G.2):

        The tilde "~" character was added to those in the "unreserved" set, since it is extensively used on the Internet in spite of the difficulty to transcribe it with some keyboards.

        But in the past (under RFC1738) it was classed as an unsafe character.
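For anyone who wants the backwards-compatible form, a small sketch: RFC2396 says escaping an unreserved character does not change the meaning of the URI, and RFC1738 wants "~" encoded, so writing the tilde as %7E should satisfy both. The helper name escape_tilde is made up for this example, and it deliberately touches nothing except the tilde.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every "~" in a URL as %7E and
# leave all other characters untouched.
sub escape_tilde {
    my ($url) = @_;
    ( my $escaped = $url ) =~ s/~/%7E/g;
    return $escaped;
}

print escape_tilde('http://www.xs4all.nl/~werksman/webmees/index.html'), "\n";
# prints: http://www.xs4all.nl/%7Ewerksman/webmees/index.html
```

The encoded form also contains no shell metacharacter, so it can be passed on a command line without quoting.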