jonjacobmoon has asked for the wisdom of the Perl Monks concerning the following question:

This may not really be a Perl question; in fact, I suspect it is related to a non-standard Apache configuration at one unusual site. But when I use the code below with the URL http://www.nellaware.com/glujodat.html, I get a 404. When I browse to it with any standard browser, I get a page from a redirect. So why doesn't my code, which has worked on thousands of addresses, some of which have redirects, work here?
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

my $url = shift || die "No url supplied\n";
my $agent = new LWP::UserAgent;
my $request = new HTTP::Request 'GET' => $url;
my $result = $agent->request($request);

if ($result->is_success) {
    print $result->as_string;
} else {
    print "Error: " . $result->status_line . "\n";
}

I admit it, I am Paco.

Replies are listed 'Best First'.
Re: Why don't I get a redirect
by dws (Chancellor) on Sep 20, 2002 at 17:07 UTC
    I use the code below ... I get a 404. When I browse to it with any standard browser I get a page from a redirect.

    The site might be sensitive to the user agent string. Try adding $agent->agent("Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"); before you fetch the page.

      As a follow up, you mention the agent string, is there someplace that explains this string in some detail?
      I just want to understand what is important in that line to different servers.
        As a follow up, you mention the agent string, is there someplace that explains this string in some detail? I just want to understand what is important in that line to different servers.

        It's a messy subject. Browsers announce themselves using a "User-Agent:" header in the HTTP request. This is visible to CGIs in the HTTP_USER_AGENT environment variable.

        As browsers have developed, they've added capabilities (and have introduced and fixed bugs). There are incompatibilities in things like CSS (Cascading Style Sheets) support between different browsers and browser versions, and even some subtle differences in JavaScript support.

        Some sites use the user-agent string to ensure that they emit the right DHTML/JavaScript from the server side. Also, various JavaScript libraries will include "browser detection" code to run on the client side, so that they can do the right magic stuff (or avoid trying to do the wrong magic stuff) in a browser. A limited number of sites use the user-agent string to be jerks, denying service to one browser type or another "just because."
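
        To make the server-side case concrete, here is a minimal sketch of a CGI that reads HTTP_USER_AGENT and sorts the string into rough buckets. The classify_agent helper and its category names are made up for illustration; real detection code checks far more than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort a User-Agent string into a rough bucket.
# The buckets are illustrative assumptions, not a real detection scheme.
sub classify_agent {
    my ($ua) = @_;
    return 'unknown' unless defined $ua && length $ua;
    return 'msie'    if $ua =~ /\bMSIE\s+\d/;        # IE announces itself inside a Mozilla string
    return 'lwp'     if $ua =~ m{^libwww-perl/};     # LWP's default agent string
    return 'mozilla' if $ua =~ m{^Mozilla/};
    return 'other';
}

# In a CGI, the web server copies the request's User-Agent header
# into the HTTP_USER_AGENT environment variable.
my $ua = $ENV{HTTP_USER_AGENT};

print "Content-Type: text/plain\n\n";
print "Agent: ", (defined $ua ? $ua : '(none sent)'), "\n";
print "Class: ", classify_agent($ua), "\n";
```

        A site that branches on a result like this will treat an unmodified LWP client differently from a browser, which is why faking the agent string is worth trying.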

        What the user agent strings are and how to interpret them is scattered throughout the available literature. Try googling on "Javascript browser detect".

Re: Why don't I get a redirect
by swiftone (Curate) on Sep 20, 2002 at 17:31 UTC
    Odd. I telnetted to port 80 on said box, sent "GET /glujodat.html" and got the following response (HTML removed):

    Could not determine the website that you were looking for.

    Why did this happen?

    This website, like many others, makes use of advanced features of modern web browsers. All standard web browsers released since June 1996 support these features. This list includes Microsoft Internet Explorer 3.0 and above, Netscape Navigator 2.0 and above, Opera, Lynx, Web TV, and others. You may want to install the latest version of your favorite web browser to take advantage of these advanced features. Until you upgrade your web browser, select the link below to access this website

    Please select this link to continue

    The link in question is /_dnscentral_scripts/findhost

    I can only assume the page redirects you to that script, which does not exist.

    Update: Two bits of info. Twiddling the Agent (as suggested by dws) didn't seem to make a difference in this case (though it's good general advice). And the error I got also mentioned: "DNS Code: HTTP_HOST not available". I'm not sure if that's relevant, since HTTP_HOST is a server-side heading (I thought), but maybe it's useful.

      Odd. I telnetted to port 80 on said box, sent "GET /glujodat.html" ...

      If that box supports multiple virtual sites, you'll need to issue a complete HTTP/1.1 request, naming the site you're trying to fetch the page from.

      GET /glujodat.html HTTP/1.1
      Host: www.nellaware.com

      should suffice.
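
      If typing that into telnet gets tedious, the same request can be scripted. This is a rough sketch using IO::Socket::INET; the build_request and raw_fetch names are invented here, and the "Connection: close" header is an addition so the server hangs up once it has answered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.1 GET request. Every line ends in CRLF and
# an empty line terminates the header block.
sub build_request {
    my ($host, $path) = @_;
    return join "\r\n",
        "GET $path HTTP/1.1",
        "Host: $host",            # tells a virtual-hosting server which site we want
        "Connection: close",      # assumption: ask the server to close after replying
        "", "";
}

# Send the request and return the raw response, headers and body together.
sub raw_fetch {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 15,
    ) or die "Cannot connect to $host: $!\n";

    print {$sock} build_request($host, $path);

    local $/;                     # slurp until the server closes the socket
    my $response = <$sock>;
    close $sock;
    return $response;
}

# Only touch the network when a host is given on the command line.
if (@ARGV) {
    my ($host, $path) = @ARGV;
    print raw_fetch($host, defined $path ? $path : '/');
}
```

      Run it as, say, "perl rawget.pl www.nellaware.com /glujodat.html" (the script name is arbitrary) to see the status line, headers and HTML exactly as the server sends them.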

        Yes, specifying the agent as suggested does not work.
        And telnetting to the box as suggested above only returns a customized 404 page. I suspect that there is some sort of DNS forwarding going on, but I am not sure how it works in this case.


        I admit it, I am Paco.
Re: Why don't I get a redirect
by Aristotle (Chancellor) on Sep 21, 2002 at 09:34 UTC

    Telnet there as per dws' suggestion and look at the source:

    <meta name="generator" content="DNSCentral.com Forwarding Services (naboo)">

    and later

    <frameset rows="100%,*">
    <frame src="http://www.mindspring.com/~nellaware/glujodat.html" name="dnscentral_fwd" scrolling="auto" frameborder="0" border="0" noresize>
    <noframes>
    <h2>NELLA_WARE - Writing Software For You!</h2>
    <p><a href="http://www.mindspring.com/~nellaware/glujodat.html">Please click to visit nellaware.com</a>.</p>
    </noframes>
    </frameset>

    That's not DNS forwarding. The 404 page just contains a frameset that will cause a frame-capable browser to load the page from the real server inside a frame that fills the entire window: a standard URL "redirector" service trick.

    You simply want to go to http://www.mindspring.com/~nellaware/glujodat.html.

    Makeshifts last the longest.

      That is all well and good, but the question remains: how come I don't get a redirect with the original code? Yes, I can do all that by hand, but that does not help me.

      So, what is it about my code that fails to do the redirect or is there something on the server that purposely does not give me the redirect? The telnet stuff is interesting for testing but is not ultimately the way I want to do it.



      I admit it, I am Paco.
        You didn't understand. There is no redirect. The "redirect" consists of loading the target page in a window-filling frameset. As such, it has nothing to do with HTTP and consequently isn't handled by the LWP modules.
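
        If you want your script to end up where a browser does, it has to do what the browser does: read the HTML of the 404 page and fetch the frame's src. Here is a rough sketch, assuming a single <frame> as in the source quoted above. The first_frame_src and fetch_following_frame names are made up, and the regex is a crude stand-in for a proper HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the src attribute of the first <frame> tag in $html, or
# nothing if there is none. \b keeps this from matching <frameset>.
sub first_frame_src {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ /<frame\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i) {
        return defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return;
}

# Fetch $url; if the body contains a frame, fetch the frame's target
# instead. LWP and URI are loaded lazily so the helper above can be
# used without them.
sub fetch_following_frame {
    my ($url) = @_;
    require LWP::UserAgent;
    require URI;

    my $agent  = LWP::UserAgent->new;
    my $result = $agent->get($url);

    # The forwarding page arrives with a 404 status, so look at the
    # body even when is_success is false.
    my $src = first_frame_src($result->content);
    return $result unless defined $src;

    # Resolve a possibly relative src against the page's base URL.
    my $target = URI->new_abs($src, $result->base)->as_string;
    return $agent->get($target);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $result = fetch_following_frame($ARGV[0]);
    print $result->is_success
        ? $result->as_string
        : "Error: " . $result->status_line . "\n";
}
```

        This follows one level of framing only and takes the first frame it sees, which is enough for a forwarding page like this one but not for sites with real multi-frame layouts.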

        Makeshifts last the longest.