rebugger has asked for the wisdom of the Perl Monks concerning the following question:

First of all, I want to say I am a novice at web development, so if I am explaining this poorly, please have some patience with me. I have been writing scripts that pull XML files from URLs using get (via use LWP::Simple qw(get)), and then I generate reports or do whatever with the XML. One thing I ran into was that get failed to retrieve URLs that had question marks in them, for example 'http://www.example.com/FileServe?file=xml/myfile.xml'. I assume that is because it's not a direct URL, and it initiates a query or something that get can't deal with? Is there a way to configure get to work with this, or is there another module I should be using?

Thanks in advance.

~rebugger~

Replies are listed 'Best First'.
Re: get ?
by kennethk (Abbot) on Apr 23, 2013 at 15:52 UTC

    I'm not entirely certain you are providing a valid URI. Without a particular example, have you tried escaping the file name, e.g.

    use URI::Escape;
    use LWP::Simple 'get';
    my $file = uri_escape('xml/myfile.xml');
    get("http://www.example.com/FileServe?file=$file");

    Have you verified the link, as typed, can be accessed via other channels, like your browser?

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Yes, the link works in my browser, I failed to mention that. I've cleared my cache to double check I'm not looking at a cached version on my browser. I have checked that the string I am inputting into get is correct, too. None of the file names have spaces or any other characters that require escapes.

      Okay, new discovery: I can get files with question marks in the path from other websites, for example this node. Starting to think it likely has nothing to do with the URL (URI? My terminology is a bit hazy) and has everything to do with the particular website I am trying to access. Maybe it is blocking access from non-browsers. I should probably pursue this internally now, thank you for the help.

        I suspect your issue is that you have a slash in your query string, and that requires an escape. What happens when you try to grab http://www.example.com/FileServe?file=xml%2Fmyfile.xml, i.e. run the example code I gave?
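For reference, here is a minimal sketch of the two usual ways to produce that escaped form, assuming the URI and URI::Escape modules are installed. The URL is the placeholder from the question, and whether the real server treats the escaped and unescaped slash differently is not something this thread establishes.

```perl
use strict;
use warnings;
use URI;
use URI::Escape qw(uri_escape);

# Escape only the parameter value; the "?" and "=" must stay literal.
my $escaped = uri_escape('xml/myfile.xml');
print "$escaped\n";    # xml%2Fmyfile.xml

# Alternatively, let URI assemble the query string and escape the value.
my $uri = URI->new('http://www.example.com/FileServe');
$uri->query_form( file => 'xml/myfile.xml' );
print $uri->as_string, "\n";
```

Either string can then be passed to get as usual.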


Re: get ? (LWP/HTTP/Headers)
by Anonymous Monk on Apr 24, 2013 at 12:24 UTC

    If the URL works from the Firefox browser (or another), but not from the LWP browser, then it must be because of prejudice (cookies, referrer, headers).

    use LWP::Simple qw' get $ua ';
    $ua->add_handler( "request_send", sub { shift->dump; return });
    get('http://example.com/');
    __END__
    GET http://example.com/
    User-Agent: LWP::Simple/6.00 libwww-perl/6.05
    (no content)

    GET http://www.iana.org/domains/example/
    User-Agent: LWP::Simple/6.00 libwww-perl/6.05
    (no content)

    GET http://www.iana.org/domains/example
    User-Agent: LWP::Simple/6.00 libwww-perl/6.05
    (no content)
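Since get from LWP::Simple only returns undef on failure, a rough sketch like the one below, using LWP::UserAgent directly, may help show why a request is refused and whether a different User-Agent changes the outcome. The agent string and the helper names make_ua and fetch are made up for illustration; nothing in this thread confirms which header, if any, the site actually checks.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that does not announce itself as libwww-perl.
# The agent string is an arbitrary placeholder, not a value the
# target site is known to require.
sub make_ua {
    my $ua = LWP::UserAgent->new( timeout => 30 );
    $ua->agent('Mozilla/5.0 (compatible; report-fetcher)');
    return $ua;
}

# Fetch a URL and return (content, status line) so failures are visible.
# Content is undef when the request did not succeed.
sub fetch {
    my ( $ua, $url ) = @_;
    my $res     = $ua->get($url);
    my $content = $res->is_success ? $res->decoded_content : undef;
    return ( $content, $res->status_line );
}

# Only touch the network when a URL is passed on the command line.
if ( my $url = shift @ARGV ) {
    my ( $content, $status ) = fetch( make_ua(), $url );
    print "$status\n";
    print length($content), " characters received\n" if defined $content;
}
```

Run it with the problem URL as the only argument; a status line such as a 403 or a redirect loop says more than the bare undef that get returns.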
Re: get ?
by Lotus1 (Vicar) on Apr 23, 2013 at 19:28 UTC
      My apologies. My brain couldn't generate a descriptive title because I was really still trying to discover what the question was. So I went for "clever" over informative.