PerlMonks  

Download a remote file via https within a session

by c (Hermit)
on Sep 04, 2001 at 23:20 UTC ( [id://110131]=perlquestion )

c has asked for the wisdom of the Perl Monks concerning the following question:

brothers- my brain hurts. i am sorry that i do not have code to post for approval, but i've not yet made it that far without already stumbling.
i need to retrieve a document from a remote machine through https. my first idea was to look over libwww and see if it fit the bill. the wildcards in the scenario are as follows. the remote file doesn't actually exist; rather, it's built when a client selects a hyperlink stating "download list as csv file". to boot, access to the page containing this url is username/password protected, and after login the site creates a session id, as evidenced by the url, which contains a sid=xxxxxxxxxxx
i see that libwww allows for a get(url); however, i am not certain how to attack a url that contains a sid, or how the login process should be handled. the login is not done via apache htaccess; rather, the fields of a form are seemingly handed to a C script or some other language (not perl).
my apologies for the ambiguity, however this really is the gist of my issue and contains all the information that i am able to relate. libwww is brand new to me, LWP is also just as much black magic. any suggestions or road maps on where to start are appreciated.

humbly -c

Replies are listed 'Best First'.
Re: Download a remote file via https within a session
by wog (Curate) on Sep 05, 2001 at 00:10 UTC
    To handle submitting forms (such as the login form), you will probably find the HTML::Form module distributed with libwww very handy -- it will parse all the HTML and extract the form for you, and provide an easy interface for acting as if you filled in values and clicked the submit button. And for following links with specific text, you will probably need to use HTML::TokeParser and/or HTML::Parser to find their URL.
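A minimal sketch of the HTML::Form approach described above, run against an inline page so it needs no network. The form markup, the field names (username, password, go), the credentials and the example.com URLs are all assumptions for illustration; the real login page will have its own.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical login page; real field names and the action URL will differ.
my $html = <<'HTML';
<form method="post" action="/cgi-bin/login">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" name="go" value="Log in">
</form>
HTML

# The second argument is the base URL used to resolve the form's action.
my ($form) = HTML::Form->parse($html, 'https://example.com/login.html');

# Fill in the fields as a browser user would (placeholder credentials).
$form->value(username => 'c');
$form->value(password => 'secret');

# click() returns an HTTP::Request that can be handed to
# LWP::UserAgent->request.
my $request = $form->click;

print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

In a real script the HTML would come from fetching the login page first, and the resulting request would be sent with an LWP::UserAgent object rather than printed.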

    Note that you will probably need to use LWP::UserAgent and not LWP::Simple since you have to handle forms. (It's not that hard -- makes it easier, too, if you want to add cookie support in the future.)
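As a rough starting point for the LWP::UserAgent route, the sketch below sets up an agent with an in-memory cookie jar and a small helper that dies on a failed fetch. The agent string and the fetch_page helper are hypothetical names, not part of LWP. On current LWP releases, https URLs also require the LWP::Protocol::https module to be installed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 30);

# An empty hash ref asks LWP to create an in-memory HTTP::Cookies jar,
# so any cookies set during login are sent back on later requests.
$ua->cookie_jar({});

# Hypothetical agent string; some sites reject the default one.
$ua->agent('csv-fetcher/0.1');

# Hypothetical helper: fetch a URL and die with the status line on failure.
sub fetch_page {
    my ($agent, $url) = @_;
    my $response = $agent->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response;
}
```

Reusing the same $ua object for the login request and the later download keeps any session cookies attached, even if the site turns out to rely on them in addition to the sid in the URL.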

Re: Download a remote file via https within a session
by perrin (Chancellor) on Sep 05, 2001 at 07:01 UTC
    Let's take this one step at a time. You need a web client that handles HTTPS. Either LWP or HTTP::GHTTP will work. (And by the way, LWP *is* libwww.) Then you need to get at a page that requires a login. This is also no problem. Just make your program go to the login page, get the URL it returns to you, and follow it. Basically, you follow the same steps you would in a browser but you do it in your program.
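One of the browser steps is clicking the "download list as csv file" link. A sketch of finding that link's URL with HTML::TokeParser (mentioned in an earlier reply) could look like this. The find_link_by_text helper, the sample markup and the example.com URLs are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Hypothetical helper: return the absolute URL of the first <a> whose
# text matches $wanted (case-insensitive), or nothing if none matches.
sub find_link_by_text {
    my ($html, $base, $wanted) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot parse HTML\n";
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href} or next;
        my $text = $parser->get_trimmed_text('/a');
        return URI->new_abs($href, $base)->as_string
            if lc $text eq lc $wanted;
    }
    return;
}

# Stand-in for the page you land on after logging in.
my $page = <<'HTML';
<p><a href="/help">Help</a>
<a href="/cgi-bin/list.csv?sid=abc123">Download list as CSV file</a></p>
HTML

my $csv_url = find_link_by_text(
    $page,
    'https://example.com/members/index.html?sid=abc123',
    'download list as csv file',
);

print "$csv_url\n";
```

Because the link is read from the page itself, the sid embedded in it comes along for free and the script never has to construct the session URL by hand.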

    You need to make an attempt to learn one of these HTTP client modules. If you get stuck, you can ask questions here. Just make sure they're specific, and not "write my program for me" kind of questions and you'll get plenty of help.

Re: Download a remote file via https within a session
by mitd (Curate) on Sep 05, 2001 at 01:28 UTC

    One point to watch is that because SID is passed via a URL it is likely returned to you as a redirect.

    This is where it gets a little tricky. If the LWP::UserAgent->request method receives a redirect response, it calls LWP::UserAgent->redirect_ok, which returns false for a POST request and true for all others. So if the request you sent was a POST, LWP::UserAgent->request() will not follow the redirect and will never fetch your target page.
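Two possible ways around this are sketched below; exact redirect behaviour differs between LWP versions, so treat both as assumptions to verify against the installed release. The first uses the requests_redirectable list documented in current LWP::UserAgent. The second reads the Location header by hand. The redirect_target helper, the hand-built 302 response and the example.com URLs are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI;

my $ua = LWP::UserAgent->new;

# Option 1: tell LWP that POST requests may be redirected too.
push @{ $ua->requests_redirectable }, 'POST';

# Option 2: inspect the redirect yourself and build the next URL.
sub redirect_target {
    my ($response) = @_;
    return unless $response->is_redirect;
    my $location = $response->header('Location') or return;
    # Location may be relative, so resolve it against the request URL.
    return URI->new_abs($location, $response->request->uri);
}

# Hand-built response standing in for what a login POST might return.
my $response = HTTP::Response->new(
    302, 'Found', [ Location => '/members/index.html?sid=abc123' ],
);
$response->request(
    HTTP::Request->new(POST => 'https://example.com/cgi-bin/login')
);

my $target = redirect_target($response);
my %query  = $target->query_form;

print "next URL: $target\n";
print "session id: $query{sid}\n";
```

The manual route also gives the script a natural place to capture the sid for later requests.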

    I have found the wget tool (found on most Linux distros) to be very helpful in figuring out these kinds of problems.

    mitd-Made in the Dark
    'My favourite colour appears to be grey.'
