Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The website I am working on takes in information and then scrapes another website within our company. It is basically a workaround, but it allows most of the necessary information to be gathered automatically. The problem is that the site to be scraped is password protected, and my site is password protected as well. Using CGI and WWW::Mechanize (or any other module), how could I securely take the login and password from my site and use them to log in to the site I am scraping? Right now, for demo purposes, I use
my $agent = WWW::Mechanize->new(); $agent->credentials($username, $password);
where $username and $password are hard-coded. Thanks in advance.

Replies are listed 'Best First'.
Re: CGI Questions
by Your Mother (Archbishop) on Aug 21, 2008 at 17:46 UTC

    If you're getting the little scroll-down or pop-up login window from your browser, then the credentials approach, and the reading zentara gave, is the right approach. But if it's a sign-in controlled by the web application, where the login is a form in the (D)HTML, then you have to use the form-submitting facilities of Mech. They're well documented in the Pod. Set up some sort of cookie jar (see the Mech docs), then post your username/pass (or whatever info is required to sign in) and follow along from there.
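
    A rough sketch of that form-based sign-in, assuming the login form has fields named username and password. The sub name, the field names, and the URLs passed in are all made up for illustration, so check the real form's HTML before relying on any of it.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Log in through an HTML form, then fetch a page behind the login.
# The argument names and form field names are placeholders.
sub fetch_behind_login {
    my (%arg) = @_;

    # cookie_jar => {} keeps session cookies in memory for this object;
    # autocheck => 1 makes failed requests die instead of failing silently.
    my $mech = WWW::Mechanize->new( cookie_jar => {}, autocheck => 1 );

    $mech->get( $arg{login_url} );

    # Pick the form that contains these fields, fill them in, and submit it.
    $mech->submit_form(
        with_fields => {
            username => $arg{username},
            password => $arg{password},
        },
    );

    # The session cookie set by the login is sent along automatically.
    $mech->get( $arg{target_url} );
    return $mech->content;
}
```

    You would call it with something like fetch_behind_login(login_url => ..., target_url => ..., username => ..., password => ...), passing in whatever your own CGI script collected rather than hard-coded values.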

    Security is another issue. If the stuff isn't running under https, then it's not secure. If you're inside a closed network, then you're semi-secure: your traffic could still be intercepted by someone malicious who is also inside the network. Hard-coding passwords is bad practice. If you have to do it, at least get a special user account created for the task, with the bare minimum permissions/access, so it won't compromise a real person's account (or cause employment or legal trouble, for that matter). On that note, you should make sure that what you're doing is entirely kosher with your manager and security group. You might be violating your employment agreement without even knowing it and, as you can hear here, even ethical hacking which is nothing but beneficial to your employer can land you in serious trouble.
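
    One cheap guard along those lines is to have the script refuse to send a password anywhere that isn't https. A minimal sketch, where assert_https is a made-up helper name and the check is only a simple look at the URL scheme:

```perl
use strict;
use warnings;

# Die unless the URL starts with https://, so credentials are never
# posted over plain http by accident. Returns the URL on success.
sub assert_https {
    my ($url) = @_;
    my $shown = defined $url ? $url : '(undef)';
    die "Refusing to send credentials to non-https URL: $shown\n"
        unless defined $url && $url =~ m{\Ahttps://}i;
    return $url;
}
```

    This does nothing about who can read the password on the server itself. It only stops the script from leaking it on the wire.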

    If you're serious about security, you should probably see about getting the information at the file-network level. Maybe they can mount the other machine somewhere, with a special user/perms, so that your stuff is able to see it.

Re: CGI Questions
by zentara (Cardinal) on Aug 21, 2008 at 16:44 UTC
Re: CGI Questions
by jethro (Monsignor) on Aug 21, 2008 at 18:10 UTC

    You first have to tell us what you want to secure, and from whom or what. The password, from someone getting an account on your website server? Or the data, from someone getting a password to a website account? Do you want the script to use the same user and password for the second website irrespective of the user accessing the first one, or should there be multiple users for both websites with a 1:1 relationship?

    If you want to enter a password at the first website and have a password for the second site generated from it, so that you don't need to store any cleartext passwords on the first website, then you can use any hash algorithm, such as MD5. Naturally, an attacker logged in as root on your website server can just change your script to print the cleartext password to a file after it is calculated, but he won't get all the passwords at once, and he has to wait for someone to log in.
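
    In code, that derivation could look roughly like this. It uses Digest::MD5 because md5 is what is described above; derive_site_password is a made-up name, and since a plain unsalted MD5 is considered weak these days, treat it as an illustration of the idea only.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Turn the password typed at the first site into the password used
# for the second site. The same input always gives the same output,
# so nothing has to be stored on the first server.
# Expects a byte string; md5_hex dies on wide characters.
sub derive_site_password {
    my ($login_password) = @_;
    return md5_hex($login_password);
}
```

    The result is a 32-character hex string, so the account on the second site has to be set up with that string as its password.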

    If you just want to guard one password from other non-root users on your webserver, just make the script non-world-readable, or store the password in a file readable only by the webserver user account (often 'www-data' for Apache). You could obfuscate the password a little by having your script transform it before using it, but that is only a minor hindrance to anyone who has acquired root or webserver-user account rights.
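
    A sketch of the password-file variant, assuming a Unix-like server. read_secret is a made-up helper; it refuses to use the file if group or other have any permission bits set on it, which catches the most common mistake.

```perl
use strict;
use warnings;

# Read a one-line secret from a file, but only if the file is not
# accessible to group or other (e.g. mode 0600 or 0400).
sub read_secret {
    my ($path) = @_;

    my @st = stat($path) or die "Cannot stat $path: $!\n";
    my $mode = $st[2] & 07777;
    die sprintf( "%s has unsafe permissions (mode %04o)\n", $path, $mode )
        if $mode & 0077;

    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $secret = <$fh>;
    close $fh;

    die "$path is empty\n" unless defined $secret;
    chomp $secret;
    return $secret;
}
```

    The file would be owned by the webserver user and kept outside the document root, so it can't be fetched through the web server by accident.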

    If you want to guard the password (not the data) of the second webserver from someone who has acquired a password for your first webserver through sniffing, then mixing a local fixed password into the md5 of the password should work well. Without an account on your webserver to get that fixed password, the attacker has no way to calculate the password for the second server.
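
    A sketch of that keyed variant. Note that it swaps in HMAC-SHA256 from Digest::SHA instead of gluing the fixed password onto an md5 by hand; that is my substitution, not what is described above, though the idea is the same. derive_keyed_password is a made-up name.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Combine the user's password with a secret that exists only on this
# server. Someone who sniffed the user's password but cannot read the
# local secret cannot reproduce the result.
sub derive_keyed_password {
    my ($login_password, $local_secret) = @_;
    die "local secret must not be empty\n"
        unless defined $local_secret && length $local_secret;
    return hmac_sha256_hex( $login_password, $local_secret );
}
```

    The local secret itself then has to be protected like any other password on the server, for example in a file only the webserver user can read.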

    Sorry, English is not my first language, so this may be difficult to understand.