PerlMonks  

I need to automate/scrape data from IE

by CarlosT (Initiate)
on Dec 07, 2011 at 00:37 UTC ( [id://942141]=perlquestion )

CarlosT has asked for the wisdom of the Perl Monks concerning the following question:

I've got a task that is just screaming for automation. Every week, I have to get a number for each of 36 entities for some metrics I do, and that basically consists of counting the 'Y's in a certain column in a table on a company web page. Each entity requires picking a value in a dropdown, refreshing the page, and counting 'Y's. It's a slow, cumbersome, tedious, and error-prone process. What I'd love is to point Perl at the site and get back the numbers quickly and cleanly.

Here's what I do know (I don't know what matters):
  • The site uses Kerberos for authentication
  • The site uses SSL
  • The page only works reliably in Internet Explorer
I have no previous experience with web automation, so I'm flying fairly blind. I tried using LWP, but couldn't connect because of SSL issues. I then gave up on Perl for a while and tried using Greasemonkey, but that was when I discovered that the page didn't actually work with Firefox. So most recently I've been trying to use Win32::IEAutomation, but haven't been able to get that off the ground either. This is what I currently have:

    #!/usr/local/bin/perl
    use Win32::IEAutomation;

    # Create new instance of IE
    my $ie = Win32::IEAutomation->new( visible => 1, maximize => 1 );

    my $url = 'https://internal.site.of.doom/';
    $ie->gotoURL($url);

That gets me a blank IE window and an error message reading "Could not start AutoItX3 Control through OLE".

Anyone have any ideas?

Thanks,

Carlos

Replies are listed 'Best First'.
Re: I need to automate/scrape data from IE
by grantm (Parson) on Dec 07, 2011 at 01:31 UTC

    If the page only works with IE then there's a chance that it uses ActiveX - the core of the HTML page would be an <object> tag with a bunch of ugly parameters. If that is what you're getting then one or more of the parameters might be URLs that you could try accessing directly. But if it does use ActiveX and you can't access the data URLs directly then you're pretty much screwed.

    Is this for your TPS reports?

      I have launched IE with the URL using IEAutomation. Now I need to navigate to a text box. I am using the getTextBox method but get an error saying that no text box is present with the specified option name. (I can also see that focus stays in the cmd prompt; it doesn't go to IE.) Does anyone have any idea about this?
Re: I need to automate/scrape data from IE
by Corion (Patriarch) on Dec 07, 2011 at 09:21 UTC
Re: I need to automate/scrape data from IE
by hawtin (Prior) on Dec 07, 2011 at 09:00 UTC

    The message you are getting back suggests that just using OLE won't work. However, it is worth trying the simplest approach (just to prove that it won't do it).

    use strict;
    use Win32::OLE;

    # Start IE through its COM automation interface
    my $ie = Win32::OLE->new( 'InternetExplorer.Application' )
        or die "error starting IE";
    $ie->{visible} = 1;
    $ie->navigate( 'https://internal.site.of.doom/' );

    # Crude fixed wait for the page to load
    sleep(4);

    if (!defined $ie->Document()) {
        print STDERR "Nope that failed as well\n";
    }
    else {
        print "We have something back!\n";
    }

      This code worked. It opened a browser window to the correct URL. Can I do what I need to do just by using OLE?
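
For what it's worth, here is a minimal sketch of how the OLE-only route might look, building on the snippet above. It is not tested against the real page. The element ids `entity` and `results`, the column index 3, and the idea that changing the dropdown triggers a reload are all hypothetical and would have to be replaced with whatever the page's source actually contains. The counting is kept in a plain function (`count_y`) so it can be checked without IE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count cells equal to 'Y' (ignoring case and surrounding whitespace)
# in one column of a table given as an array of row arrays.
sub count_y {
    my ($rows, $col) = @_;
    my $count = 0;
    for my $row (@$rows) {
        my $cell = $row->[$col];
        next unless defined $cell;
        $cell =~ s/^\s+|\s+$//g;
        $count++ if uc $cell eq 'Y';
    }
    return $count;
}

# Block until IE reports the page as fully loaded (READYSTATE_COMPLETE is 4).
sub wait_for_ie {
    my ($ie) = @_;
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != 4;
}

# Copy the text of every cell of an HTML table element into an array of rows.
sub table_rows {
    my ($table) = @_;
    my @rows;
    my $trs = $table->{rows};
    for my $r (0 .. $trs->{length} - 1) {
        my $cells = $trs->item($r)->{cells};
        push @rows,
            [ map { $cells->item($_)->{innerText} } 0 .. $cells->{length} - 1 ];
    }
    return \@rows;
}

# Only drive IE on Windows, with the dropdown values given as arguments.
if ($^O eq 'MSWin32' && @ARGV) {
    require Win32::OLE;
    my $ie = Win32::OLE->new('InternetExplorer.Application')
        or die "error starting IE";
    $ie->{Visible} = 1;
    $ie->Navigate('https://internal.site.of.doom/');
    wait_for_ie($ie);

    for my $entity (@ARGV) {
        # 'entity' is a hypothetical id for the dropdown.
        my $select = $ie->{Document}->getElementById('entity');
        $select->{value} = $entity;
        # Assumption: the page reloads on change; it may need a form submit instead.
        $select->FireEvent('onchange');
        sleep 1;    # give the navigation a moment to start
        wait_for_ie($ie);

        # 'results' is a hypothetical id for the table; 3 is a hypothetical column.
        my $table = $ie->{Document}->getElementById('results');
        printf "%s: %d\n", $entity, count_y(table_rows($table), 3);
    }
    $ie->Quit;
}
```

Run it with the 36 dropdown values as command-line arguments. Since IE is already logged in as you, the Kerberos and SSL parts should be handled by the browser itself, which is the main appeal of this route over LWP.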
Re: I need to automate/scrape data from IE
by Anonymous Monk on Dec 07, 2011 at 02:35 UTC
Re: I need to automate/scrape data from IE
by JavaFan (Canon) on Dec 07, 2011 at 07:20 UTC
    I tried using LWP, but couldn't connect because of SSL issues.
    Can you be a bit more specific? LWP ought to be able to process https requests.

    Instead of screen scraping, you could also try to find out where the page gets its data from, and just go straight to the source.

      That would be my first choice as well, but I don't have access to that.
Re: I need to automate/scrape data from IE
by patcat88 (Deacon) on Dec 07, 2011 at 10:22 UTC
    Wireshark then LWP? I know SSL is a pain. There are ways to make SSL systems use "your key"/your cert instead of a random key to talk to the server, and then you can decrypt the captured traffic.
Re: I need to automate/scrape data from IE
by Anonymous Monk on Jun 16, 2015 at 19:23 UTC

    Dear monk, to get rid of that error you need to install the AutoIt application, either a licensed version or a free one. You can probably get it from: (https://www.autoitscript.com/site/autoit/downloads/). If you are interested in a portable version, you can probably get it from: (https://softwarespot.wordpress.com/code/autoit-portable/) and its script editor from: (https://www.autoitscript.com/site/autoit-script-editor/).

    Unless you install the AutoIt application, the .dll files will not be registered and code that uses Win32::IEAutomation might not work. It's better to do a normal install than to use the portable version, as the .dll files need to be registered with the OS.

    Once you install the AutoIt application, I am quite sure that Win32::IEAutomation will work flawlessly with Internet Explorer 8 and below. The Click method doesn't work with Internet Explorer 11. I didn't try it on Internet Explorer 9 and 10.
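
A quick way to check whether the control is registered, before involving Win32::IEAutomation at all, might be the sketch below. It assumes the control's ProgID is `AutoItX3.Control`, which matches the wording of the error message; the helper name `autoitx_available` is made up for illustration.

```perl
use strict;
use warnings;

# Return 1 if the AutoItX3 COM control can be created, 0 otherwise.
sub autoitx_available {
    # Win32::OLE only exists on Windows; treat a failed load as "not available".
    return 0 unless eval { require Win32::OLE; 1 };
    # Assumed ProgID of the AutoItX control.
    my $autoit = Win32::OLE->new('AutoItX3.Control');
    return defined $autoit ? 1 : 0;
}

print autoitx_available()
    ? "AutoItX3 control is registered\n"
    : "AutoItX3 control not found; install AutoIt or register AutoItX3.dll\n";
```

If this reports that the control is missing, Win32::IEAutomation will most likely fail the same way as in the original post.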
