rsilvergun has asked for the wisdom of the Perl Monks concerning the following question:

Wise ones, I abase myself.

I may be pushing my luck with this one, but here's what I'm trying to do:

I have an intranet page I do work on. I want to use OLE to link to an open page and pull the HTML/text for use elsewhere. I hope to use the existing window so I'm not cluttering my screen with new instances. The URL is programmatically generated gibberish, so I can't use GET calls to pull the data down.

To complicate things, the only libraries I have available are the ones that come with ActiveState Perl (5.8.0 build at least).

I found this thread, which let me get a handle to the document, but I can't find any way to get at the HTML or text. I've had a little success prototyping a solution in VBA (blech), but no luck with Perl. Has anyone done this before?

Re: Scrape an existing IE Window?
by pc88mxer (Vicar) on Jun 06, 2008 at 14:51 UTC
      I can get the DOM object easily enough, but I don't know what to do with it once I've got it. Can I read the page's HTML from the DOM object? I've looked through Microsoft's documentation here, but I couldn't find any way to get at the HTML...
Re: Scrape an existing IE Window?
by rsilvergun (Acolyte) on Jun 09, 2008 at 14:37 UTC
    I've figured out a few things: This code:
        use strict;
        use Win32::OLE;

        my $sh = Win32::OLE->new('Shell.Application');
        print "Count is $sh->{Windows}->{Count}\n";

        # Walk every open shell window and dump the <body> of its document
        for (my $i = 0; $i < $sh->{Windows}->{Count}; $i++) {
            my $win = $sh->{Windows}->Item($i);
            print "InnerHTML '$win->{Document}->{body}->{innerHTML}'\n";
            print "OuterHTML '$win->{Document}->{body}->{outerHTML}'\n";
            print "InnerText '$win->{Document}->{body}->{innerText}'\n";
            print "OuterText '$win->{Document}->{body}->{outerText}'\n";
        }
    will pull quite a bit of HTML. Unfortunately, the text I want isn't enclosed in the <body> tag, so this misses it. I tried this:
        use strict;
        use Win32::OLE;
        use Win32::OLE::Enum;   # load the enumerator class explicitly

        my $sh = Win32::OLE->new('Shell.Application');
        print "Count is $sh->{Windows}->{Count}\n";

        for (my $i = 0; $i < $sh->{Windows}->{Count}; $i++) {
            my $win = $sh->{Windows}->Item($i);
            # Pull every element of the document's "all" collection into a list
            my @list = Win32::OLE::Enum->All($win->Document->all);
            print "My Enum ";
            print "@list\n";
        }
    I was hoping to enumerate the all collection, but it just gives me more Win32::OLE hashes that I don't know what to do with. I had hoped each one would be a reference to an element on the page, so that I could reconstruct the page source from them, but once again I've hit a wall...
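
    A possible way past that wall, offered as an untested sketch rather than a confirmed fix: in IE's DOM the document object also exposes documentElement (the <html> element), and its outerHTML covers markup that sits outside <body>. Each Win32::OLE hash returned from the all collection should itself be an element object, so properties such as tagName can be read from it with the same hash syntax. The helper names page_source and tag_names are made up for illustration, and the sketch assumes the missing text lives in the top-level document and not in a separate frame.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full markup for a document object.
# It only needs hash-style property access, which Win32::OLE objects provide.
sub page_source {
    my ($doc) = @_;
    return unless $doc;
    my $root = $doc->{documentElement};    # the <html> element in IE's DOM
    return $root->{outerHTML} if $root;
    my $body = $doc->{body};               # fall back to <body> only
    return $body ? $body->{outerHTML} : undef;
}

# Hypothetical helper: list the tag name of each element object passed in.
sub tag_names {
    return map { $_->{tagName} } grep { defined $_ } @_;
}

# The OLE part only makes sense on Windows with a browser window open.
if ($^O eq 'MSWin32') {
    require Win32::OLE;
    require Win32::OLE::Enum;

    my $sh = Win32::OLE->new('Shell.Application')
        or die "Cannot create Shell.Application: ", Win32::OLE->LastError, "\n";
    my $windows = $sh->Windows;

    for my $i (0 .. $windows->{Count} - 1) {
        my $win = $windows->Item($i) or next;
        my $doc = $win->{Document}   or next;

        my $html = page_source($doc);
        next unless defined $html;         # e.g. an Explorer folder window
        print "Source of window $i:\n$html\n";

        # Each item in "all" is an element object, not just an opaque hash
        my $all = $doc->{all} or next;
        my @elements = Win32::OLE::Enum->All($all);
        print "Tags: ", join(' ', tag_names(@elements)), "\n";
    }
}
```

    Keeping the property lookups in plain subs means they can be tried against ordinary hash references before pointing them at a live window. If the text still doesn't show up, it may be inside a frame, which would need its own document object.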