Good day Monks. I am trying to get stories from Reuters that come over its RSS feed. Some of these stories are multi-part, so to get the whole thing it's necessary to follow the "Next" link at the bottom. Alas, this is a #!&%* JavaScript link, which WWW::Mechanize can't follow.
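If the "Next" link's href happens to embed the target URL as a quoted string literal, a plain-Perl workaround might avoid the browser entirely: pull the URL out of the `javascript:` href and fetch it directly. This is a minimal sketch under that assumption. The helper name `url_from_js_href` and the `goToPage(...)` href format are hypothetical, invented for illustration rather than taken from the Reuters pages.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a URL out of a javascript: href such as
#   javascript:goToPage('/news/article.aspx?page=2')
# ASSUMPTION: the target is the first single- or double-quoted string
# literal in the href. Returns undef if the href is not a javascript:
# link or contains no quoted literal.
sub url_from_js_href {
    my ($href) = @_;
    return undef unless defined $href && $href =~ /^\s*javascript:/i;

    # Capture the opening quote, then everything up to the matching quote.
    if ( $href =~ /(['"])(.*?)\1/ ) {
        return $2;
    }
    return undef;
}

# Example href, invented for illustration only.
my $href = q{javascript:goToPage('/news/article.aspx?page=2')};
my $url  = url_from_js_href($href);
print defined $url ? "$url\n" : "no URL found\n";
```

With WWW::Mechanize, the href could come from something like `$mech->find_link( text => 'Next' )` (which returns undef if no such link exists), and the extracted URL would then be passed to `$mech->get`. Whether the real Reuters links follow this pattern is untested.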

So I'm trying to do it with Win32::IE::Mechanize, which can supposedly follow those links. When I point it at, for example, perlmonks.com, it works fine, but when I point it at one of the URLs from the Reuters feed it doesn't:

use strict;
use warnings;
use Win32::IE::Mechanize;

# Open a visible IE window so the page can be compared with what content() returns
my $iemech = Win32::IE::Mechanize->new( visible => 1 );
$iemech->get('http://feeds.reuters.com/~r/reuters/topNews/~3/84952673/newsarticle.aspx');
my $html = $iemech->content;
print $html;
produces this HTML:
<HTML><HEAD>
<LINK href="http://i.today.reuters.com/media/styles/rcom-article.css" type=text/css rel=stylesheet>
<LINK href="http://i.today.reuters.com/media/styles/rcom-master.css" type=text/css rel=stylesheet>
<SCRIPT language=javascript src="http://i.today.reuters.com/News/script/links.js" type=text/javascript></SCRIPT>
</HEAD></HTML>
which ain't anywhere close to the HTML of the page showing in the IE window.

One thing I notice is that there is a redirect happening. But unlike WWW::Mechanize, Win32::IE::Mechanize seems not to store the content in its object; instead (I guess) it reads it from the browser DOM. So it seems like the content method should return whatever is showing in the browser. But as you will see if you try the code, it doesn't.
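One possibility, and it is only a guess, is that `content` is read while the DOM still holds the intermediate page of the redirect, before the final article has loaded. If so, polling until the HTML contains a `<body>` element might help. The sketch below assumes that; the helper `wait_for_body` and its retry/delay defaults are hypothetical, and the fake fetcher stands in for `sub { $iemech->content }` so the logic can run without IE.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper: repeatedly call $fetch (a code ref returning the
# page's current HTML) until the HTML contains a <body> tag, or give up
# after $tries attempts spaced $delay seconds apart.
# Returns the last HTML seen and a flag saying whether a body was found.
sub wait_for_body {
    my ( $fetch, $tries, $delay ) = @_;
    $tries //= 20;     # assumed default, not tuned against Reuters
    $delay //= 0.5;    # seconds between reads
    my $html = '';
    for my $attempt ( 1 .. $tries ) {
        $html = $fetch->() // '';
        return ( $html, 1 ) if $html =~ /<body\b/i;
        sleep $delay if $attempt < $tries;
    }
    return ( $html, 0 );
}

# Stand-in for sub { $iemech->content }: the first two reads return a
# head-only stub, the third returns a page with a body.
my @pages = (
    '<HTML><HEAD></HEAD></HTML>',
    '<HTML><HEAD></HEAD></HTML>',
    '<HTML><HEAD></HEAD><BODY>story text</BODY></HTML>',
);
my ( $html, $found ) = wait_for_body( sub { shift @pages }, 5, 0.01 );
print $found ? "got body\n" : "gave up\n";
```

Against the real browser this would be called as `my ( $html, $ok ) = wait_for_body( sub { $iemech->content } );`. If the stub never changes, the redirect theory is probably wrong and the problem lies elsewhere.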

Anyone know if there's a fix for this?

TIA...

Steve


In reply to Win32::IE::Mechanize not getting correct content by cormanaz