I'm trying to work out a strategy, but I'm floundering because I don't know which modules I should be looking at. I want to compare information on the same subject from a number of web sites. Some of these depend on JavaScript, so I was intending to automate a browser to get to the right pages. In the initial stages at least, I was expecting to open the browser with one tab per site, navigate to each page manually (I'd like to automate that in due course) and then extract the data that interests me from the pages. I will want to refresh the pages at various times as the data changes. I'm paranoid about JS, and was therefore planning to use Firefox on Linux, as the risk of damage from a malicious page is reduced. However, I'm not committed to that if there's a better solution available.

I am facing several problems that I don't know how to approach. First, it's not clear to me how to go about automating Firefox. The CPAN module MozRepl describes itself as "This module is perl interface of MozRepl", which leaves me unsure what MozRepl itself is or whether I need it. Also, the module is at version 0.06, which makes me afraid I might be trying to use something that is not really production ready yet. Second, I'm not clear how to deal with pages that can only be reached via JS. If I try bookmarking them, opening the bookmark takes me to the site's home page rather than the point I had reached.

Shopping sites seem to be able to get prices from multiple stores even when they use JS, so I believe that what I want can be done. However, I haven't found any useful documentation. If there are docs out there that cover what I want, I should be most grateful for any pointers, as well as any suggestions for a better approach.

Regards,

John Davies

In reply to Reading multiple web sites by davies
