PerlMonks

Re^5: Scraping Ajax / JS pop-up

by Anonymous Monk
on Feb 16, 2012 at 00:37 UTC ( [id://954110] )


in reply to Re^4: Scraping Ajax / JS pop-up
in thread Scraping Ajax / JS pop-up

I'm not sure that you understand what I am getting at...

Nope. You need to learn about the internet to get an overall picture of how it works.

Re^6: Scraping Ajax / JS pop-up
by Monk-E (Initiate) on Feb 16, 2012 at 02:43 UTC
    Thanks again for the suggestion :P, but I have more than a handful of patents (and conference presentations) on advancing the state of the art in the field of computer networks.

    I don't claim to be an expert in all areas, but you should probably take a less arrogant and condescending tone if you truly wish to be helpful in a forum for general perl questions. "Maybe you should learn about the internet" doesn't help anyone, and it should be obvious that my question was valid and posed by someone with knowledge beyond the content of the "learn about the internet" links you responded with. Keep in mind there are JavaScript-capable plugins for the modules we are discussing, and the question was asked in earnest, after real effort to do what I'm trying to do. You proposed a work-around to a mechanized approach, which should also suggest the validity of seeking such an approach. Thanks for your time.

      Thanks again for the suggestion :P, but I have more than a handful of patents (and conference presentations) on advancing the state of the art in the field of computer networks.

      Naturally :)

      I don't claim to be an expert in all areas, but you should probably take a less arrogant and condescending tone if you truly wish to be helpful in a forum for general perl questions. "Maybe you should learn about the internet" doesn't help anyone, and it should be obvious that my question was valid and posed by someone with knowledge beyond the content of the "learn about the internet" links you responded with.

      Well, I disagree. If you carefully review your statements and mine, your opinion might change. I never argued the validity of your question, but you don't appear to have understood any of my answers, which I attribute to a conceptual/vocabulary problem, hence my suggestion.

      Keep in mind there are JavaScript-capable plugins for the modules we are discussing, and the question was asked in earnest, after real effort to do what I'm trying to do. You proposed a work-around to a mechanized approach, which should also suggest the validity of seeking such an approach. Thanks for your time.

      Also, this is a perfect example of the clarity of some of your statements.

      I outlined three approaches:

      1. use Firefox + the Live HTTP Headers extension to figure out what HTTP is going on
      2. use Firefox (or any browser) and HTTP::Recorder to figure out what HTTP is going on
      3. use an automatable JS-capable browser, like WWW::Mechanize::Firefox or Selenium/WebKit/IEAutomation, or WWW::Scripter ( an experimental WWW::Mechanize subclass with alpha-level support for JavaScript )
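For concreteness, approach 3 might look like the following minimal sketch. The URL is a placeholder, and the example assumes a running Firefox instance with the MozRepl extension installed, which is what WWW::Mechanize::Firefox drives:

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;   # remote-controls a real Firefox via MozRepl

my $mech = WWW::Mechanize::Firefox->new();

# Hypothetical URL; any page whose content is built by JavaScript will do
$mech->get('http://example.com/ajax-page');

# Because a real browser executed the page's JavaScript, content()
# returns the post-JS DOM rather than the raw HTTP response body
print $mech->content;
```

The point of this approach is that the browser, not your Perl code, runs the JavaScript, so no HTTP sniffing is needed.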

      You dismissed the first two approaches as cheating, and proclaimed WWW::Mechanize::Firefox disappointing because it's not pure-perl.

        Not to beat this into the ground, but as I've stated, your 3rd suggested approach is the one I'm interested in. But that's also the one I've been pursuing, if you look at my code again. WWW::Scripter along with its Ajax plugin are what my code is using... so all the goodness available in the WWW::Mechanize::Firefox you're suggesting is where I was stuck in the first place.
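For readers following the thread, the setup described here (WWW::Scripter with its Ajax plugin) might look roughly like this sketch; the URL is a placeholder, and the plugins require the WWW::Scripter::Plugin::JavaScript and WWW::Scripter::Plugin::Ajax distributions to be installed:

```perl
use strict;
use warnings;
use WWW::Scripter;   # experimental WWW::Mechanize subclass

my $w = WWW::Scripter->new;
$w->use_plugin('JavaScript');  # alpha-level JavaScript support
$w->use_plugin('Ajax');        # supplies XMLHttpRequest to page scripts

# Hypothetical URL; the plugins attempt to run the page's JS in-process
$w->get('http://example.com/ajax-page');
print $w->content;
```

Unlike the WWW::Mechanize::Firefox route, this is pure Perl, which is exactly why its JavaScript support is only alpha quality.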

        Please do not take offense at the term "cheating", as I am using it synonymously with your "use non-perl X to 'figure out' what is going on" terminology above, since my expectation from the proclaimed JavaScript support is that the module would relieve the user of needing to sniff HTTP with tools. The preference for approach 3 is to minimize manually "figuring out the HTTP" behind the JS as much as possible... what's behind the calls can change as the target website changes, whereas that would all be encapsulated if the module handles it as encountered. Again, thanks for the suggestions... they may indeed be the route I need to take. And HTTP::Recorder is a pretty cool module to have handy in general.
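As a pointer for anyone who lands on this thread, HTTP::Recorder is used as the agent of an HTTP::Proxy, following the module's documented proxy pattern; the log-file path here is a placeholder:

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Recorder;

my $proxy = HTTP::Proxy->new();

# Create the recorder and tell it where to write the generated script
my $agent = HTTP::Recorder->new;
$agent->file('/tmp/recorded-session');   # placeholder path

# Install the recorder as the proxy's agent, then start proxying
$proxy->agent($agent);
$proxy->start();
```

Point the browser's HTTP proxy setting at the proxy (HTTP::Proxy listens on port 8080 by default) and browse normally; the requests you make are recorded to the log file as a replayable script.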
