
Re^4: Scraping Ajax / JS pop-up

by Monk-E (Initiate)
on Feb 15, 2012 at 23:51 UTC


in reply to Re^3: Scraping Ajax / JS pop-up
in thread Scraping Ajax / JS pop-up

I'm not sure you understand what I am getting at. I have no intention of creating a pure-Perl browser; the point of these modules, and of the bot programmers who use them, is to automate the scraping. Your analogy does not apply here, because we are talking about the higher functionality layer, not the underlying code. The modules advertise or imply the ability to automate this type of interaction, so having to go outside of them suggests that either 1. the modules are truly more limited, or 2. (most likely) I am not seeing how to use them. Your answer, if I understand it, suggests that I would need to abandon a pure-Perl, automated solution. If so, so be it, but I want to make sure the current Perl modules really can't do this. I am inclined to guess they can, because they claim JS/Ajax support and are known to handle button clicks within forms.

1. Yes, HTTP occurs, and yes, HTTP::Recorder deals with HTTP. But the heart of the issue is mechanization: HTTP::Recorder does not mechanize JavaScript button clicks, as its CPAN documentation mentions.

2. The intent is to do this programmatically. When I call it "cheating" to use a third-party browser tool to observe the HTTP behind the button actions and then mimic it, I mean that the very purpose of these Perl modules is to automate and handle browser interactions, including JS and clicks, robustly.

3. Now, I could be misunderstanding you completely. I am familiar with WWW::Mechanize (and some similar modules), but not with WWW::Mechanize::Firefox, which perhaps can use the Live HTTP Headers plugin to do its own handling of button clicks. As I read your suggestion, I would use the Firefox plugin to comb through the logged HTTP interactions myself, and then use Mechanize etc. to mimic the button clicks by just plugging in the HTTP I sniffed. My apologies if I'm not understanding correctly, and thanks again for the suggestions so far. I'm sure you are more experienced than I am at this, so please bear with me if I'm misunderstanding.

Replies are listed 'Best First'.
Re^5: Scraping Ajax / JS pop-up
by tangent (Parson) on Feb 16, 2012 at 00:14 UTC
    Monk-E, you have a good question here, but the language you use makes it sound as if these modules are making false claims, as if they have somehow lied to you.

    Also, I think what you are describing as "button clicks" are not that at all; they seem more like standard <a href> links that are intercepted by JavaScript. So you may be looking for the wrong thing in the docs.
Re^5: Scraping Ajax / JS pop-up
by Anonymous Monk on Feb 16, 2012 at 00:37 UTC
      Thanks again for the suggestion :P, but I hold more than a handful of patents on (and have presented at conferences on) advancing the state of the art in the field of computer networks.

      I don't claim to be an expert in all areas, but you should probably take a less arrogant and condescending tone if you truly wish to be helpful in a forum for general perl questions. "Maybe you should learn about the internet" doesn't help anyone, and it should be obvious that my question was valid and posed by someone with knowledge beyond the content of the "learn about the internet" links you responded with. Keep in mind there are JavaScript mechanized plugins to the modules we are discussing, and the question was in earnest after effort made to do what I'm trying to do. You proposed a work-around to a mechanized approach, which should also suggest the validity of seeking such an approach. Thanks for your time.

        Thanks again for the suggestion :P, but I hold more than a handful of patents on (and have presented at conferences on) advancing the state of the art in the field of computer networks.

        Naturally :)

        I don't claim to be an expert in all areas, but you should probably take a less arrogant and condescending tone if you truly wish to be helpful in a forum for general perl questions. "Maybe you should learn about the internet" doesn't help anyone, and it should be obvious that my question was valid and posed by someone with knowledge beyond the content of the "learn about the internet" links you responded with.

        Well, I disagree. If you carefully review your statements and mine, your opinion might change. I never argued the validity of your question, but you don't appear to have understood any of my answers, which I attribute to a conceptual/vocabulary problem, hence my suggestion.

        Keep in mind there are JavaScript mechanized plugins to the modules we are discussing, and the question was in earnest after effort made to do what I'm trying to do. You proposed a work-around to a mechanized approach, which should also suggest the validity of seeking such an approach. Thanks for your time.

        Also, this is a perfect example of the clarity of some of your statements.

        I outlined three approaches:

        1. use Firefox + Live HTTP Headers to figure out what HTTP is going on
        2. use Firefox (or any browser) and HTTP::Recorder to figure out what HTTP is going on
        3. use an automatable JS-capable browser, like WWW::Mechanize::Firefox or Selenium/WebKit/IEAutomation, or WWW::Scripter (an experimental WWW::Mechanize subclass with alpha-level support for JavaScript)

        You dismissed the first two approaches as cheating, and proclaimed WWW::Mechanize::Firefox disappointing because it's not pure-perl.
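
For readers following along, approaches 1 and 2 come down to this: once the traffic log shows which request the "click" really sends, a script can send that request itself and never run the JavaScript. The sketch below is only an illustration under assumptions. The endpoint, the `id` parameter, and the helper names `popup_url` and `fetch_popup` are all hypothetical, and the `X-Requested-With` header is a guess at what an Ajax endpoint might check. It uses the core HTTP::Tiny module rather than any of the modules named above.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Build the URL that the page's JavaScript would request on "click".
# The endpoint and parameters are hypothetical placeholders; substitute
# whatever the traffic log actually shows.
sub popup_url {
    my ( $endpoint, $params ) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode($params);
    return length $query ? "$endpoint?$query" : $endpoint;
}

# Replay the request directly, skipping the JavaScript entirely.
sub fetch_popup {
    my ( $endpoint, $params ) = @_;
    my $res = HTTP::Tiny->new->get(
        popup_url( $endpoint, $params ),
        # Assumption: the server expects the usual Ajax marker header.
        { headers => { 'X-Requested-With' => 'XMLHttpRequest' } },
    );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Example call (hypothetical URL, left commented out):
# print fetch_popup( 'http://example.com/popup', { id => 42 } );
```

If the site depends on cookies or a session established by an earlier page load, the same replay would more likely be done with WWW::Mechanize, which keeps that state between requests.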
