1. I read through the info on HTTP::Recorder. It seemed promising (it's cool in its own right... thanks for the link), but unfortunately, it doesn't support JavaScript interactions (as mentioned there and on CPAN).

Sure it does: HTTP is HTTP, and everything Ajax does is HTTP.

I'll def give it a try, but I'm disappointed that it sounds like there isn't any way to do this strictly with Perl modules without needing 3rd party software to "cheat" to avoid the JS/AJAX. Seems like something should be out there to do this with Perl, as I'm certainly not the first to want to scrape such a page.

There is no cheating of any kind going on.

Do you want to spend 10 minutes with Firefox figuring out how to get data out of a website?

Or do you want to spend 20+ years creating a pure-Perl browser?

Are you disappointed that perl is written in C instead of Perl? Is using gcc to compile perl cheating?

Scripter::Plugin / Scripter::Plugin::JavaScript, WWW::Selenium, WWW::HtmlUnit, Gtk2::WebKit::Mechanize/Gtk3::WebKit, Win32::Watir/Win32::IEAutomation/Win32::IE::Mechanize
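
To make the "HTTP is HTTP" point concrete, here is a minimal sketch (not from the original post) of replaying an Ajax call once the browser's HTTP log has shown what the page sends. It uses the core HTTP::Tiny module rather than WWW::Mechanize only to stay dependency-free. The URL, the form fields, and the X-Requested-With header are assumptions for illustration; substitute whatever the real page actually sends.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical values: in practice, copy these from the request that the
# browser's HTTP log shows when the pop-up is triggered.
my $endpoint = 'http://example.com/popup.php';       # assumed URL
my %params   = (id => 42, action => 'details');      # assumed form fields

# Build the argument list for HTTP::Tiny->request so that it mimics
# the form-encoded POST a script on the page would send.
sub build_ajax_request {
    my ($http, $url, $fields) = @_;
    my $body = $http->www_form_urlencode($fields);
    return ('POST', $url, {
        headers => {
            'Content-Type'     => 'application/x-www-form-urlencoded',
            # Assumption: some servers check this header on Ajax calls.
            'X-Requested-With' => 'XMLHttpRequest',
        },
        content => $body,
    });
}

my $http    = HTTP::Tiny->new(agent => 'Mozilla/5.0');
my @request = build_ajax_request($http, $endpoint, \%params);

# Only touch the network when a real URL is given on the command line.
if (@ARGV) {
    $request[1] = $ARGV[0];
    my $res = $http->request(@request);
    if ($res->{success}) {
        print $res->{content};
    }
    else {
        print "Failed: $res->{status} $res->{reason}\n";
    }
}
```

Run without arguments, the script only builds the request. Passing a URL as the first argument sends it and prints the response body, which is typically the HTML or JSON fragment the pop-up would have displayed.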

Re^4: Scraping Ajax / JS pop-up
by Monk-E (Initiate) on Feb 15, 2012 at 23:51 UTC
    I'm not sure that you understand what I am getting at. No intention of creating a pure perl browser, but the intention of the modules and bot programmers is to automate the scraping. The analogy you give is not the case here, as we are talking about the higher functionality layer... not the underlying code used. The modules advertise or imply their ability to automate this type of interaction, so going outside of it seems that either 1. the module is truly more limited, or 2. (most likely) I am not understanding a way to use it. Your answer, if I understand it, suggests that I would need to abandon a pure perl / automated solution. If so, so be it, but I want to make sure current perl mods can't. I am inclined to guess they can, because they claim JS/Ajax support and are known to handle button clicks within forms.

    1. Yes, HTTP occurs, and yes, HTTP::Recorder deals with HTTP. But the heart of the issue at hand is mechanization. HTTP::Recorder does not mechanize the JavaScript button clicks, as mentioned in its CPAN documentation.

    2. The intent is to do this programmatically. So when I refer to needing a 3rd party browser tool to see and then mimic the HTTP of the button actions as "cheating", what I mean is that the very intention of these Perl modules is to automate and handle browser interactions, including JS / clicks, robustly.

    3. Now I could be misunderstanding you completely. I am familiar with WWW::Mechanize (and some similar modules), but not WWW::Mechanize::Firefox, which perhaps has some kind of ability to utilize the LiveHttpHeaders plugin to do its own handling of button clicks. The way I read your suggestion is to use the Firefox plugin to comb through the logged HTTP interactions myself, and then use Mechanize, etc. to mimic the button clicks by just plugging in the HTTP I sniffed. My apologies if I'm not understanding correctly. And thanks again for the suggestions so far. I'm sure you are more experienced than I am at this, so please bear with me if I'm misunderstanding.

      Monk-E, you have a good question here, but the language you use makes it sound as if these modules are making false claims, as if they have somehow lied to you.

      Also, I think what you are describing as "button clicks" are not that at all; they seem more like standard <a href> links which are intercepted by JavaScript. So you may be looking for the wrong thing in the docs.
        Thanks again for the suggestion :P, but I hold more than a handful of patents on (and have presented at conferences about) advancing the state of the art in the field of computer networks.

        I don't claim to be an expert in all areas, but you should probably take a less arrogant and condescending tone if you truly wish to be helpful in a forum for general perl questions. "Maybe you should learn about the internet" doesn't help anyone, and it should be obvious that my question was valid and posed by someone with knowledge beyond the content of the "learn about the internet" links you responded with. Keep in mind there are JavaScript mechanized plugins to the modules we are discussing, and the question was in earnest after effort made to do what I'm trying to do. You proposed a work-around to a mechanized approach, which should also suggest the validity of seeking such an approach. Thanks for your time.