In reply to Wanted: LWP with javascript

Before I answer this, I should mention that the data is probably embedded in JavaScript precisely to stop people from doing what you are attempting to do.

Anyway, I grabbed one of the pages (companies starting with the letter 'B'), saved it to a file, and used the following to turn it into a Perl data structure:

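perl -e "use Data::Dumper; open(H, qq|stocksymbols.txt|) or die $!; local $/; my $allfile = <H>; close(H); if ($allfile =~ m/(\[\[.*\]\])/s) { @stuff = eval($1) } print Dumper \$stuff[0]->[1]->[0];"

which prints:

$VAR1 = \'BACPRV';

(The double quotes around the code suit the Windows command prompt. In a Unix shell, put single quotes around the code instead, or the shell will expand $allfile and friends before Perl sees them.)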
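
For anything beyond a quick look, the one-liner is easier to maintain as a script. The sketch below assumes the page embeds a single array literal made only of nested arrays, quoted strings and numbers, like the one captured above. One deliberate change: instead of passing the captured text to eval, which executes whatever the server sent as Perl code, it uses a small hand-written tokenizer. parse_js_array is a hypothetical helper written for this example, not a CPAN module, and it only handles simple backslash escapes.

```perl
use strict;
use warnings;

# Parse a JavaScript array literal built only from nested arrays, quoted
# strings and bare numbers, e.g. [['BAC','PRV'],['X','Y']], without eval.
sub parse_js_array {
    my ($text) = @_;
    my @stack = ([]);    # arrays under construction; bottom entry is a holder
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G\s+/gc || $text =~ /\G,/gc) {
            next;                               # skip whitespace and commas
        }
        elsif ($text =~ /\G\[/gc) {
            push @stack, [];                    # open a new array
        }
        elsif ($text =~ /\G\]/gc) {
            die "unbalanced ]\n" if @stack < 2;
            my $done = pop @stack;              # close it and attach to parent
            push @{ $stack[-1] }, $done;
        }
        elsif ($text =~ /\G'((?:[^'\\]|\\.)*)'/gc
            || $text =~ /\G"((?:[^"\\]|\\.)*)"/gc)
        {
            (my $s = $1) =~ s/\\(.)/$1/gs;     # drop simple backslash escapes
            push @{ $stack[-1] }, $s;
        }
        elsif ($text =~ /\G(-?\d+(?:\.\d+)?)/gc) {
            push @{ $stack[-1] }, $1 + 0;
        }
        else {
            # null, true, function calls etc. are refused rather than guessed
            die "unexpected input at offset " . pos($text) . "\n";
        }
    }
    die "unbalanced [\n" if @stack != 1;
    return $stack[0][0];
}

# Hypothetical usage: perl parse_symbols.pl stocksymbols.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $allfile = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    my ($literal) = $allfile =~ m/(\[\[.*\]\])/s
        or die "No [[...]] array literal found in $file\n";
    my $rows = parse_js_array($literal);
    print join("\t", @$_), "\n" for @$rows;  # one tab-separated row per line
}
```

The script name is arbitrary. It prints one row per line, and anything outside the expected shape makes it die, so a format change on the site shows up as an error rather than as silently wrong data.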

With that data structure in hand, you can easily use LWP::Simple to fetch the data you want for each symbol from Yahoo's stock pages or some other source (although I am willing to bet their terms of service say you should not do that there either).

Celebrate Intellectual Diversity

Re^2: Wanted: LWP with javascript
by stony (Initiate) on Jan 19, 2006 at 23:12 UTC
    Thanks for the feedback. I think I didn't communicate what I want clearly. I want to avoid using a browser at all and be able to run the script from the command line. The problem is that when I open the example URL with LWP, the content is all JavaScript. Saving the page's text from a browser works because the browser knows how to execute JavaScript. I am looking for some LWP-style package that gets me that content without having to save the page to a file by hand.

      You don't have to save the content of the page to a file by hand; the previous poster did that only for the sake of their investigation. Replace the code that slurps a file into a scalar with code that fetches an HTTP resource into a scalar, which LWP does easily. The regex in that one-liner works on the JavaScript source itself, so the JavaScript never has to be run.
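
A minimal sketch of that swap, assuming the data is still embedded as one [[...]] literal in the page source. It uses LWP::UserAgent's get and HTTP::Response's decoded_content; the URL is the one quoted in this thread, and that page may no longer exist or may have changed format. extract_array_literal is a hypothetical helper name; its regex is the one from the one-liner.

```perl
use strict;
use warnings;

# Pull the [[...]] array literal out of a page's JavaScript source.
# The greedy .* with /s spans from the first "[[" to the last "]]",
# exactly like the regex in the one-liner.
sub extract_array_literal {
    my ($content) = @_;
    return $content =~ m/(\[\[.*\]\])/s ? $1 : undef;
}

# Only touch the network when a URL is given on the command line, e.g.
#   perl fetch_symbols.pl http://www.nyse.com/about/listed/lc_A.html
if (@ARGV) {
    my $url = shift @ARGV;
    require LWP::UserAgent;    # loaded at run time, only when fetching
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;

    # The response body is the raw JavaScript source; no JavaScript engine
    # is involved, the data is read straight out of the text.
    my $literal = extract_array_literal($res->decoded_content)
        or die "No [[...]] array literal found at $url\n";
    print length($literal), " characters of array data\n";
}
```

The captured string can then go to the same eval as in the one-liner, or to a parser that does not execute it.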

        I guess what I am saying is that there is a VERY BIG difference between what happens when you run $ua->get("http://www.nyse.com/about/listed/lc_A.html") and what happens when you do File->Save As from a browser. The first gives you the page as JavaScript ("javascriptese"), since it is the browser's job to run the JavaScript and generate the HTML. If you save from the browser, the JavaScript has already been executed and you get HTML. I can parse HTML; in most cases I am not so good with JavaScript. I was hoping there would be some JavaScript-enabled version of LWP that would run the JavaScript in the returned content and give me HTML instead.