stony has asked for the wisdom of the Perl Monks concerning the following question:

I recently discovered LWP for working with web data. Cool stuff. I am running into some uncomfortable limitations that I have to assume someone has already solved, but my searching has not turned up the answer. I have looked at the documentation for LWP and for the Mechanized::IE/Mechanized::Mozilla packages. I think the answer may be in the latter, but I was not able to divine it. The problem is pages where all the content is created on the fly with JavaScript. An example page is:
http://www.nyse.com/about/listed/lc_A.html?ListedComp=All
In this case, I am trying to create a Perl script that will figure out all the stocks that are available, then go dig for up-to-date information about them for an investing simulation I am working on. I have found pages that will give me the information I want provided I have the symbol. This page is the only one I have found that gives me all the symbols, but it is all JavaScript, so there is no actual information when you "View Source" in a browser.

Stony

Replies are listed 'Best First'.
Re: Wanted: LWP with javascript
by InfiniteSilence (Curate) on Jan 19, 2006 at 21:56 UTC
    Before I answer this I should mention that the reason for putting all of that in javascript is probably to stop people from doing exactly what you are attempting to do....

    Anyway, I grabbed one of the pages (companies with the letter 'B'), stored it in a file and used the following to create a Perl data structure:

    perl -e "use Data::Dumper; open(H,qq|stocksymbols.txt|) or die $!; local $/; my $allfile .= <H>; close(H); if($allfile=~m/(\[\[.*\]\])/s){@stuff = eval($1) }; print Dumper \$stuff[0]->[1]->[0];"
    $VAR1 = \'BACPRV';

    With your newfound data structure you can easily use LWP::Simple and Yahoo stocks or some other source to get the data you want (although I am willing to bet their service agreement says you should not do it there either).

    Celebrate Intellectual Diversity

      Thanks for the feedback. I think maybe I didn't communicate my desire correctly. I want to avoid using a browser at all. I would like to be able to run a command-line version of the script. The problem is that, using LWP, when the example URL is opened, the content is all JavaScript. Having the browser save the text of the page works because the browser understands how to execute JavaScript. I am looking for some form of LWP-style package where I can get that content without having to manually save the content of the page to a file.

        You don't have to manually save the content of the page to a file. The previous poster just did that for the sake of their investigation. All you have to do is replace the code that slurps a file into a scalar with code that slurps an HTTP resource into a scalar, and LWP can do that easily.
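
As a rough sketch of that suggestion, the script below swaps the file slurp for an HTTP fetch with LWP::Simple and then applies the same regex and eval as the one-liner earlier in the thread. The sub names (fetch_page, extract_array_literal, parse_array_literal) are made up for illustration, and it assumes the symbol array really is embedded in the text the server returns, as the saved file in the earlier reply suggests. Note that eval on text fetched from a remote site runs whatever that text contains as Perl, so this is only reasonable for a source you trust.

```perl
use strict;
use warnings;
use Data::Dumper;

# Fetch a page body over HTTP instead of reading it from a saved file.
# LWP::Simple is loaded at run time so the parsing subs below can be
# used (and tested) without network access.
sub fetch_page {
    my ($url) = @_;
    require LWP::Simple;
    my $content = LWP::Simple::get($url);    # returns undef on failure
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# Same pattern as the one-liner in the thread: grab everything from the
# first "[[" to the last "]]" in the page text.
sub extract_array_literal {
    my ($content) = @_;
    return $content =~ m/(\[\[.*\]\])/s ? $1 : undef;
}

# Evaluate the JavaScript array literal as Perl, as the earlier reply did.
# Caution: this executes the fetched text, and Perl will interpolate any
# "$" or "@" inside double-quoted strings in it.
sub parse_array_literal {
    my ($literal) = @_;
    my $data = eval $literal;
    die "Could not evaluate array literal: $@\n" if $@;
    die "Literal did not evaluate to an array\n" unless ref $data eq 'ARRAY';
    return $data;
}

# Only touch the network when a URL is given on the command line.
if (defined(my $url = shift @ARGV)) {
    my $literal = extract_array_literal(fetch_page($url));
    die "No array literal found in $url\n" unless defined $literal;
    my $data = parse_array_literal($literal);
    print Dumper($data->[1][0]);    # same element the one-liner printed
}
```

Saved under a hypothetical name such as fetch_symbols.pl, it would be run as `perl fetch_symbols.pl 'http://www.nyse.com/about/listed/lc_A.html?ListedComp=All'`, with no browser and no intermediate file involved.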