in reply to Re: Tricking website into thinking your a browser
in thread Tricking website into thinking your a browser

It gives me:
[TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN] This page brought to you by the kind folks at The Everything Development Company and maintained by Tim Vroom. PerlMonks is a proud member of the The Perl Foundation. Wonderful Web Servers and Bandwidth Generously Provided by pair Networks

Re^3: Tricking website into thinking your a browser
by linuxer (Curate) on Mar 22, 2009 at 17:28 UTC

    Please check the content of $html. It should contain the complete HTML content fetched by LWP::Simple.

    Please also check what HTML::Parse and HTML::FormatText do to that content.

    So, what's the result after parse_html($html)?
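
    To see at which stage the text gets lost, you could print each intermediate result separately. This is only a minimal sketch: the helper name inspect_html and the margin settings are my own assumptions, not taken from your script. Note that HTML::FormatText does not render tables and prints the placeholder [TABLE NOT SHOWN] instead, so a page laid out with tables will give output much like yours.

    ```perl
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use HTML::Parse;        # exports parse_html(); used here only because it is the module under discussion
    use HTML::FormatText;

    # Hypothetical helper: prints the intermediate stages to STDERR
    # and returns the formatted plain text for a string of HTML.
    sub inspect_html {
        my ($html) = @_;

        # Stage 1: the raw content as fetched
        printf STDERR "raw content: %d characters\n", length $html;

        # Stage 2: the tree built by parse_html(), serialized back to HTML
        my $tree = parse_html($html);
        print STDERR "parsed tree:\n", $tree->as_HTML, "\n";

        # Stage 3: the plain text produced by HTML::FormatText
        my $text = HTML::FormatText->new(leftmargin => 0, rightmargin => 72)
                                   ->format($tree);
        $tree->delete;      # free the tree explicitly
        return $text;
    }

    # Only fetch something when a URL is given on the command line.
    if (my $url = shift @ARGV) {
        my $html = get($url);
        die "Could not fetch $url\n" unless defined $html;
        print inspect_html($html);
    }
    ```

    Run it as "perl inspect.pl URL" (the file name is arbitrary) and compare the three stages. If stage 1 already looks wrong, the problem is in the fetch and not in the parsing.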

    I haven't used those *::Parser modules very often myself, but I wonder whether you should heed the warning given in the documentation of HTML::Parse itself:

    Disclaimer: This module is provided only for backwards compatibility with earlier versions of this library. New code should not use this module, and should really use the HTML::Parser and HTML::TreeBuilder modules directly, instead.

    Maybe you should use other modules to extract the plain text (I assume that is what you want to do).
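
    Following that disclaimer, the parsing step could be done with HTML::TreeBuilder directly. A rough sketch, assuming you still want HTML::FormatText for the output; the function name format_with_treebuilder and the margins are made up for illustration:

    ```perl
    use strict;
    use warnings;
    use HTML::TreeBuilder;
    use HTML::FormatText;

    # Hypothetical helper: parse a string of HTML with HTML::TreeBuilder
    # and return it formatted as plain text.
    sub format_with_treebuilder {
        my ($html) = @_;

        # new_from_content() parses the string and finishes the tree in one step
        my $tree = HTML::TreeBuilder->new_from_content($html);

        my $text = HTML::FormatText->new(leftmargin => 0, rightmargin => 72)
                                   ->format($tree);
        $tree->delete;      # free the tree explicitly
        return $text;
    }
    ```

    Tables are still not rendered this way, since that is a limitation of HTML::FormatText and not of the parser.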

    Check out the examples that ship with HTML::Parser. They include a script named htext, which is described as: "# Extract all plain text from an HTML file"

    You can find it, for example, at http://cpansearch.perl.org/src/GAAS/HTML-Parser-3.60/eg/

    (Please note the module versions; they may differ between your system and CPAN.)
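
    To give an idea of the approach, here is a small sketch in the spirit of htext. It is not the script itself: the function name html_to_text and the whitespace cleanup are my own choices. It collects the text events from HTML::Parser and skips anything inside script or style elements.

    ```perl
    use strict;
    use warnings;
    use HTML::Parser ();

    # Hypothetical helper: return the plain text contained in a string of HTML.
    sub html_to_text {
        my ($html) = @_;
        my %inside;         # how deep we are inside each kind of element
        my $out = '';

        my $parser = HTML::Parser->new(
            api_version => 3,
            # count open elements; a space keeps words in adjacent elements apart
            start_h => [ sub { $inside{ $_[0] }++; $out .= ' ' }, 'tagname' ],
            end_h   => [ sub { $inside{ $_[0] }--; $out .= ' ' }, 'tagname' ],
            # 'dtext' is the text with HTML entities decoded
            text_h  => [ sub {
                             return if $inside{script} || $inside{style};
                             $out .= $_[0];
                         }, 'dtext' ],
        );
        $parser->parse($html);
        $parser->eof;       # signal end of document

        $out =~ s/\s+/ /g;  # collapse runs of whitespace
        $out =~ s/^ | $//g; # trim leading and trailing space
        return $out;
    }

    print html_to_text('<p>Hello <b>world</b></p><script>var x = 1;</script>'), "\n";
    ```

    Unlike HTML::FormatText, this also keeps the text inside tables, because every text event is collected no matter which element it appears in.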

    Update: fixed minor typo
Re^3: Tricking website into thinking your a browser
by Anonymous Monk on Mar 22, 2009 at 17:18 UTC