PerlMonks  

Today

I made my first attempt at using LWP. It actually started a few weeks ago, when I took 10 seconds to whip up a command line that prints the vanity block from my home node:
% clear; lynx -dump -nolist http://www.perlmonks.org/index.pl?node=mkmcconn | grep -A 6 "User since:"

That's pretty simple, and the output is just what you would expect:

   User since:       Mon Dec 4 at 20:46
   Last here:        Sat Jan 13 at 23:50 (54 minutes ago)
   Experience:       131
   Level:            scribe (4)
   Writeups:         11
   Location:         Portland, Oregon
   User's localtime: Sat Jan 13 at 18:14

Works swell. So I stored it in a script and began to give some thought to doing this in Perl.

#!/usr/bin/perl -w
# vainmonk
use strict;
print `lynx -dump -nolist "http://www.perlmonks.org/index.pl?node$ARGV[0]"`;

Hardly a Perl script at that point, but from the command line it was invoked like this:

% vainmonk =mkmcconn | egrep -A 6 "User since:"

Laid out like that, it began to occur to me how much more powerful the script could be, and how useful an exercise it would be, if all of its functionality were translated into Perl.
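The `grep -A 6` half of that pipeline is a natural first piece to translate. Below is a minimal sketch in Perl, not the author's code; the subroutine name `grep_after` is hypothetical. It keeps each matching line plus the lines that follow it, restarting the count when another match turns up, as `grep -A` does, but it does not print grep's `--` separators between groups.

```perl
use strict;
use warnings;

# Return each line matching $pattern plus up to $after following lines,
# roughly what `grep -A $after PATTERN` prints (without "--" separators).
sub grep_after {
    my ($pattern, $after, @lines) = @_;
    my @out;
    my $remaining = 0;    # trailing-context lines still owed to the last match
    for my $line (@lines) {
        if ($line =~ $pattern) {
            $remaining = $after;    # a new match restarts the context count
            push @out, $line;
        }
        elsif ($remaining > 0) {
            $remaining--;
            push @out, $line;
        }
    }
    return @out;
}
```

Used as a filter, `print grep_after(qr/User since:/, 6, <STDIN>);` would stand in for the `egrep -A 6 "User since:"` stage; inside vainmonk the input could just as well be the list of lines that backticks return in list context.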

My first attempt

I was very surprised at how easy LWP makes simple tasks. Retrieving raw HTML takes a single line:
% perl -we 'use strict; use LWP::Simple; my $my_args=shift @ARGV; my $my_url="http://www.perlmonks.org/index.pl?node"."$my_args"; getprint($my_url); ' =mkmcconn
But I don't want raw HTML. Seemingly, that's not a problem either: the tree returned by parse_html from HTML::Parse has a format method, which renders the tree as plain text using HTML::FormatText.
% perl -we 'use strict; use LWP::Simple; use HTML::Parse; my $my_args=shift @ARGV; my $my_url="http://www.perlmonks.org/index.pl?node"."$my_args"; print parse_html(get($my_url))->format; ' =mkmcconn

This is where I hit my obstacle. The output of the one-liner above, as most of you already know, is this:

    TABLE NOT SHOWNTABLE NOT SHOWNTABLE NOT SHOWN

   This page brought to you by the crazy folks at The Everything
   Development Company and maintained by Tim Vroom
   Interested in Advertising? Contact our ad-meister, Robo

HTML::FormatText does not render the contents of tables; it prints a "TABLE NOT SHOWN" placeholder for each one instead.
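Since parse_html already builds an HTML::TreeBuilder tree, one way around the missing tables is to skip the formatter and pull the text out of the table cells directly. A minimal sketch, assuming HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) is installed; the subroutine name `table_rows` is hypothetical, and the sample markup in the usage note is made up rather than copied from the real PerlMonks page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution on CPAN

# Return one string per table row: the trimmed text of each <th>/<td>
# cell in the row, joined with tabs. look_down searches all descendants,
# so a row containing a nested table also picks up the nested cells.
sub table_rows {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @rows;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        my @cells = map { $_->as_trimmed_text }
                    $tr->look_down(_tag => qr/^t[hd]$/);
        push @rows, join("\t", @cells) if @cells;
    }
    $tree->delete;    # free the tree (required by older HTML-Tree versions)
    return @rows;
}
```

For example, `table_rows('<table><tr><th>Level:</th><td>scribe (4)</td></tr></table>')` yields the single row `"Level:\tscribe (4)"`. Fed the page from `get($my_url)`, the rows could then be filtered for `User since:` and its neighbours; how deeply the real page nests its tables is not assumed here.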

So that's as far as I got today. When I pick up the project again on Monday afternoon, I'll look more closely at the docs for LWP::Simple, HTML::Parse, and HTML::FormatText. There are also numerous articles on the subject here on Perl Monks. I also plan to look at Parse::RecDescent to see if it's relevant to the task.

The Perl Journal #17 is mentioned several times in Perl Monks articles, but that site is badly broken right now.

Will the Contemplative Order of Perl Monks honor me with your wisdom, by which this lowly scribe might be brought more suddenly into the light? Brethren, if I ascend the slopes of lofty Mt. CPAN, will I find my answer among the archives of the scriptures there? Or is this an exercise in private meditation?

wordily yours: mkmcconn
eagerly awaiting your insights
I hope these musings are not impertinent.
Although admittedly, they are not phrased importunately.


In reply to lwp diary: day 1 by mkmcconn
