Today

I made my first attempt at using LWP. It actually started a few weeks ago, when I took 10 seconds to whip up a command line that prints the vanity block on my home node:
% clear; lynx -dump -nolist 'http://www.perlmonks.org/index.pl?node=mkmcconn' | grep -A 6 "User since:"

That's pretty simple; and, the output is just what you would expect:

   User since:       Mon Dec 4 at 20:46
   Last here:        Sat Jan 13 at 23:50 (54 minutes ago)
   Experience:       131
   Level:            scribe (4)
   Writeups:         11
   Location:         Portland, Oregon
   User's localtime: Sat Jan 13 at 18:14

Works swell. So I stored it in a script, and then began to give some thought to doing this in Perl.

#!/usr/bin/perl -w
# vainmonk
use strict;
print `lynx -dump -nolist 'http://www.perlmonks.org/index.pl?node$ARGV[0]'`;

It was hardly a Perl script at that point, but from the command line it was invoked like this:

% vainmonk =mkmcconn | egrep -A 6 "User since:"

Laid out like that, it began to occur to me how much more powerful, and how useful an exercise, it would be if all the functionality were translated into Perl.
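A first step toward that translation might move the `grep -A 6` filtering into Perl while still letting lynx render the page. This is a minimal sketch, not a finished tool: `lines_after` is a hypothetical helper name, it returns only the first match (grep would print every match), and lynx is assumed to be on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first line matching $pattern plus up to $after following lines,
# roughly what `grep -A $after "$pattern"` prints for a single match.
sub lines_after {
    my ($text, $pattern, $after) = @_;
    my @lines = split /\n/, $text;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ /\Q$pattern\E/;
        my $last = $i + $after;
        $last = $#lines if $last > $#lines;    # don't run past the end
        return @lines[$i .. $last];
    }
    return;    # no match: empty list
}

if (@ARGV) {
    my $url = "http://www.perlmonks.org/index.pl?node$ARGV[0]";
    # Still uses lynx to render the page; list-form open avoids the shell.
    open(my $lynx, '-|', 'lynx', '-dump', '-nolist', $url)
        or die "Cannot run lynx: $!\n";
    my $page = do { local $/; <$lynx> };    # slurp the whole dump
    close $lynx;
    print "$_\n" for lines_after($page, 'User since:', 6);
}
```

Invoked as `vainmonk =mkmcconn`, it should print the same seven lines as the lynx-plus-grep pipeline, with no external grep needed.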

My first attempt

I was surprised by how easy LWP makes simple tasks. Retrieving raw HTML takes a single line:
% perl -we 'use strict; use LWP::Simple; my $my_args = shift @ARGV; my $my_url = "http://www.perlmonks.org/index.pl?node" . $my_args; getprint($my_url);' =mkmcconn
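Written out as a script, the same fetch might look like the sketch below. It relies on LWP::Simple's documented behavior that `get` returns undef when the request fails, so the result is checked before use. `node_url` is a hypothetical helper that appends the argument directly after `node`, exactly as the one-liner does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Build the node URL the same way the one-liner does: the argument
# (e.g. "=mkmcconn") is appended directly after "node".
sub node_url {
    my ($arg) = @_;
    return "http://www.perlmonks.org/index.pl?node$arg";
}

if (@ARGV) {
    my $url  = node_url($ARGV[0]);
    # get() returns undef on any failure, so check before printing.
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;
    print $html;
}
```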
But I don't want raw HTML. Seemingly not a problem: the parse tree returned by HTML::Parse's parse_html() has a format method that renders it as plain text through HTML::FormatText.
% perl -we 'use strict; use LWP::Simple; use HTML::Parse; my $my_args = shift @ARGV; my $my_url = "http://www.perlmonks.org/index.pl?node" . $my_args; print parse_html(get($my_url))->format;' =mkmcconn
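Spelled out, `parse_html(...)->format` roughly amounts to building an HTML::TreeBuilder tree and handing it to an HTML::FormatText object. The sketch below does that explicitly so the margins can be chosen; `html_to_text` is a hypothetical name. It changes nothing about table handling: tables still come out as placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Build a parse tree, then format it with explicitly chosen margins.
sub html_to_text {
    my ($html) = @_;
    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text      = $formatter->format($tree);
    $tree->delete;    # free the parse tree
    return $text;
}

# Usage: perl html2text.pl page.html  (reads the named files via <>)
print html_to_text(join '', <>) if @ARGV;
```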

This is where I encountered my obstacle. The output of the script above (as most of you already know) is this:

    [TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN]

   This page brought to you by the crazy folks at The Everything
   Development Company and maintained by Tim Vroom
   Interested in Advertising? Contact our ad-meister, Robo

HTML::FormatText does not render tables; each one is replaced by a placeholder, and everything inside it is lost.

So that's as far as I got today. When I pick up the project again on Monday afternoon, I'll look more closely at the docs for LWP::Simple, HTML::Parse, and HTML::FormatText, and at the numerous related articles here on Perl Monks. I also plan to look at Parse::RecDescent to see if it's relevant to the task.
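One possible route, lighter than a Parse::RecDescent grammar, is to walk the table rows with HTML::TreeBuilder, which ships in the same HTML-Tree distribution as HTML::Parse. This is a minimal sketch under an unverified assumption: that the vanity block is laid out as rows of `<td>Label:</td><td>value</td>`. `table_pairs` is a hypothetical helper, and the real page markup may need a different test than "first cell ends in a colon".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Collect "Label:" => "value" pairs from table rows whose first cell
# ends in a colon. Assumes label and value sit in adjacent cells.
sub table_pairs {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %pairs;
    for my $row ($tree->look_down(_tag => 'tr')) {
        # Direct child cells only, so nested layout tables don't get merged in.
        my @cells = map  { $_->as_trimmed_text }
                    grep { ref $_ && $_->tag =~ /^t[dh]$/ } $row->content_list;
        next unless @cells >= 2 && $cells[0] =~ /:\z/;
        $pairs{ $cells[0] } = $cells[1];
    }
    $tree->delete;    # free the parse tree
    return \%pairs;
}

if (@ARGV) {
    require LWP::Simple;    # loaded only when actually fetching
    my $url  = "http://www.perlmonks.org/index.pl?node$ARGV[0]";
    my $html = LWP::Simple::get($url);
    die "Could not fetch $url\n" unless defined $html;
    my $pairs = table_pairs($html);
    printf "%-18s %s\n", $_, $pairs->{$_} for sort keys %$pairs;
}
```

Because it works on the parse tree rather than on rendered text, it sidesteps the formatter's table limitation entirely.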

The Perl Journal #17 is mentioned several times in Perl Monks articles, but that site is very broken right now.

Will the Contemplative Order of Perl Monks honor me with your wisdom, by which this lowly scribe might be brought more suddenly into the light? Brethren, if I ascend the slopes of lofty Mt. CPAN, will I find my answer among the archives of the scriptures there? Or is this an exercise in private meditation?

wordily yours: mkmcconn
eagerly awaiting your insights
I hope these musings are not impertinent.
Although admittedly, they are not phrased importunately.


In reply to lwp diary: day 1 by mkmcconn
